1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-24 03:46:12 +02:00
git/t/perf/p5600-partial-clone.sh
Jeff King c1fa951d7e revision: avoid parsing with --exclude-promisor-objects
When --exclude-promisor-objects is given, before traversing any objects
we iterate over all of the objects in any promisor packs, marking them
as UNINTERESTING and SEEN. We turn the oid we get from iterating the
pack into an object with parse_object(), but this has two problems:

  - it's slow; we are zlib inflating (and reconstructing from deltas)
    every byte of every object in the packfile

  - it leaves the tree buffers attached to their structs, which means
    our heap usage will grow to store every uncompressed tree
    simultaneously. This can be gigabytes.

We can obviously fix the second by freeing the tree buffers after we've
parsed them. But we can observe that the function doesn't look at the
object contents at all! The only reason we call parse_object() is that
we need a "struct object" on which to set the flags. There are two
options here:

  - we can look up just the object type via oid_object_info(), and then
    call the appropriate lookup_foo() function

  - we can call lookup_unknown_object(), which gives us an OBJ_NONE
    struct (which will get auto-converted later by object_as_type() via
    calls to lookup_commit(), etc).

The first one is closer to the current code, but we do pay the price to
look up the type for each object. The latter should be more efficient in
CPU, though it wastes a little bit of memory (the "unknown" object
structs are a union of all object types, so some of the structs are
bigger than they need to be). It also runs the risk of triggering a
latent bug in code that calls lookup_object() directly but isn't ready
to handle OBJ_NONE (such code would already be buggy, but we use
lookup_unknown_object() infrequently enough that it might be hiding).

I went with the second option here. I don't think the risk is high (and
we'd want to find and fix any such bugs anyway), and it should be more
efficient overall.

The new tests in p5600 show off the improvement (this is on git.git):

  Test                                 HEAD^               HEAD
  -------------------------------------------------------------------------------
  5600.5: count commits                0.37(0.37+0.00)     0.38(0.38+0.00) +2.7%
  5600.6: count non-promisor commits   11.74(11.37+0.37)   0.04(0.03+0.00) -99.7%

The improvement is particularly big in this script because _every_
object in the newly-cloned partial repo is a promisor object. So after
marking them all, there's nothing left to traverse.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13 13:22:37 -07:00

39 lines
808 B
Bash
Executable File

#!/bin/sh
test_description='performance of partial clones'
. ./perf-lib.sh
test_perf_default_repo
test_expect_success 'enable server-side config' '
git config uploadpack.allowFilter true &&
git config uploadpack.allowAnySHA1InWant true
'
test_perf 'clone without blobs' '
rm -rf bare.git &&
git clone --no-local --bare --filter=blob:none . bare.git
'
test_perf 'checkout of result' '
rm -rf worktree &&
mkdir -p worktree/.git &&
tar -C bare.git -cf - . | tar -C worktree/.git -xf - &&
git -C worktree config core.bare false &&
git -C worktree checkout -f
'
test_perf 'fsck' '
git -C bare.git fsck
'
test_perf 'count commits' '
git -C bare.git rev-list --all --count
'
test_perf 'count non-promisor commits' '
git -C bare.git rev-list --all --count --exclude-promisor-objects
'
test_done