1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-25 04:16:09 +02:00

fsck: free tree buffers after walking unreachable objects

After calling fsck_walk(), a tree object struct may be left in the
parsed state, with the full tree contents available via tree->buffer.
It's the responsibility of the caller to free these when it's done with
the object to avoid having many trees allocated at once.

In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which
does call free_tree_buffer(). Likewise for "--connectivity-only", we see
most objects via traverse_one_object(), which makes a similar call.

The exception is in mark_unreachable_referents(). When using both
"--connectivity-only" and "--dangling" (the latter of which is the
default), we walk all of the unreachable objects, and there we forget to
free. Most cases would not notice this, because they don't have a lot of
unreachable objects, but you can make a pathological case like this:

  git clone --bare /path/to/linux.git repo.git
  cd repo.git
  rm packed-refs ;# now everything is unreachable!
  git fsck --connectivity-only

That ends up with peak heap usage ~18GB, which is (not coincidentally)
close to the size of all uncompressed trees in the repository. After
this patch, the peak heap is only ~2GB.

A few things to note:

  - it might seem like fsck_walk(), if it is parsing the trees, should
    be responsible for freeing them. But the situation is quite tricky.
    In the non-connectivity mode, after we call fsck_walk() we then
    proceed with fsck_object() which actually does the type-specific
    sanity checks on the object contents. We do pass our own separate
    buffer to fsck_object(), but there's a catch: our earlier call to
    parse_object_buffer() may have attached that buffer to the object
    struct! So by freeing it, we leave the rest of the code with a
    dangling pointer.

    Likewise, the call to fsck_walk() in index-pack is subtle. It
    attaches a buffer to the tree object that must not be freed! And
    so rather than calling free_tree_buffer(), it actually detaches it
    by setting tree->buffer to NULL.

    These cases would _probably_ be fixable by having fsck_walk() free
    the tree buffer only when it was the one who allocated it via
    parse_tree(). But that would still leave the callers responsible for
    freeing other cases, so they wouldn't be simplified. While the
    current semantics for fsck_walk() make it easy to accidentally leak
    in new callers, at least they are simple to explain, and it's not a
    function that's likely to get a lot of new call-sites.

    And in any case, it's probably sensible to fix the leak first with
    this simple patch, and try any more complicated refactoring
    separately.

  - a careful reader may notice that fsck_obj() also frees commit
    buffers, but neither the call in traverse_one_object() nor the one
    touched in this patch does so. And indeed, this is another problem
    for --connectivity-only (and accounts for most of the 2GB heap after
    this patch), but it's one we'll fix in a separate commit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Jeff King 2022-09-22 06:11:43 -04:00 committed by Junio C Hamano
parent a0feb8611d
commit fbce4fa9ae

View File

@ -228,6 +228,8 @@ static void mark_unreachable_referents(const struct object_id *oid)
options.walk = mark_used;
fsck_walk(obj, NULL, &options);
if (obj->type == OBJ_TREE)
free_tree_buffer((struct tree *)obj);
}
static int mark_loose_unreachable_referents(const struct object_id *oid,