mirror of
https://github.com/git/git.git
synced 2024-10-20 20:28:13 +02:00
16950f8384
It can sometimes be useful to see which refs are contributing to the overall repository size (e.g., does some branch have a bunch of objects not found elsewhere in history, which indicates that deleting it would shrink the size of a clone). You can find that out by generating a list of objects, getting their sizes from cat-file, and then summing them, like: git rev-list --objects --no-object-names main..branch git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' Though note that the caveats from git-cat-file(1) apply here. We "blame" base objects more than their deltas, even though the relationship could easily be flipped. Still, it can be a useful rough measure. But one problem is that it's slow to run. Teaching rev-list to sum up the sizes can be much faster for two reasons: 1. It skips all of the piping of object names and sizes. 2. If bitmaps are in use, for objects that are in the bitmapped packfile we can skip the oid_object_info() lookup entirely, and just ask the revindex for the on-disk size. This patch implements a --disk-usage option which produces the same answer in a fraction of the time. Here are some timings using a clone of torvalds/linux: [rev-list piped to cat-file, no bitmaps] $ time git rev-list --objects --no-object-names --all | git cat-file --buffer --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' 1459938510 real 0m29.635s user 0m38.003s sys 0m1.093s [internal, no bitmaps] $ time git rev-list --disk-usage --objects --all 1459938510 real 0m31.262s user 0m30.885s sys 0m0.376s Even though the wall-clock time is slightly worse due to parallelism, notice the CPU savings between the two. We saved 21% of the CPU just by avoiding the pipes. But the real win is with bitmaps. If we use them without the new option: [rev-list piped to cat-file, bitmaps] $ time git rev-list --objects --no-object-names --all --use-bitmap-index | git cat-file --batch-check='%(objectsize:disk)' | perl -lne '$total += $_; END { print $total }' 1459938510 real 0m6.244s user 0m8.452s sys 0m0.311s then we're faster to generate the list of objects, but we still spend a lot of time piping and looking things up. But if we do both together: [internal, bitmaps] $ time git rev-list --disk-usage --objects --all --use-bitmap-index 1459938510 real 0m0.219s user 0m0.169s sys 0m0.049s then we get the same answer much faster. For "--all", that answer will correspond closely to "du objects/pack", of course. But we're actually checking reachability here, so we're still fast when we ask for more interesting things: $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10 374798628 real 0m0.429s user 0m0.356s sys 0m0.072s Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com> |
||
---|---|---|
.. | ||
add.c | ||
am.c | ||
annotate.c | ||
apply.c | ||
archive.c | ||
bisect--helper.c | ||
blame.c | ||
branch.c | ||
bugreport.c | ||
bundle.c | ||
cat-file.c | ||
check-attr.c | ||
check-ignore.c | ||
check-mailmap.c | ||
check-ref-format.c | ||
checkout-index.c | ||
checkout.c | ||
clean.c | ||
clone.c | ||
column.c | ||
commit-graph.c | ||
commit-tree.c | ||
commit.c | ||
config.c | ||
count-objects.c | ||
credential-cache--daemon.c | ||
credential-cache.c | ||
credential-store.c | ||
credential.c | ||
describe.c | ||
diff-files.c | ||
diff-index.c | ||
diff-tree.c | ||
diff.c | ||
difftool.c | ||
env--helper.c | ||
fast-export.c | ||
fast-import.c | ||
fetch-pack.c | ||
fetch.c | ||
fmt-merge-msg.c | ||
for-each-ref.c | ||
for-each-repo.c | ||
fsck.c | ||
gc.c | ||
get-tar-commit-id.c | ||
grep.c | ||
hash-object.c | ||
help.c | ||
index-pack.c | ||
init-db.c | ||
interpret-trailers.c | ||
log.c | ||
ls-files.c | ||
ls-remote.c | ||
ls-tree.c | ||
mailinfo.c | ||
mailsplit.c | ||
merge-base.c | ||
merge-file.c | ||
merge-index.c | ||
merge-ours.c | ||
merge-recursive.c | ||
merge-tree.c | ||
merge.c | ||
mktag.c | ||
mktree.c | ||
multi-pack-index.c | ||
mv.c | ||
name-rev.c | ||
notes.c | ||
pack-objects.c | ||
pack-redundant.c | ||
pack-refs.c | ||
patch-id.c | ||
prune-packed.c | ||
prune.c | ||
pull.c | ||
push.c | ||
range-diff.c | ||
read-tree.c | ||
rebase.c | ||
receive-pack.c | ||
reflog.c | ||
remote-ext.c | ||
remote-fd.c | ||
remote.c | ||
repack.c | ||
replace.c | ||
rerere.c | ||
reset.c | ||
rev-list.c | ||
rev-parse.c | ||
revert.c | ||
rm.c | ||
send-pack.c | ||
shortlog.c | ||
show-branch.c | ||
show-index.c | ||
show-ref.c | ||
sparse-checkout.c | ||
stash.c | ||
stripspace.c | ||
submodule--helper.c | ||
symbolic-ref.c | ||
tag.c | ||
unpack-file.c | ||
unpack-objects.c | ||
update-index.c | ||
update-ref.c | ||
update-server-info.c | ||
upload-archive.c | ||
upload-pack.c | ||
var.c | ||
verify-commit.c | ||
verify-pack.c | ||
verify-tag.c | ||
worktree.c | ||
write-tree.c |