mirror of https://github.com/git/git.git synced 2024-10-20 20:28:13 +02:00
git/builtin
Jeff King 16950f8384 rev-list: add --disk-usage option for calculating disk usage
It can sometimes be useful to see which refs are contributing to the
overall repository size (e.g., does some branch have a bunch of objects
not found elsewhere in history, which would indicate that deleting it
would shrink the size of a clone?).

You can find that out by generating a list of objects, getting their
sizes from cat-file, and then summing them, like:

    git rev-list --objects --no-object-names main..branch |
    git cat-file --batch-check='%(objectsize:disk)' |
    perl -lne '$total += $_; END { print $total }'

Though note that the caveats from git-cat-file(1) apply here. We "blame"
base objects more than their deltas, even though the relationship could
easily be flipped. Still, it can be a useful rough measure.
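
The pipeline above can be wrapped for reuse. The following is only a
minimal sketch and not part of the patch: the function names sum_sizes
and disk_usage_piped are made up for illustration, and awk stands in for
the perl one-liner.

```shell
# Sum one integer per line from stdin and print the total.
# Prints 0 for empty input. awk keeps the total in a double, so the
# result is exact only up to 2^53 bytes.
sum_sizes() {
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Hypothetical wrapper: on-disk bytes of all objects selected by the
# given rev-list arguments (for example "main..branch" or "--all").
disk_usage_piped() {
    git rev-list --objects --no-object-names "$@" |
    git cat-file --batch-check='%(objectsize:disk)' |
    sum_sizes
}
```

It would be called as "disk_usage_piped main..branch"; the same caveats
about how sizes are attributed to bases and deltas apply.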

But one problem is that it's slow to run. Teaching rev-list to sum up
the sizes can be much faster for two reasons:

  1. It skips all of the piping of object names and sizes.

  2. If bitmaps are in use, for objects that are in the
     bitmapped packfile we can skip the oid_object_info()
     lookup entirely, and just ask the revindex for the
     on-disk size.

This patch implements a --disk-usage option which produces the same
answer in a fraction of the time. Here are some timings using a clone of
torvalds/linux:

  [rev-list piped to cat-file, no bitmaps]
  $ time git rev-list --objects --no-object-names --all |
    git cat-file --buffer --batch-check='%(objectsize:disk)' |
    perl -lne '$total += $_; END { print $total }'
  1459938510
  real	0m29.635s
  user	0m38.003s
  sys	0m1.093s

  [internal, no bitmaps]
  $ time git rev-list --disk-usage --objects --all
  1459938510
  real	0m31.262s
  user	0m30.885s
  sys	0m0.376s

Even though the wall-clock time is slightly worse (the piped version
spreads its work across several processes running in parallel), notice
the CPU savings between the two. We saved roughly 20% of the CPU time
(user plus sys) just by avoiding the pipes.
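
The CPU comparison can be recomputed from the user and sys figures
quoted above. This is a small sketch under the assumption that "CPU"
means user plus sys time; the helper name cpu_saving_percent is
hypothetical.

```shell
# Print the percentage of CPU time (user + sys) saved when going from
# the "before" run to the "after" run.
# Arguments: user_before sys_before user_after sys_after (seconds).
cpu_saving_percent() {
    awk -v ub="$1" -v sb="$2" -v ua="$3" -v sa="$4" 'BEGIN {
        before = ub + sb
        after = ua + sa
        printf "%.1f\n", 100 * (before - after) / before
    }'
}

# Timings from the two runs without bitmaps: piped first, internal second.
cpu_saving_percent 38.003 1.093 30.885 0.376
```

With those inputs the saving comes out at about a fifth of the total CPU
time.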

But the real win is with bitmaps. If we use them without the new option:

  [rev-list piped to cat-file, bitmaps]
  $ time git rev-list --objects --no-object-names --all --use-bitmap-index |
    git cat-file --batch-check='%(objectsize:disk)' |
    perl -lne '$total += $_; END { print $total }'
  1459938510
  real	0m6.244s
  user	0m8.452s
  sys	0m0.311s

then we're faster to generate the list of objects, but we still spend a
lot of time piping and looking things up. But if we do both together:

  [internal, bitmaps]
  $ time git rev-list --disk-usage --objects --all --use-bitmap-index
  1459938510
  real	0m0.219s
  user	0m0.169s
  sys	0m0.049s

then we get the same answer much faster.

For "--all", that answer will correspond closely to "du objects/pack",
of course. But we're actually checking reachability here, so we're still
fast when we ask for more interesting things:

  $ time git rev-list --disk-usage --use-bitmap-index v5.0..v5.10
  374798628
  real	0m0.429s
  user	0m0.356s
  sys	0m0.072s
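
To tie this back to the original question of which refs contribute most
to the repository size, the new option could be run once per branch.
The following is a hedged sketch rather than anything the patch ships:
the function name branches_by_disk_usage is invented here, and it
assumes that the objects reachable from each local branch but not from
a chosen base are the interesting quantity.

```shell
# Hypothetical helper: for every local branch other than the base, print
# "<bytes> <branch>" for the objects reachable from the branch but not
# from the base, largest first.
branches_by_disk_usage() {
    base=$1
    git for-each-ref --format='%(refname:short)' refs/heads |
    while read -r branch; do
        if [ "$branch" = "$base" ]; then
            continue
        fi
        size=$(git rev-list --disk-usage --objects "$base..$branch")
        printf '%s %s\n' "$size" "$branch"
    done |
    sort -rn
}
```

Running "branches_by_disk_usage main" lists the branches that would free
the most space if deleted at the top. Adding --use-bitmap-index to the
rev-list call should speed this up when bitmaps are available.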

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-11 09:57:55 -08:00
add.c
am.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
annotate.c
apply.c
archive.c
bisect--helper.c
blame.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
branch.c Merge branch 'ph/use-delete-refs' 2021-02-05 16:40:45 -08:00
bugreport.c
bundle.c
cat-file.c
check-attr.c
check-ignore.c
check-mailmap.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
check-ref-format.c
checkout-index.c
checkout.c Merge branch 'dl/checkout-p-merge-base' 2020-12-23 13:59:46 -08:00
clean.c
clone.c
column.c
commit-graph.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
commit-tree.c
commit.c shortlog: remove unused(?) "repo-abbrev" feature 2021-01-12 14:04:42 -08:00
config.c
count-objects.c
credential-cache--daemon.c
credential-cache.c
credential-store.c
credential.c
describe.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
diff-files.c diff-merges: new function diff_merges_set_dense_combined_if_unset() 2020-12-21 13:47:31 -08:00
diff-index.c
diff-tree.c
diff.c Merge branch 'so/log-diff-merge' 2021-02-05 16:40:44 -08:00
difftool.c
env--helper.c
fast-export.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
fast-import.c
fetch-pack.c
fetch.c fetch: implement support for atomic reference updates 2021-01-12 12:06:15 -08:00
fmt-merge-msg.c
for-each-ref.c ref-filter: move ref_sorting flags to a bitfield 2021-01-07 15:13:21 -08:00
for-each-repo.c for-each-repo: do nothing on empty config 2021-01-07 19:12:02 -08:00
fsck.c fsck: make fsck_config() re-usable 2021-01-05 14:58:29 -08:00
gc.c Merge branch 'jk/peel-iterated-oid' 2021-02-03 15:04:49 -08:00
get-tar-commit-id.c
grep.c
hash-object.c
help.c
index-pack.c object-file.c: rename from sha1-file.c 2021-01-04 13:01:55 -08:00
init-db.c
interpret-trailers.c
log.c Merge branch 'so/log-diff-merge' 2021-02-05 16:40:44 -08:00
ls-files.c ls-files.c: add --deduplicate option 2021-01-23 11:48:20 -08:00
ls-remote.c
ls-tree.c
mailinfo.c
mailsplit.c
merge-base.c
merge-file.c
merge-index.c
merge-ours.c
merge-recursive.c
merge-tree.c
merge.c Merge branch 'so/log-diff-merge' 2021-02-05 16:40:44 -08:00
mktag.c mktag: add a --[no-]strict option 2021-01-06 14:22:24 -08:00
mktree.c
multi-pack-index.c
mv.c
name-rev.c hash-lookup: rename from sha1-lookup 2021-01-04 13:01:55 -08:00
notes.c
pack-objects.c Merge branch 'jv/pack-objects-narrower-ref-iteration' 2021-02-05 16:40:45 -08:00
pack-redundant.c Merge branch 'jc/deprecate-pack-redundant' 2021-01-25 14:19:18 -08:00
pack-refs.c
patch-id.c
prune-packed.c
prune.c
pull.c
push.c
range-diff.c
read-tree.c
rebase.c Merge branch 'rs/rebase-commit-validation' 2021-01-15 15:20:29 -08:00
receive-pack.c
reflog.c
remote-ext.c
remote-fd.c
remote.c
repack.c fetch-pack: refactor writing promisor file 2021-01-12 16:01:07 -08:00
replace.c
rerere.c
reset.c
rev-list.c rev-list: add --disk-usage option for calculating disk usage 2021-02-11 09:57:55 -08:00
rev-parse.c
revert.c
rm.c
send-pack.c
shortlog.c Merge branch 'ab/mailmap' 2021-01-25 14:19:19 -08:00
show-branch.c
show-index.c
show-ref.c refs: switch peel_ref() to peel_iterated_oid() 2021-01-21 15:51:31 -08:00
sparse-checkout.c
stash.c Merge branch 'en/stash-apply-sparse-checkout' 2021-01-15 15:20:29 -08:00
stripspace.c
submodule--helper.c builtin/*: update usage format 2021-01-06 15:10:49 -08:00
symbolic-ref.c
tag.c Merge branch 'ph/use-delete-refs' 2021-02-05 16:40:45 -08:00
unpack-file.c
unpack-objects.c
update-index.c
update-ref.c
update-server-info.c
upload-archive.c
upload-pack.c
var.c
verify-commit.c
verify-pack.c
verify-tag.c
worktree.c worktree: teach repair to fix multi-directional breakage 2020-12-21 13:44:28 -08:00
write-tree.c