mirror of https://github.com/git/git.git synced 2024-06-03 16:56:10 +02:00
git/Documentation/config
Taylor Blau dbcf611617 pack-revindex: introduce `pack.readReverseIndex`
Since 1615c567b8 (Documentation/config/pack.txt: advertise
'pack.writeReverseIndex', 2021-01-25), we have had the
`pack.writeReverseIndex` configuration option, which tells Git whether
or not it is allowed to write a ".rev" file when indexing a pack.

Introduce a complementary configuration knob, `pack.readReverseIndex`, to
control whether or not Git will read any ".rev" file(s) that may be
available on disk.

This option is useful for debugging, as well as for disabling the effect
of ".rev" files in certain instances.

Disabling them can help because of the trade-off[^1] between the time it
takes to generate a reverse index (slow from scratch, fast when reading
an existing ".rev" file), and the time it takes to access a record (the
opposite).

For example, even though it is faster to use the on-disk reverse index
when computing the on-disk size of a packed object, it is slower to
enumerate the same value for all objects.

Here are a couple of examples from linux.git. When computing the above
for a single object, using the on-disk reverse index is significantly
faster:

    $ git rev-parse HEAD >in
    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" <in'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):     302.5 ms ±  12.5 ms    [User: 258.7 ms, System: 43.6 ms]
      Range (min … max):   291.1 ms … 328.1 ms    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in
      Time (mean ± σ):       3.9 ms ±   0.3 ms    [User: 1.6 ms, System: 2.4 ms]
      Range (min … max):     2.0 ms …   4.4 ms    801 runs

    Summary
      'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" <in' ran
       77.29 ± 7.14 times faster than 'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" <in'

But when instead computing the on-disk object size for all objects in
the repository, using the ".rev" file is slower than creating the
reverse index from scratch:

    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      8.258 s ±  0.035 s    [User: 7.949 s, System: 0.308 s]
      Range (min … max):    8.199 s …  8.293 s    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):     16.976 s ±  0.107 s    [User: 16.706 s, System: 0.268 s]
      Range (min … max):   16.839 s … 17.105 s    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=false cat-file --batch-check="%(objectsize:disk)" --batch-all-objects' ran
            2.06 ± 0.02 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --batch-check="%(objectsize:disk)" --batch-all-objects'

Luckily, the results when running `git cat-file` with `--unordered` are
closer together:

    $ hyperfine -L v false,true 'git.compile -c pack.readReverseIndex={v} cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'
    Benchmark 1: git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      5.066 s ±  0.105 s    [User: 4.792 s, System: 0.274 s]
      Range (min … max):    4.943 s …  5.220 s    10 runs

    Benchmark 2: git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects
      Time (mean ± σ):      6.193 s ±  0.069 s    [User: 5.937 s, System: 0.255 s]
      Range (min … max):    6.145 s …  6.356 s    10 runs

    Summary
      'git.compile -c pack.readReverseIndex=false cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects' ran
        1.22 ± 0.03 times faster than 'git.compile -c pack.readReverseIndex=true cat-file --unordered --batch-check="%(objectsize:disk)" --batch-all-objects'

Because the equilibrium point between these two is highly machine- and
repository-dependent, allow users to configure whether or not they will
read any ".rev" file(s) with this configuration knob.

[^1]: Generating a reverse index in memory takes O(N) time (where N is
  the number of objects in the repository), since we use a radix sort.
  Reading an entry from an on-disk ".rev" file is slower since each
  operation is bound by disk I/O instead of memory I/O.

  In order to compute the on-disk size of a packed object, we need to
  find the offset of our object and the offset of the adjacent object
  (the on-disk size is the difference between these two). Finding the
  first offset requires a binary search. Finding the latter involves a
  single .rev lookup.
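
The lookup described in this footnote can be sketched roughly as follows. This is an illustrative model only, not Git's implementation: the function names, the `offsets` list (pack offsets in pack-index order) and `pack_end` (the assumed offset at which the last object's data ends) are all hypothetical, and `sorted()` stands in for the radix sort mentioned above.

```python
def build_reverse_index(offsets):
    """Return index positions ordered by pack offset (an in-memory reverse index).

    Assumption: `offsets[i]` is the pack offset of the i-th object in
    pack-index order. Git uses a radix sort; sorted() is a stand-in.
    """
    return sorted(range(len(offsets)), key=lambda i: offsets[i])


def pack_position(offset, offsets, revindex):
    """Binary search the reverse index for the pack position of `offset`."""
    lo, hi = 0, len(revindex)
    while lo < hi:
        mid = (lo + hi) // 2
        if offsets[revindex[mid]] < offset:
            lo = mid + 1
        else:
            hi = mid
    return lo


def on_disk_size(index_pos, offsets, revindex, pack_end):
    """Size of an object: offset of the adjacent object minus its own offset."""
    start = offsets[index_pos]
    pos = pack_position(start, offsets, revindex)
    if pos + 1 < len(revindex):
        # A single reverse-index lookup yields the next object in pack order.
        next_offset = offsets[revindex[pos + 1]]
    else:
        # The last object in the pack ends where the object data ends.
        next_offset = pack_end
    return next_offset - start


# Hypothetical pack with three objects, listed in pack-index order.
offsets = [300, 12, 150]
revindex = build_reverse_index(offsets)
sizes = [on_disk_size(i, offsets, revindex, pack_end=420) for i in range(3)]
```

In this model the sort is paid once up front and every later lookup is a cheap in-memory access, which is the in-memory side of the trade-off; a ".rev" file skips the sort but reads each entry from disk.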

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 07:55:46 -07:00
..
add.txt add: remove "add.interactive.useBuiltin" & Perl "git add--interactive" 2023-02-06 15:03:34 -08:00
advice.txt advice: add diverging advice for novices 2023-03-08 09:28:42 -08:00
alias.txt config/alias.txt: document alias accepting non-command first word 2019-06-06 09:33:42 -07:00
am.txt
apply.txt
blame.txt blame: correct name of config option in docs 2021-06-28 10:05:13 -07:00
branch.txt push: default to single remote even when not named origin 2022-04-29 11:20:55 -07:00
browser.txt
bundle.txt bundle-uri: parse bundle.heuristic=creationToken 2023-01-31 08:57:48 -08:00
checkout.txt parallel-checkout: add configuration options 2021-04-19 11:57:05 -07:00
clean.txt
clone.txt clone, submodule: pass partial clone filters to submodules 2022-02-09 15:38:36 -08:00
color.txt Merge branch 'hm/paint-hits-in-log-grep' 2021-11-01 13:48:08 -07:00
column.txt
commit.txt
commitgraph.txt commit-graph: use config to specify generation type 2021-02-25 15:10:41 -08:00
completion.txt
core.txt doc: use "commit-graph" hyphenation consistently 2022-10-30 19:58:40 -04:00
credential.txt crendential-store: use timeout when locking file 2020-11-25 12:30:18 -08:00
diff.txt difftool docs: de-duplicate configuration sections 2022-09-07 09:46:06 -07:00
difftool.txt difftool docs: de-duplicate configuration sections 2022-09-07 09:46:06 -07:00
extensions.txt Documentation: add extensions.worktreeConfig details 2022-02-08 09:49:20 -08:00
fastimport.txt
feature.txt features: feature.manyFiles implies fast index writes 2023-01-07 07:46:14 +09:00
fetch.txt bundle-uri: store fetch.bundleCreationToken 2023-01-31 08:57:48 -08:00
filter.txt
fmt-merge-msg.txt config/fmt-merge-msg.txt: drop space in quote 2020-09-27 14:22:41 -07:00
format.txt format-patch: add format.noprefix option 2023-03-09 08:37:27 -08:00
fsck.txt fsck: document msg-id 2022-10-25 15:44:18 -07:00
fsmonitor--daemon.txt fsmonitor: add documentation for allowRemote and socketDir options 2022-10-05 11:05:23 -07:00
gc.txt builtin/gc.c: conditionally avoid pruning objects via loose 2022-05-26 15:48:26 -07:00
gitcvs.txt
gitweb.txt
gpg.txt signature-format.txt: note SSH and X.509 signature delimiters 2023-02-27 13:42:43 -08:00
grep.txt grep docs: de-duplicate configuration sections 2022-09-07 09:46:05 -07:00
gui.txt docs: use "character encoding" to refer to commit-object encoding 2021-08-27 12:45:45 -07:00
guitool.txt
help.txt help.c: help.autocorrect=prompt waits for user action 2021-08-14 11:20:49 -07:00
http.txt i18n: fix mismatched camelCase config variables 2022-06-17 10:38:26 -07:00
i18n.txt
imap.txt
includeif.txt config.txt: document include, includeIf 2022-07-17 14:23:42 -07:00
index.txt read-cache: add index.skipHash config option 2023-01-07 07:46:14 +09:00
init.txt clone: respect remote unborn HEAD 2021-02-05 13:49:55 -08:00
instaweb.txt
interactive.txt checkout: split part of it to new command 'restore' 2019-05-07 13:04:47 +09:00
log.txt diff-merges: clarify log.diffMerges documentation 2022-09-16 09:21:44 -07:00
lsrefs.txt docs: move protocol-related docs to man section 5 2022-08-04 14:12:23 -07:00
mailinfo.txt
mailmap.txt
maintenance.txt maintenance: incremental strategy runs pack-refs weekly 2021-02-09 23:09:29 -08:00
man.txt
merge.txt update documentation for new zdiff3 conflictStyle 2021-12-01 14:45:59 -08:00
mergetool.txt Merge branch 'nb/doc-mergetool-typofix' into maint-2.38 2022-10-25 17:11:38 -07:00
notes.txt notes docs: de-duplicate and combine configuration sections 2022-09-07 09:46:06 -07:00
pack.txt pack-revindex: introduce `pack.readReverseIndex` 2023-04-13 07:55:46 -07:00
pager.txt
pretty.txt
protocol.txt Sync with 2.37.4 2022-10-06 20:00:04 -04:00
pull.txt pull: remove support for `--rebase=preserve` 2021-09-07 21:45:32 -07:00
push.txt Doc: document push.recurseSubmodules=only 2022-11-14 16:55:50 -05:00
rebase.txt rebase: add a config option for --rebase-merges 2023-03-27 09:32:49 -07:00
receive.txt receive-pack: new config receive.procReceiveRefs 2020-08-27 12:47:47 -07:00
remote.txt docs: mention --refetch fetch option 2022-03-28 10:25:53 -07:00
remotes.txt
repack.txt builtin/repack.c: allow configuring cruft pack generation 2022-05-26 15:48:26 -07:00
rerere.txt
revert.txt revert: config documentation fixes 2022-06-27 08:37:36 -07:00
safe.txt setup.c: create `safe.bareRepository` 2022-07-14 15:08:29 -07:00
sendemail.txt send-email docs: de-duplicate configuration sections 2022-09-07 09:46:05 -07:00
sequencer.txt
showbranch.txt
sparse.txt repo_read_index: add config to expect files outside sparse patterns 2022-03-01 23:37:48 -08:00
splitindex.txt
ssh.txt
stash.txt stash: remove documentation for `stash.useBuiltin` 2022-01-27 18:00:37 -08:00
status.txt status: add status.aheadbehind setting 2019-06-21 09:35:00 -07:00
submodule.txt branch: add --recurse-submodules option for branch creation 2022-02-04 08:16:39 -08:00
tag.txt separate tar.* config to its own source file 2020-03-18 12:42:09 -07:00
tar.txt separate tar.* config to its own source file 2020-03-18 12:42:09 -07:00
trace2.txt doc: fix some typos 2021-01-04 11:27:48 -08:00
transfer.txt bundle-uri client: add boolean transfer.bundleURI setting 2022-12-25 16:24:23 +09:00
uploadarchive.txt
uploadpack.txt Documentation: define protected configuration 2022-07-14 15:08:29 -07:00
url.txt
user.txt ssh signing: support non ssh-* keytypes 2021-11-19 09:05:25 -08:00
versionsort.txt
web.txt
worktree.txt