1
0
mirror of https://github.com/git/git.git synced 2024-09-28 06:30:37 +02:00
Commit Graph

72485 Commits

Author SHA1 Message Date
Brian Lyles
c282eba2d5 rebase: update --empty=ask to --empty=stop
When git-am(1) got its own `--empty` option in 7c096b8d61 (am: support
--empty=<option> to handle empty patches, 2021-12-09), `stop` was used
instead of `ask`. `stop` is a more accurate term for describing what
really happens, and consistency is good.

Update git-rebase(1) to also use `stop`, while keeping `ask` as a
deprecated synonym. Update the tests to primarily use `stop`, but also
ensure that `ask` is still allowed.

In a future commit, we'll be adding a new `--empty` option for
git-cherry-pick(1) as well, making the consistency even more relevant.

Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Brian Lyles <brianmlyles@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 16:45:40 -07:00
Brian Lyles
64a443efe4 docs: clean up --empty formatting in git-rebase(1) and git-am(1)
Both of these pages document very similar `--empty` options, but with
different styles. The exact behavior of these `--empty` options differs
somewhat, but consistent styling in the docs is still beneficial. This
commit aims to make them more consistent.

Break the possible values for `--empty` into separate sections for
readability. Alphabetical order is chosen for consistency.

In a future commit, we'll be documenting a new `--empty` option for
git-cherry-pick(1), making the consistency even more relevant.

Signed-off-by: Brian Lyles <brianmlyles@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 16:45:40 -07:00
Brian Lyles
0af38890ad docs: address inaccurate --empty default with --exec
The documentation for git-rebase(1) indicates that using the `--exec`
option will use `--empty=drop`. This is inaccurate: when `--interactive`
is not explicitly provided, `--exec` results in `--empty=keep`
behaviors.

Correctly indicate the behavior of `--exec` using `--empty=keep` when
`--interactive` is not specified.

Reported-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Signed-off-by: Brian Lyles <brianmlyles@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-25 16:45:40 -07:00
Junio C Hamano
e09f1254c5 The fifth batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-07 15:59:42 -08:00
Junio C Hamano
a82fa7bce8 Merge branch 'jk/upload-pack-v2-capability-cleanup'
The upload-pack program, when talking over v2, accepted the
packfile-uris protocol extension from the client, even if it did
not advertise the capability, which has been corrected.

* jk/upload-pack-v2-capability-cleanup:
  upload-pack: only accept packfile-uris if we advertised it
  upload-pack: use existing config mechanism for advertisement
  upload-pack: centralize setup of sideband-all config
  upload-pack: use repository struct to get config
2024-03-07 15:59:42 -08:00
Junio C Hamano
56d6084560 Merge branch 'jk/upload-pack-bounded-resources'
Various parts of upload-pack has been updated to bound the resource
consumption relative to the size of the repository to protect from
abusive clients.

* jk/upload-pack-bounded-resources:
  upload-pack: free tree buffers after parsing
  upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
  upload-pack: always turn off save_commit_buffer
  upload-pack: disallow object-info capability by default
  upload-pack: accept only a single packfile-uri line
  upload-pack: use a strmap for want-ref lines
  upload-pack: use oidset for deepen_not list
  upload-pack: switch deepen-not list to an oid_array
  upload-pack: drop separate v2 "haves" array
2024-03-07 15:59:42 -08:00
Junio C Hamano
963a277a52 Merge branch 'ps/reftable-repo-init-fix'
Clear the fallout from a fix for 2.44 regression.

* ps/reftable-repo-init-fix:
  t0610: remove unused variable assignment
  refs/reftable: don't fail empty transactions in repo without HEAD
2024-03-07 15:59:42 -08:00
Junio C Hamano
ce65a188b1 Merge branch 'ps/remote-helper-repo-initialization-fix'
A custom remote helper no longer cannot access the newly created
repository during "git clone", which is a regression in Git 2.44.
This has been corrected.

* ps/remote-helper-repo-initialization-fix:
  builtin/clone: allow remote helpers to detect repo
2024-03-07 15:59:42 -08:00
Junio C Hamano
6a887bdd92 Merge branch 'ml/log-merge-with-cherry-pick-and-other-pseudo-heads'
"git log --merge" learned to pay attention to CHERRY_PICK_HEAD and
other kinds of *_HEAD pseudorefs.

* ml/log-merge-with-cherry-pick-and-other-pseudo-heads:
  revision: implement `git log --merge` also for rebase/cherry-pick/revert
  revision: ensure MERGE_HEAD is a ref in prepare_show_merge
2024-03-07 15:59:41 -08:00
Junio C Hamano
f46a3f143e Merge branch 'eg/add-uflags'
Code clean-up practice.

* eg/add-uflags:
  add: use unsigned type for collection of bits
2024-03-07 15:59:41 -08:00
Junio C Hamano
798ddfc17f Merge branch 'jt/commit-redundant-scissors-fix'
"git commit -v --cleanup=scissors" used to add the scissors line
twice in the log message buffer, which has been corrected.

* jt/commit-redundant-scissors-fix:
  commit: unify logic to avoid multiple scissors lines when merging
  commit: avoid redundant scissor line with --cleanup=scissors -v
2024-03-07 15:59:41 -08:00
Junio C Hamano
ae46d5fb98 Merge branch 'js/merge-tree-3-trees'
"git merge-tree" has learned that the three trees involved in the
3-way merge only need to be trees, not necessarily commits.

* js/merge-tree-3-trees:
  fill_tree_descriptor(): mark error message for translation
  cache-tree: avoid an unnecessary check
  Always check `parse_tree*()`'s return value
  t4301: verify that merge-tree fails on missing blob objects
  merge-ort: do check `parse_tree()`'s return value
  merge-tree: fail with a non-zero exit code on missing tree objects
  merge-tree: accept 3 trees as arguments
2024-03-07 15:59:41 -08:00
Junio C Hamano
76d1cd8e5e Merge branch 'cc/rev-list-allow-missing-tips'
"git rev-list --missing=print" has learned to optionally take
"--allow-missing-tips", which allows the objects at the starting
points to be missing.

* cc/rev-list-allow-missing-tips:
  revision: fix --missing=[print|allow*] for annotated tags
  rev-list: allow missing tips with --missing=[print|allow*]
  t6022: fix 'test' style and 'even though' typo
  oidset: refactor oidset_insert_from_set()
  revision: clarify a 'return NULL' in get_reference()
2024-03-07 15:59:40 -08:00
Junio C Hamano
2c206fc82a Merge branch 'jc/no-lazy-fetch'
"git --no-lazy-fetch cmd" allows to run "cmd" while disabling lazy
fetching of objects from the promisor remote, which may be handy
for debugging.

* jc/no-lazy-fetch:
  git: extend --no-lazy-fetch to work across subprocesses
  git: document GIT_NO_REPLACE_OBJECTS environment variable
  git: --no-lazy-fetch option
2024-03-07 15:59:40 -08:00
Patrick Steinhardt
e0795e2c79 t0610: remove unused variable assignment
In b0f6b6b523 (refs/reftable: don't fail empty transactions in repo
without HEAD, 2024-02-27), we have added a new test to t0610. This test
contains a useless assignment to a variable that is never actually used.
Remove it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-06 08:40:40 -08:00
Junio C Hamano
43072b4ca1 The fourth batch
Also update the DEF_VER in GIT-VERSION-GEN, which I forgot to do
earlier (it should have been done when we started the new cycle).

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-05 09:44:44 -08:00
Junio C Hamano
53ac1f106f Merge branch 'ak/rebase-autosquash'
Typofix.

* ak/rebase-autosquash:
  rebase: fix typo in autosquash documentation
2024-03-05 09:44:44 -08:00
Junio C Hamano
d037212d97 Merge branch 'kn/for-all-refs'
"git for-each-ref" learned "--include-root-refs" option to show
even the stuff outside the 'refs/' hierarchy.

* kn/for-all-refs:
  for-each-ref: add new option to include root refs
  ref-filter: rename 'FILTER_REFS_ALL' to 'FILTER_REFS_REGULAR'
  refs: introduce `refs_for_each_include_root_refs()`
  refs: extract out `loose_fill_ref_dir_regular_file()`
  refs: introduce `is_pseudoref()` and `is_headref()`
2024-03-05 09:44:44 -08:00
Junio C Hamano
661f379791 Merge branch 'pb/ort-make-submodule-conflict-message-an-advice'
When a merge conflicted at a submodule, merge-ort backend used to
unconditionally give a lengthy message to suggest how to resolve
it.  Now the message can be squelched as an advice message.

* pb/ort-make-submodule-conflict-message-an-advice:
  merge-ort: turn submodule conflict suggestions into an advice
2024-03-05 09:44:43 -08:00
Junio C Hamano
53929db7c4 Merge branch 'jc/doc-compat-util'
Clarify wording in the CodingGuidelines that requires <git-compat-util.h>
to be the first header file.

* jc/doc-compat-util:
  doc: clarify the wording on <git-compat-util.h> requirement
2024-03-05 09:44:43 -08:00
Junio C Hamano
e58a4de3bb Merge branch 'sg/upload-pack-error-message-fix'
An error message from "git upload-pack", which responds to "git
fetch" requests, had a trialing NUL in it, which has been
corrected.

* sg/upload-pack-error-message-fix:
  upload-pack: don't send null character in abort message to the client
2024-03-05 09:44:43 -08:00
Junio C Hamano
d31a515e9c Merge branch 'rs/submodule-prefix-simplify'
Code simplification.

* rs/submodule-prefix-simplify:
  submodule: use strvec_pushf() for --submodule-prefix
2024-03-05 09:44:43 -08:00
Junio C Hamano
b5111647cb Merge branch 'rs/name-rev-with-mempool'
Many small allocations "git name-rev" makes have been updated to
allocate from a mem-pool.

* rs/name-rev-with-mempool:
  name-rev: use mem_pool_strfmt()
  mem-pool: add mem_pool_strfmt()
2024-03-05 09:44:43 -08:00
Junio C Hamano
6f74483667 Merge branch 'rs/fetch-simplify-with-starts-with'
Code simplification.

* rs/fetch-simplify-with-starts-with:
  fetch: convert strncmp() with strlen() to starts_with()
2024-03-05 09:44:42 -08:00
Junio C Hamano
74522bbd98 Merge branch 'jk/reflog-special-cases-fix'
The logic to access reflog entries by date and number had ugly
corner cases at the boundaries, which have been cleaned up.

* jk/reflog-special-cases-fix:
  read_ref_at(): special-case ref@{0} for an empty reflog
  get_oid_basic(): special-case ref@{n} for oldest reflog entry
  Revert "refs: allow @{n} to work with n-sized reflog"
2024-03-05 09:44:42 -08:00
Junio C Hamano
542d093b1d Merge branch 'jc/no-include-of-compat-util-from-headers'
Header file clean-up.

* jc/no-include-of-compat-util-from-headers:
  compat: drop inclusion of <git-compat-util.h>
2024-03-05 09:44:42 -08:00
Junio C Hamano
d619abf7fa Merge branch 'js/remove-cruft-files'
Remove an empty file that shouldn't have been added in the first
place.

* js/remove-cruft-files:
  neue: remove a bogus empty file
2024-03-05 09:44:42 -08:00
Junio C Hamano
6249de53a3 Merge branch 'jk/textconv-cache-outside-repo-fix'
The code incorrectly attempted to use textconv cache when asked,
even when we are not running in a repository, which has been
corrected.

* jk/textconv-cache-outside-repo-fix:
  userdiff: skip textconv caching when not in a repository
2024-03-05 09:44:42 -08:00
Junio C Hamano
b387623c12 The third batch
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-01 14:38:56 -08:00
Junio C Hamano
421d5a7574 Merge branch 'tb/multi-pack-verbatim-reuse' into HEAD
Docfix.

* tb/multi-pack-verbatim-reuse:
  Documentation/config/pack.txt: fix broken AsciiDoc mark-up
2024-03-01 14:38:56 -08:00
Junio C Hamano
2b5738c867 Merge branch 'hs/rebase-not-in-progress' into HEAD
Error message update.

* hs/rebase-not-in-progress:
  rebase: make warning less passive aggressive
2024-03-01 14:38:56 -08:00
Junio C Hamano
8e69efba8f Merge branch 'jw/remote-doc-typofix' into HEAD
Docfix.

* jw/remote-doc-typofix:
  git-remote.txt: fix typo
2024-03-01 14:38:56 -08:00
Junio C Hamano
fd6e3cdaea Merge branch 'jc/doc-add-placeholder-fix' into HEAD
Practice the new mark-up rule for <placeholders> with "git add"
documentation page.

* jc/doc-add-placeholder-fix:
  doc: apply the new placeholder rules to git-add documentation
2024-03-01 14:38:55 -08:00
Junio C Hamano
9ce1ca3045 Merge branch 'ja/doc-placeholders-markup-rules' into HEAD
The way placeholders are to be marked-up in documentation have been
specified; use "_<placeholder>_" to typeset the word inside a pair
of <angle-brakets> emphasized.

* ja/doc-placeholders-markup-rules:
  doc: clarify the format of placeholders
2024-03-01 14:38:55 -08:00
Junio C Hamano
510a27e9e4 Merge branch 'ps/reflog-list' into HEAD
"git reflog" learned a "list" subcommand that enumerates known reflogs.

* ps/reflog-list:
  builtin/reflog: introduce subcommand to list reflogs
  refs: stop resolving ref corresponding to reflogs
  refs: drop unused params from the reflog iterator callback
  refs: always treat iterators as ordered
  refs/files: sort merged worktree and common reflogs
  refs/files: sort reflogs returned by the reflog iterator
  dir-iterator: support iteration in sorted order
  dir-iterator: pass name to `prepare_next_entry_data()` directly
2024-03-01 14:38:55 -08:00
Junio C Hamano
221c3daef4 Merge branch 'ds/doc-send-email-capitalization' into HEAD
Doc update.

* ds/doc-send-email-capitalization:
  documentation: send-email: use camel case consistently
2024-03-01 14:38:54 -08:00
Junio C Hamano
af88fbd949 Merge branch 'ja/docfixes' into HEAD
Doc update.

* ja/docfixes:
  doc: end sentences with full-stop
  doc: close unclosed angle-bracket of a placeholder in git-clone doc
  doc: git-rev-parse: enforce command-line description syntax
2024-03-01 14:38:54 -08:00
Junio C Hamano
90c0c15e56 Merge branch 'cp/t9146-use-test-path-helpers' into HEAD
Test script clean-up.

* cp/t9146-use-test-path-helpers:
  t9146: replace test -d/-e/-f with appropriate test_path_is_* function
2024-03-01 14:38:54 -08:00
Junio C Hamano
a87469cc99 Merge branch 'ps/difftool-dir-diff-exit-code' into HEAD
"git difftool --dir-diff" learned to honor the "--trust-exit-code"
option; it used to always exit with 0 and signalled success.

* ps/difftool-dir-diff-exit-code:
  git-difftool--helper: honor `--trust-exit-code` with `--dir-diff`
2024-03-01 14:38:54 -08:00
Eugenio Gigante
3223204456 add: use unsigned type for collection of bits
The 'refresh' function in 'builtin/add.c' declares 'flags' as
signed, and passes it as an argument to the 'refresh_index'
function, which though expects an unsigned value.

Since in this case 'flags' represents a bag of bits, whose MSB is
not used in special ways, change the type of 'flags' to unsigned.

Signed-off-by: Eugenio Gigante <giganteeugenio2@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 11:52:42 -08:00
Jeff King
a922bfa3b5 upload-pack: only accept packfile-uris if we advertised it
Clients are only supposed to request particular capabilities or features
if the server advertised them. For the "packfile-uris" feature, we only
advertise it if uploadpack.blobpacfileuri is set, but we always accept a
request from the client regardless.

In practice this doesn't really hurt anything, as we'd pass the client's
protocol list on to pack-objects, which ends up ignoring it. But we
should try to follow the protocol spec, and tightening this up may catch
buggy or misbehaving clients more easily.

Thanks to recent refactoring, we can hoist the config check from
upload_pack_advertise() into upload_pack_config(). Note the subtle
handling of a value-less bool (which does not count for triggering an
advertisement).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-29 08:10:42 -08:00
Jeff King
9a7b22959a upload-pack: use existing config mechanism for advertisement
When serving a v2 capabilities request, we call upload_pack_advertise()
to tell us the set of features we can advertise to the client. That
involves looking at various config options, all of which need to be kept
in sync with the rules we use in upload_pack_config to set flags like
allow_filter, allow_sideband_all, and so on. If these two pieces of code
get out of sync then we may refuse to respect a capability we
advertised, or vice versa accept one that we should not.

Instead, let's call the same config helper that we'll use for processing
the actual client request, and then just pick the values out of the
resulting struct. This is only a little bit shorter than the current
code, but we don't repeat any policy logic (e.g., we don't have to worry
about the magic sideband-all environment variable here anymore).

And this reveals a gap in the existing code: there is no struct flag for
the packfile-uris capability (we accept it even if it is not advertised,
which we should not). We'll leave the advertisement code for now and
deal with it in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 15:30:41 -08:00
Jeff King
37aa89b068 upload-pack: centralize setup of sideband-all config
We read uploadpack.allowsidebandall to set a matching flag in our
upload_pack_data struct. But for our tests, we also respect
GIT_TEST_SIDEBAND_ALL from the environment, and anybody looking at the
flag in the struct needs to remember to check both. There's only one
such piece of code now, but we're about to add another.

So let's have the config step actually fold the environment value into
the struct, letting the rest of the code use the flag in the obvious
way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 15:30:41 -08:00
Jeff King
922fdefb84 upload-pack: use repository struct to get config
Our upload_pack_v2() function gets a repository struct, but we ignore it
totally.  In practice this doesn't cause any problems, as it will never
differ from the_repository. But in the spirit of taking a small step
towards getting rid of the_repository, let's at least starting using it
to grab config. There are probably other spots that could benefit, but
it's a start.

Note that we don't need to pass the repo for protected_config(); the
whole point there is that we are not looking at repo config, so there is
no repo-specific version of the function.

For the v0 version of the protocol, we're not passed a repository
struct, so we'll continue to use the_repository there.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:48:35 -08:00
Jeff King
6cd05e768b upload-pack: free tree buffers after parsing
When a client sends us a "want" or "have" line, we call parse_object()
to get an object struct. If the object is a tree, then the parsed state
means that tree->buffer points to the uncompressed contents of the tree.
But we don't really care about it. We only really need to parse commits
and tags; for trees and blobs, the important output is just a "struct
object" with the correct type.

But much worse, we do not ever free that tree buffer. It's not leaked in
the traditional sense, in that we still have a pointer to it from the
global object hash. But if the client requests many trees, we'll hold
all of their contents in memory at the same time.

Nobody really noticed because it's rare for clients to directly request
a tree. It might happen for a lightweight tag pointing straight at a
tree, or it might happen for a "tree:depth" partial clone filling in
missing trees.

But it's also possible for a malicious client to request a lot of trees,
causing upload-pack's memory to balloon. For example, without this
patch, requesting every tree in git.git like:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_trees() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "tree" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_trees | GIT_PROTOCOL=version=2 valgrind --tool=massif ./git upload-pack . >/dev/null

shows a peak heap usage of ~3.7GB. Which is just about the sum of the
sizes of all of the uncompressed trees. For linux.git, it's closer to
17GB.

So the obvious thing to do is to call free_tree_buffer() after we
realize that we've parsed a tree. We know that upload-pack won't need it
later. But let's push the logic into parse_object_with_flags(), telling
it to discard the tree buffer immediately. There are two reasons for
this. One, all of the relevant call-sites already call the with_options
variant to pass the SKIP_HASH flag. So it actually ends up as less code
than manually free-ing in each spot. And two, it enables an extra
optimization that I'll discuss below.

I've touched all of the sites that currently use SKIP_HASH in
upload-pack. That drops the peak heap of the upload-pack invocation
above from 3.7GB to ~24MB.

I've also modified the caller in get_reference(); a partial clone
benefits from its use in pack-objects for the reasons given in
0bc2557951 (upload-pack: skip parse-object re-hashing of "want" objects,
2022-09-06), where we were measuring blob requests. But note that the
results of get_reference() are used for traversing, as well; so we
really would _eventually_ use the tree contents. That makes this at
first glance a space/time tradeoff: we won't hold all of the trees in
memory at once, but we'll have to reload them each when it comes time to
traverse.

And here's where our extra optimization comes in. If the caller is not
going to immediately look at the tree contents, and it doesn't care
about checking the hash, then parse_object() can simply skip loading the
tree entirely, just like we do for blobs! And now it's not a space/time
tradeoff in get_reference() anymore. It's just a lazy-load: we're
delaying reading the tree contents until it's time to actually traverse
them one by one.

And of course for upload-pack, this optimization means we never load the
trees at all, saving lots of CPU time. Timing the "every tree from
git.git" request above shows upload-pack dropping from 32 seconds of CPU
to 19 (the remainder is mostly due to pack-objects actually sending the
pack; timing just the upload-pack portion shows we go from 13s to
~0.28s).

These are all highly gamed numbers, of course. For real-world
partial-clone requests we're saving only a small bit of time in
practice. But it does help harden upload-pack against malicious
denial-of-service attacks.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Jeff King
a6ca601cdf upload-pack: use PARSE_OBJECT_SKIP_HASH_CHECK in more places
In commit 0bc2557951 (upload-pack: skip parse-object re-hashing of
"want" objects, 2022-09-06), we optimized the parse_object() calls for
v2 "want" lines from the client so that they avoided parsing blobs, and
so that they used the commit-graph rather than parsing commit objects
from scratch.

We should extend that to two other spots:

  1. We parse "have" objects in the got_oid() function. These won't
     generally be non-commits (unlike "want" lines from a partial
     clone). But we still benefit from the use of the commit-graph.

  2. For v0, the "want" lines are parsed in receive_needs(). These are
     also less likely to be non-commits because by default they have to
     be ref tips. There are config options you might set to allow
     non-tip objects, but you'd mostly do so to support partial clones,
     and clients recent enough to support partial clone will generally
     speak v2 anyway.

So I don't expect this change to improve performance much for day-to-day
operations. But both are possible denial-of-service vectors, where an
attacker can waste our time by sending over a large number of objects to
parse (of course we may waste even more time serving a pack to them, but
we try as much as possible to optimize that in pack-objects; we should
do what we can here in upload-pack, too).

With this patch, running p5600 with GIT_TEST_PROTOCOL_VERSION=0 shows
similar results to what we saw in 0bc2557951 (which ran with the v2
protocol by default). Here are the numbers for linux.git:

  Test                          HEAD^                 HEAD
  -----------------------------------------------------------------------------
  5600.3: checkout of result    50.91(87.95+2.93)     41.75(79.00+3.18) -18.0%

Or for a more extreme (and malicious) case, we can claim to "have" every
blob in git.git over the v0 protocol:

  $ {
      echo "0032want $(git rev-parse HEAD)"
      printf 0000
      git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      perl -alne 'print "0032have $F[0]" if $F[1] eq "blob"'
    } >input

  $ time ./git.old upload-pack . <input >/dev/null
  real	0m52.951s
  user	0m51.633s
  sys	0m1.304s

  $ time ./git.new upload-pack . <input >/dev/null
  real	0m0.261s
  user	0m0.156s
  sys	0m0.105s

(Note that these don't actually compute a pack because of the hacky
protocol usage, so those numbers are representing the raw blob-parsing
effort done by upload-pack).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Jeff King
5f64279443 upload-pack: always turn off save_commit_buffer
When the client sends us "want $oid" lines, we call parse_object($oid)
to get an object struct. It's important to parse the commits because we
need to traverse them in the negotiation phase. But of course we don't
need to hold on to the commit messages for each one.

We've turned off the save_commit_buffer flag in get_common_commits() for
a long time, since f0243f26f6 (git-upload-pack: More efficient usage of
the has_sha1 array, 2005-10-28). That helps with the commits we see
while actually traversing. But:

  1. That function is only used by the v0 protocol. I think the v2
     protocol's code path leaves the flag on (and thus pays the extra
     memory penalty), though I didn't measure it specifically.

  2. If the client sends us a bunch of "want" lines, that happens before
     the negotiation phase. So we'll hold on to all of those commit
     messages. Generally the number of "want" lines scales with the
     refs, not with the number of objects in the repo. But a malicious
     client could send a lot in order to waste memory.

As an example of (2), if I generate a request to fetch all commits in
git.git like this:

  pktline() {
    local msg="$*"
    printf "%04x%s\n" $((1+4+${#msg})) "$msg"
  }

  want_commits() {
    pktline command=fetch
    printf 0001
    git cat-file --batch-all-objects --batch-check='%(objectname) %(objecttype)' |
      while read oid type; do
        test "$type" = "commit" || continue
        pktline want $oid
      done
      pktline done
      printf 0000
  }

  want_commits | GIT_PROTOCOL=version=2 valgrind --tool=massif git-upload-pack . >/dev/null

before this patch upload-pack peaks at ~125MB, and after at ~35MB. The
difference is not coincidentally about the same as the sum of all commit
object sizes as computed by:

  git cat-file --batch-all-objects --batch-check='%(objecttype) %(objectsize)' |
  perl -alne '$v += $F[1] if $F[0] eq "commit"; END { print $v }'

In a larger repository like linux.git, that number is ~1GB.

In a repository with a full commit-graph file this will have no impact
(and the commit graph would save us from parsing at all, so is a much
better solution!). But it's easy to do, might help a little in
real-world cases (where even if you have a commit graph it might not be
fully up to date), and helps a lot for a worst-case malicious request.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Taylor Blau
8c735b11de upload-pack: disallow object-info capability by default
We added an "object-info" capability to the v2 upload-pack protocol in
a2ba162cda (object-info: support for retrieving object info,
2021-04-20). In the almost 3 years since, we have not added any
client-side support, and it does not appear to exist in other
implementations either (JGit understands the verb on the server side,
but not on the client side).

Since this largely unused code is accessible over the network by
default, it increases the attack surface of upload-pack. I don't know of
any particularly severe problem, but one issue is that because of the
request/response nature of the v2 protocol, it will happily read an
unbounded number of packets, adding each one to a string list (without
regard to whether they are objects we know about, duplicates, etc).

This may be something we want to improve in the long run, but in the
short term it makes sense to disable the feature entirely. We'll add a
config option as an escape hatch for anybody who wants to develop the
feature further.

A more gentle option would be to add the config option to let people
disable it manually, but leave it enabled by default. But given that
there's no client side support, that seems like the wrong balance with
security.

Disabling by default will slow adoption a bit once client-side support
does become available (there were some patches[1] in 2022, but nothing
got merged and there's been nothing since). But clients have to deal
with older servers that do not understand the option anyway (and the
capability system handles that), so it will just be a matter of servers
flipping their config at that point (and hopefully once any unbounded
allocations have been addressed).

[jk: this is a patch that GitHub has been running for several years, but
     rebased forward and with a new commit message for upstream]

[1] https://lore.kernel.org/git/20220208231911.725273-1-calvinwan@google.com/

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Jeff King
179776f9e6 upload-pack: accept only a single packfile-uri line
When we see a packfile-uri line from the client, we use
string_list_split() to split it on commas and store the result in a
string_list.  A single packfile-uri line is therefore limited to storing
~64kb, the size of a pkt-line.

But we'll happily accept multiple such lines, and each line appends to
the string list, growing without bound.

In theory this could be useful, making:

  0017packfile-uris http
  0018packfile-uris https

equivalent to:

  001dpackfile-uris http,https

But the protocol documentation doesn't indicate that this should work
(and indeed, refers to this in the singular as "the following argument
can be included in the client's request"). And the client-side
implementation in fetch-pack has always sent a single line (JGit appears
to understand the line on the server side but has no client-side
implementation, and libgit2 understands neither).

If we were worried about compatibility, we could instead just put a
limit on the maximum number of values we'd accept. The current client
implementation limits itself to only two values: "http" and "https", so
something like "256" would be more than enough. But accepting only a
single line seems more in line with the protocol documentation, and
matches other parts of the protocol (e.g., we will not accept a second
"filter" line).

We'll also make this more explicit in the protocol documentation; as
above, I think this was always the intent, but there's no harm in making
it clear.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00
Jeff King
b065063c57 upload-pack: use a strmap for want-ref lines
When the "ref-in-want" capability is advertised (which it is not by
default), then upload-pack processes a "want-ref" line from the client
by checking that the name is a valid ref and recording it in a
string-list.

In theory this list should grow no larger than the number of refs in the
server-side repository. But since we don't do any de-duplication, a
client which sends "want-ref refs/heads/foo" over and over will cause
the array to grow without bound.

We can fix this by switching to strmap, which efficiently detects
duplicates. There are two client-visible changes here:

  1. The "wanted-refs" response will now be in an apparently-random
     order (based on iterating the hashmap) rather than the order given
     by the client. The protocol documentation is quiet on ordering
     here. The current fetch-pack implementation is happy with any
     order, as it looks up each returned ref using a binary search in
     its local sorted list. JGit seems to implement want-ref on the
     server side, but has no client-side support. libgit2 doesn't
     support either side.

     It would obviously be possible to record the original order or to
     use the strmap as an auxiliary data structure. But if the client
     doesn't care, we may as well do the simplest thing.

  2. We'll now reject duplicates explicitly as a protocol error. The
     client should never send them (and our current implementation, even
     when asked to "git fetch master:one master:two" will de-dup on the
     client side).

     If we wanted to be more forgiving, we could perhaps just throw away
     the duplicates. But then our "wanted-refs" response back to the
     client would omit the duplicates, and it's hard to say what a
     client that accidentally sent a duplicate would do with that. So I
     think we're better off to complain loudly before anybody
     accidentally writes such a client.

Let's also add a note to the protocol documentation clarifying that
duplicates are forbidden. As discussed above, this was already the
intent, but it's not very explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-28 14:42:01 -08:00