1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-26 05:56:11 +02:00
Commit Graph

273 Commits

Author SHA1 Message Date
Jonathan Tan ec6a8f9705 index-pack: make get_base_data() comment clearer
A comment mentions that we may free cached delta bases via
find_unresolved_deltas(), but that function went away in f08cbf60fe
(index-pack: make quantum of work smaller, 2020-09-08). Since we need to
rewrite that comment anyway, make the entire comment clearer.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 13:32:27 -07:00
Jeff King bebe171947 index-pack: drop type_cas mutex
The type_cas lock lost all of its callers in f08cbf60fe (index-pack:
make quantum of work smaller, 2020-09-08), so we can safely delete it.
The compiler didn't alert us that the variable became unused, because we
still call pthread_mutex_init() and pthread_mutex_destroy() on it.

It's worth considering also whether that commit was in error to remove
the use of the lock. Why don't we need it now, if we did before, as
described in ab791dd138 (index-pack: fix race condition with duplicate
bases, 2014-08-29)? I think the answer is that we now look at and assign
the child_obj->real_type field in the main thread while holding the
work_lock(). So we don't have to worry about racing with the worker
threads.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 11:51:26 -07:00
Jeff King cea69151a4 index-pack: restore "resolving deltas" progress meter
Commit f08cbf60fe (index-pack: make quantum of work smaller, 2020-09-08)
refactored the main loop in threaded_second_pass(), but also deleted the
call to display_progress() at the top of the loop. This means that users
typically see no progress at all during the delta resolution phase (and
for large repositories, Git appears to hang).

This looks like an accident that was unrelated to the intended change of
that commit, since we continue to update nr_resolved_deltas in
resolve_delta(). Let's restore the call to get that progress back.

We'll also add a test that confirms we generate the expected progress.
This isn't perfect, as it wouldn't catch a bug where progress was
delayed to the end. That was probably possible to trigger when receiving
a thin pack, because we'd eventually call display_progress() from
fix_unresolved_deltas(), but only once after doing all the work.
However, since our test case generates a complete pack, it reliably
demonstrates this particular bug and its fix. And we can't do better
without making the test racy.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-07 11:50:09 -07:00
Junio C Hamano b7e65b51e5 Merge branch 'jt/threaded-index-pack'
"git index-pack" learned to resolve deltified objects with greater
parallelism.

* jt/threaded-index-pack:
  index-pack: make quantum of work smaller
  index-pack: make resolve_delta() assume base data
  index-pack: calculate {ref,ofs}_{first,last} early
  index-pack: remove redundant child field
  index-pack: unify threaded and unthreaded code
  index-pack: remove redundant parameter
  Documentation: deltaBaseCacheLimit is per-thread
2020-09-22 12:36:28 -07:00
Jonathan Tan f08cbf60fe index-pack: make quantum of work smaller
Currently, when index-pack resolves deltas, it does not split up delta
trees into threads: each delta base root (an object that is not a
REF_DELTA or OFS_DELTA) can go into its own thread, but all deltas on
that root (direct or indirect) are processed in the same thread.

This is a problem when a repository contains a large text file (thus,
delta-able) that is modified many times - delta resolution time during
fetching is dominated by processing the deltas corresponding to that
text file.

This patch contains a solution to that. When cloning using

  git -c core.deltabasecachelimit=1g clone \
    https://fuchsia.googlesource.com/third_party/vulkan-cts

on my laptop, clone time improved from 3m2s to 2m5s (using 3 threads,
which is the default).

The solution is to have a global work stack. This stack contains delta
bases (objects, whether appearing directly in the packfile or generated
by delta resolution, that themselves have delta children) that need to
be processed; whenever a thread needs work, it peeks at the top of the
stack and processes its next unprocessed child. If a thread finds the
stack empty, it will look for more delta base roots to push on the stack
instead.

The main weakness of having a global work stack is that more time is
spent in the mutex, but profiling has shown that most time is spent in
the resolution of the deltas themselves, so this shouldn't be an issue
in practice. In any case, experimentation (as described in the clone
command above) shows that this patch is a net improvement.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-08 15:52:17 -07:00
Jonathan Tan ee6f058384 index-pack: make resolve_delta() assume base data
A subsequent commit will make the quantum of work smaller, necessitating
more locking. This commit allows resolve_delta() to be called outside
the lock.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:14:52 -07:00
Jonathan Tan b4718cae51 index-pack: calculate {ref,ofs}_{first,last} early
This is refactoring 2 of 2 to simplify struct base_data.

Whenever we make a struct base_data, immediately calculate its delta
children. This eliminates confusion as to when the
{ref,ofs}_{first,last} fields are initialized.

Before this patch, the delta children were calculated at the last
possible moment. This allowed the members of struct base_data to be
populated in any order, superficially useful when we have the object
contents before the struct object_entry. But this makes reasoning about
the state of struct base_data more complicated, hence this patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:12:58 -07:00
Jonathan Tan a7f7e84a49 index-pack: remove redundant child field
This is refactoring 1 of 2 to simplify struct base_data.

In index-pack, each thread maintains a doubly-linked list of the delta
chain that it is currently processing (the "base" and "child" pointers
in struct base_data). When a thread exceeds the delta base cache limit
and needs to reclaim memory, it uses the "child" pointers to traverse
the lineage, reclaiming the memory of the eldest delta bases first.

A subsequent patch will perform memory reclaiming in a different way and
will thus no longer need the "child" pointer. Because the "child"
pointer is redundant even now, remove it so that the aforementioned
subsequent patch will be clearer. In the meantime, reclaim memory in the
reverse order of the "base" pointers.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:11:14 -07:00
Jonathan Tan 46e6fb1e44 index-pack: unify threaded and unthreaded code
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 14:02:31 -07:00
Jonathan Tan fc968e26c2 index-pack: remove redundant parameter
find_{ref,ofs}_delta_{,children} take an enum object_type parameter, but
the object type is already present in the name of the function. Remove
that parameter from these functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-24 13:55:57 -07:00
Jeff King fbff95b67f index-pack: adjust default threading cap
Commit b8a2486f15 (index-pack: support multithreaded delta resolving,
2012-05-06) describes an experiment that shows that setting the number
of threads for index-pack higher than 3 does not help.

I repeated that experiment using a more modern version of Git and a more
modern CPU and got different results.

Here are timings for p5302 against linux.git run on my laptop, a Core
i9-9880H with 8 cores plus hyperthreading (so online-cpus returns 16):

  5302.3: index-pack 0 threads                   256.28(253.41+2.79)
  5302.4: index-pack 1 threads                   257.03(254.03+2.91)
  5302.5: index-pack 2 threads                   149.39(268.34+3.06)
  5302.6: index-pack 4 threads                   94.96(294.10+3.23)
  5302.7: index-pack 8 threads                   68.12(339.26+3.89)
  5302.8: index-pack 16 threads                  70.90(655.03+7.21)
  5302.9: index-pack default number of threads   116.91(290.05+3.21)

You can see that wall-clock times continue to improve dramatically up to
the number of cores, but bumping beyond that (into hyperthreading
territory) does not help (and in fact hurts a little).

Here's the same experiment on a machine with dual Xeon 6230's, totaling
40 cores (80 with hyperthreading):

  5302.3: index-pack 0 threads                    310.04(302.73+6.90)
  5302.4: index-pack 1 threads                    310.55(302.68+7.40)
  5302.5: index-pack 2 threads                    178.17(304.89+8.20)
  5302.6: index-pack 5 threads                    99.53(315.54+9.56)
  5302.7: index-pack 10 threads                   72.80(327.37+12.79)
  5302.8: index-pack 20 threads                   60.68(357.74+21.66)
  5302.9: index-pack 40 threads                   58.07(454.44+67.96)
  5302.10: index-pack 80 threads                  59.81(720.45+334.52)
  5302.11: index-pack default number of threads   134.18(309.32+7.98)

The results are similar; things stop improving at 40 threads. Curiously,
going from 20 to 40 really doesn't help much, either (and increases CPU
time considerably). So that may represent an actual barrier to
parallelism, where we lose out due to context-switching and loss of
cache locality, but don't reap the wall-clock benefits due to contention
of our coarse-grained locks.

So what's a good default value? It's clear that the current cap of 3 is
too low; our default values are 42% and 57% slower than the best times
on each machine. The results on the 40-core machine imply that 20
threads is an actual barrier regardless of the number of cores, so we'll
take that as a maximum. We get the best results on these machines at
half of the online-cpus value. That's presumably a result of the
hyperthreading. That's common on multi-core Intel processors, but not
necessarily elsewhere. But if we take it as an assumption, we can
perform optimally on hyperthreaded machines and still do much better
than the status quo on other machines, as long as we never half below
the current value of 3.

So that's what this patch does.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 12:02:36 -07:00
brian m. carlson 586740aa6e builtin/index-pack: add option to specify hash algorithm
git index-pack is usually run in a repository, but need not be. Since
packs don't contains information on the algorithm in use, instead
relying on context, add an option to index-pack to tell it which one
we're using in case someone runs it outside of a repository.  Since
using --stdin necessarily implies a repository, don't allow specifying
an object format if it's provided to prevent users from passing an
option that won't work.  Add documentation for this option.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-19 14:04:08 -07:00
brian m. carlson 629dffc461 packfile: compute and use the index CRC offset
Both v2 pack index files and the v3 format specified as part of the
NewHash work have similar data starting at the CRC table.  Much of the
existing code wants to read either this table or the offset entries
following it, and in doing so computes the offset each time.

In order to share as much code between v2 and v3, compute the offset of
the CRC table and store it when the pack is opened.  Use this value to
compute offsets to not only the CRC table, but to the offset entries
beyond it.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-27 10:07:07 -07:00
Jonathan Tan db7ed7418b promisor-remote: accept 0 as oid_nr in function
There are 3 callers to promisor_remote_get_direct() that first check if
the number of objects to be fetched is equal to 0. Fold that check into
promisor_remote_get_direct(), and in doing so, be explicit as to what
promisor_remote_get_direct() does if oid_nr is 0 (it returns 0, success,
immediately).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-04-02 12:42:32 -07:00
Junio C Hamano 7b029ebaef Merge branch 'jk/index-pack-dupfix'
The index-pack code now diagnoses a bad input packstream that
records the same object twice when it is used as delta base; the
code used to declare a software bug when encountering such an
input, but it is an input error.

* jk/index-pack-dupfix:
  index-pack: downgrade twice-resolved REF_DELTA to die()
2020-02-14 12:54:24 -08:00
Jeff King a21781011f index-pack: downgrade twice-resolved REF_DELTA to die()
When we're resolving a REF_DELTA, we compare-and-swap its type from
REF_DELTA to whatever real type the base object has, as discussed in
ab791dd138 (index-pack: fix race condition with duplicate bases,
2014-08-29). If the old type wasn't a REF_DELTA, we consider that a
BUG(). But as discussed in that commit, we might see this case whenever
we try to resolve an object twice, which may happen because we have
multiple copies of the base object.

So this isn't a bug at all, but rather a sign that the input pack is
broken. And indeed, this case is triggered already in t5309.5 and
t5309.6, which create packs with delta cycles and duplicate bases. But
we never noticed because those tests are marked expect_failure.

Those tests were added by b2ef3d9ebb (test index-pack on packs with
recoverable delta cycles, 2013-08-23), which was leaving the door open
for cases that we theoretically _could_ handle. And when we see an
already-resolved object like this, in theory we could keep going after
confirming that the previously resolved child->real_type matches
base->obj->real_type. But:

  - enforcing the "only resolve once" rule here saves us from an
    infinite loop in other parts of the code. If we keep going, then the
    delta cycle in t5309.5 causes us to loop infinitely, as
    find_ref_delta_children() doesn't realize which objects have already
    been resolved. So there would be more changes needed to make this
    case work, and in the meantime we'd be worse off.

  - any pack that triggers this is broken anyway. It either has a
    duplicate base object, or it has a cycle which causes us to bring in
    a duplicate via --fix-thin. In either case, we'd end up rejecting
    the pack in write_idx_file(), which also detects duplicates.

So the tests have little value in documenting what we _could_ be doing
(and have been neglected for 6+ years). Let's switch them to confirming
that we handle this case cleanly (and switch out the BUG() for a more
informative die() so that we do so).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-04 13:19:11 -08:00
Matheus Tavares b98d188581 sha1-file: allow check_object_signature() to handle any repo
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Matheus Tavares 2dcde20e1c sha1-file: pass git_hash_algo to hash_object_file()
Allow hash_object_file() to work on arbitrary repos by introducing a
git_hash_algo parameter. Change callers which have a struct repository
pointer in their scope to pass on the git_hash_algo from the said repo.
For all other callers, pass on the_hash_algo, which was already being
used internally at hash_object_file(). This functionality will be used
in the following patch to make check_object_signature() be able to work
on arbitrary repos (which, in turn, will be used to fix an
inconsistency at object.c:parse_object()).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Matheus Tavares c8123e72f6 streaming: allow open_istream() to handle any repo
Some callers of open_istream() at archive-tar.c and archive-zip.c are
capable of working on arbitrary repositories but the repo struct is not
passed down to open_istream(), which uses the_repository internally. For
now, that's not a problem since the said callers are only being called
with the_repository. But to be consistent and avoid future problems,
let's allow open_istream() to receive a struct repository and use that
instead of the_repository. This parameter addition will also be used in
a future patch to make sha1-file.c:check_object_signature() be able to
work on arbitrary repos.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Junio C Hamano 676278f8ea Merge branch 'bc/object-id-part17'
Preparation for SHA-256 upgrade continues.

* bc/object-id-part17: (26 commits)
  midx: switch to using the_hash_algo
  builtin/show-index: replace sha1_to_hex
  rerere: replace sha1_to_hex
  builtin/receive-pack: replace sha1_to_hex
  builtin/index-pack: replace sha1_to_hex
  packfile: replace sha1_to_hex
  wt-status: convert struct wt_status to object_id
  cache: remove null_sha1
  builtin/worktree: switch null_sha1 to null_oid
  builtin/repack: write object IDs of the proper length
  pack-write: use hash_to_hex when writing checksums
  sequencer: convert to use the_hash_algo
  bisect: switch to using the_hash_algo
  sha1-lookup: switch hard-coded constants to the_hash_algo
  config: use the_hash_algo in abbrev comparison
  combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo
  bundle: switch to use the_hash_algo
  connected: switch GIT_SHA1_HEXSZ to the_hash_algo
  show-index: switch hard-coded constants to the_hash_algo
  blame: remove needless comparison with GIT_SHA1_HEXSZ
  ...
2019-10-11 14:24:46 +09:00
brian m. carlson 69fa337060 builtin/index-pack: replace sha1_to_hex
Since sha1_to_hex is limited to SHA-1, replace it with hash_to_hex so
this code works with other algorithms.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-08-19 15:04:59 -07:00
Christian Couder b14ed5adaf Use promisor_remote_get_direct() and has_promisor_remote()
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Jonathan Tan 8a30a1efd1 index-pack: prefetch missing REF_DELTA bases
When fetching, the client sends "have" commit IDs indicating that the
server does not need to send any object referenced by those commits,
reducing network I/O. When the client is a partial clone, the client
still sends "have"s in this way, even if it does not have every object
referenced by a commit it sent as "have".

If a server omits such an object, it is fine: the client could lazily
fetch that object before this fetch, and it can still do so after.

The issue is when the server sends a thin pack containing an object that
is a REF_DELTA against such a missing object: index-pack fails to fix
the thin pack. When support for lazily fetching missing objects was
added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
objects", 2017-12-08), support in index-pack was turned off in the
belief that it accesses the repo only to do hash collision checks.
However, this is not true: it also needs to access the repo to resolve
REF_DELTA bases.

Support for lazy fetching should still generally be turned off in
index-pack because it is used as part of the lazy fetching process
itself (if not, infinite loops may occur), but we do need to fetch the
REF_DELTA bases. (When fetching REF_DELTA bases, it is unlikely that
those are REF_DELTA themselves, because we do not send "have" when
making such fetches.)

To resolve this, prefetch all missing REF_DELTA bases before attempting
to resolve them. This both ensures that all bases are attempted to be
fetched, and ensures that we make only one request per index-pack
invocation, and not one request per missing object.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-15 11:01:40 +09:00
SZEDER Gábor 79e3aa6624 index-pack: show progress while checking objects
When 'git index-pack' is run by 'git clone', its check_objects()
function usually doesn't take long enough to be a concern, but I just
run into a situation where it took about a minute or so: I
inadvertently put some memory pressure on my tiny laptop while cloning
linux.git, and then there was quite a long silence between the
"Resolving deltas" and "Checking connectivity" progress bars.

Show a progress bar during the loop of check_objects() to let the user
know that something is still going on.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-01 18:08:05 +09:00
Jeff King 98374a07c9 convert has_sha1_file() callers to has_object_file()
The only remaining callers of has_sha1_file() actually have an object_id
already. They can use the "object" variant, rather than dereferencing
the hash themselves.

The code changes here were completely generated by the included
coccinelle patch.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08 09:41:06 -08:00
Junio C Hamano f5f0f68d61 Merge branch 'tb/print-size-t-with-uintmax-format'
Code preparation to replace ulong vars with size_t vars where
appropriate.

* tb/print-size-t-with-uintmax-format:
  Upcast size_t variables to uintmax_t when printing
2018-11-19 16:24:41 +09:00
Torsten Bögershausen ca473cef91 Upcast size_t variables to uintmax_t when printing
When printing variables which contain a size, today "unsigned long"
is used at many places.
In order to be able to change the type from "unsigned long" into size_t
some day in the future, we need to have a way to print 64 bit variables
on a system that has "unsigned long" defined to be 32 bit, like Win64.

Upcast all those variables into uintmax_t before they are printed.
This is to prepare for a bigger change, when "unsigned long"
will be converted into size_t for variables which may be > 4Gib.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 16:43:52 +09:00
Nguyễn Thái Ngọc Duy 2094c5e582 index-pack: remove #ifdef NO_PTHREADS
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05 13:42:11 +09:00
Jeff King 67947c34ae convert "hashcmp() != 0" to "!hasheq()"
This rounds out the previous three patches, covering the
inequality logic for the "hash" variant of the functions.

As with the previous three, the accompanying code changes
are the mechanical result of applying the coccinelle patch;
see those patches for more discussion.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Jeff King 4a7e27e957 convert "oidcmp() == 0" to oideq()
Using the more restrictive oideq() should, in the long run,
give the compiler more opportunities to optimize these
callsites. For now, this conversion should be a complete
noop with respect to the generated code.

The result is also perhaps a little more readable, as it
avoids the "zero is equal" idiom. Since it's so prevalent in
C, I think seasoned programmers tend not to even notice it
anymore, but it can sometimes make for awkward double
negations (e.g., we can drop a few !!oidcmp() instances
here).

This patch was generated almost entirely by the included
coccinelle patch. This mechanical conversion should be
completely safe, because we check explicitly for cases where
oidcmp() is compared to 0, which is what oideq() is doing
under the hood. Note that we don't have to catch "!oidcmp()"
separately; coccinelle's standard isomorphisms make sure the
two are treated equivalently.

I say "almost" because I did hand-edit the coccinelle output
to fix up a few style violations (it mostly keeps the
original formatting, but sometimes unwraps long lines).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-29 11:32:49 -07:00
Junio C Hamano 1689c22c1c Merge branch 'jk/core-use-replace-refs'
A new configuration variable core.usereplacerefs has been added,
primarily to help server installations that want to ignore the
replace mechanism altogether.

* jk/core-use-replace-refs:
  add core.usereplacerefs config option
  check_replace_refs: rename to read_replace_refs
  check_replace_refs: fix outdated comment
2018-08-15 15:08:23 -07:00
Jeff King 6ebd1cafe2 check_replace_refs: rename to read_replace_refs
This was added as a NEEDSWORK in c3c36d7de2 (replace-object:
check_replace_refs is safe in multi repo environment, 2018-04-11),
waiting for a calmer period. Since doing so now doesn't conflict
with anything in 'pu', it seems as good a time as any.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-18 15:45:14 -07:00
Stefan Beller da14a7ff99 blob: add repository argument to lookup_blob
Add a repository argument to allow the callers of lookup_blob
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller 1ec5bfd24e object: add repository argument to parse_object_buffer
Add a repository argument to allow the callers of parse_object_buffer
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Junio C Hamano 549ca8aa7c Merge branch 'jk/index-pack-maint'
"index-pack --strict" has been taught to make sure that it runs the
final object integrity checks after making the freshly indexed
packfile available to itself.

* jk/index-pack-maint:
  index-pack: correct install_packed_git() args
  index-pack: handle --strict checks of non-repo packs
  prepare_commit_graft: treat non-repository as a noop
2018-06-13 12:50:46 -07:00
Junio C Hamano 3737746120 index-pack: correct install_packed_git() args
The function does not start taking the repository object as a
parameter before v2.18 track.  Make the topic mergeable to v2.17
maintenance track by dropping it.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11 15:09:18 -07:00
Jeff King 368b4e5906 index-pack: handle --strict checks of non-repo packs
Commit 73c3f0f704 (index-pack: check .gitmodules files with
--strict, 2018-05-04) added a call to add_packed_git(), with
the intent that the newly-indexed objects would be available
to the process when we run fsck_finish().  But that's not
what add_packed_git() does. It only allocates the struct,
and you must install_packed_git() on the result. So that
call was effectively doing nothing (except leaking a
struct).

But wait, we passed all of the tests! Does that mean we
don't need the call at all?

For normal cases, no. When we run "index-pack --stdin"
inside a repository, we write the new pack into the object
directory. If fsck_finish() needs to access one of the new
objects, then our initial lookup will fail to find it, but
we'll follow up by running reprepare_packed_git() and
looking again. That logic was meant to handle somebody else
repacking simultaneously, but it ends up working for us
here.

But there is a case that does need this, that we were not
testing. You can run "git index-pack foo.pack" on any file,
even when it is not inside the object directory. Or you may
not even be in a repository at all! This case fails without
doing the proper install_packed_git() call.

We can make this work by adding the install call.

Note that we should be prepared to handle add_packed_git()
failing. We can just silently ignore this case, though. If
fsck_finish() later needs the objects and they're not
available, it will complain itself. And if it doesn't
(because we were able to resolve the whole fsck in the first
pass), then it actually isn't an interesting error at all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-01 11:48:56 +09:00
Junio C Hamano 42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano 50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Junio C Hamano 7913f53b56 Sync with Git 2.17.1
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
2018-05-29 17:10:05 +09:00
Junio C Hamano fcb6df3254 Merge branch 'sb/oid-object-info'
The codepath around object-info API has been taught to take the
repository object (which in turn tells the API which object store
the objects are to be located).

* sb/oid-object-info:
  cache.h: allow oid_object_info to handle arbitrary repositories
  packfile: add repository argument to cache_or_unpack_entry
  packfile: add repository argument to unpack_entry
  packfile: add repository argument to read_object
  packfile: add repository argument to packed_object_info
  packfile: add repository argument to packed_to_object_type
  packfile: add repository argument to retry_bad_packed_offset
  cache.h: add repository argument to oid_object_info
  cache.h: add repository argument to oid_object_info_extended
2018-05-23 14:38:16 +09:00
Jeff King 73c3f0f704 index-pack: check .gitmodules files with --strict
Now that the internal fsck code has all of the plumbing we
need, we can start checking incoming .gitmodules files.
Naively, it seems like we would just need to add a call to
fsck_finish() after we've processed all of the objects. And
that would be enough to cover the initial test included
here. But there are two extra bits:

  1. We currently don't bother calling fsck_object() at all
     for blobs, since it has traditionally been a noop. We'd
     actually catch these blobs in fsck_finish() at the end,
     but it's more efficient to check them when we already
     have the object loaded in memory.

  2. The second pass done by fsck_finish() needs to access
     the objects, but we're actually indexing the pack in
     this process. In theory we could give the fsck code a
     special callback for accessing the in-pack data, but
     it's actually quite tricky:

       a. We don't have an internal efficient index mapping
	  oids to packfile offsets. We only generate it on
	  the fly as part of writing out the .idx file.

       b. We'd still have to reconstruct deltas, which means
          we'd basically have to replicate all of the
	  reading logic in packfile.c.

     Instead, let's avoid running fsck_finish() until after
     we've written out the .idx file, and then just add it
     to our internal packed_git list.

     This does mean that the objects are "in the repository"
     before we finish our fsck checks. But unpack-objects
     already exhibits this same behavior, and it's an
     acceptable tradeoff here for the same reason: the
     quarantine mechanism means that pushes will be
     fully protected.

In addition to a basic push test in t7415, we add a sneaky
pack that reverses the usual object order in the pack,
requiring that index-pack access the tree and blob during
the "finish" step.

This already works for unpack-objects (since it will have
written out loose objects), but we'll check it with this
sneaky pack for good measure.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King db5a58c1bd index-pack: make fsck error message more specific
If fsck reports an error, we say only "Error in object".
This isn't quite as bad as it might seem, since the fsck
code would have dumped some errors to stderr already. But it
might help to give a little more context. The earlier output
would not have even mentioned "fsck", and that may be a clue
that the "fsck.*" or "*.fsckObjects" config may be relevant.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Junio C Hamano b10edb2df5 Merge branch 'ds/commit-graph'
Precompute and store information necessary for ancestry traversal
in a separate file to optimize graph walking.

* ds/commit-graph:
  commit-graph: implement "--append" option
  commit-graph: build graph from starting commits
  commit-graph: read only from specific pack-indexes
  commit: integrate commit graph with commit parsing
  commit-graph: close under reachability
  commit-graph: add core.commitGraph setting
  commit-graph: implement git commit-graph read
  commit-graph: implement git-commit-graph write
  commit-graph: implement write_commit_graph()
  commit-graph: create git-commit-graph builtin
  graph: add commit graph design document
  commit-graph: add format document
  csum-file: refactor finalize_hashfile() method
  csum-file: rename hashclose() to finalize_hashfile()
2018-05-08 15:59:20 +09:00
Johannes Schindelin 033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
brian m. carlson 5d9e198245 index-pack: abstract away hash function constant
The code for reading certain pack v2 offsets had a hard-coded 5
representing the number of uint32_t words that we needed to skip over.
Specify this value in terms of a value from the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:51 +09:00
Stefan Beller 0df8e96566 cache.h: add repository argument to oid_object_info
Add a repository argument to allow the callers of oid_object_info
to be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Stefan Beller <sbeller@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-26 10:54:27 +09:00
Stefan Beller fc1395f4a4 sha1_file.c: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Stefan Beller d807c4a01d exec_cmd: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Junio C Hamano cf0b1793ea Merge branch 'sb/object-store'
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.

Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.

* sb/object-store: (27 commits)
  sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: add repository argument to map_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to sha1_file_name
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add raw_object_store argument to alt_odb_usable
  pack: move approximate object count to object store
  ...
2018-04-11 13:09:55 +09:00
Junio C Hamano a5bbc29994 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (36 commits)
  convert: convert to struct object_id
  sha1_file: introduce a constant for max header length
  Convert lookup_replace_object to struct object_id
  sha1_file: convert read_sha1_file to struct object_id
  sha1_file: convert read_object_with_reference to object_id
  tree-walk: convert tree entry functions to object_id
  streaming: convert istream internals to struct object_id
  tree-walk: convert get_tree_entry_follow_symlinks internals to object_id
  builtin/notes: convert static functions to object_id
  builtin/fmt-merge-msg: convert remaining code to object_id
  sha1_file: convert sha1_object_info* to object_id
  Convert remaining callers of sha1_object_info_extended to object_id
  packfile: convert unpack_entry to struct object_id
  sha1_file: convert retry_bad_packed_offset to struct object_id
  sha1_file: convert assert_sha1_type to object_id
  builtin/mktree: convert to struct object_id
  streaming: convert open_istream to use struct object_id
  sha1_file: convert check_sha1_signature to struct object_id
  sha1_file: convert read_loose_object to use struct object_id
  builtin/index-pack: convert struct ref_delta_entry to object_id
  ...
2018-04-10 08:25:45 +09:00
Derrick Stolee f2af9f5e02 csum-file: rename hashclose() to finalize_hashfile()
The hashclose() method behaves very differently depending on the flags
parameter. In particular, the file descriptor is not always closed.

Perform a simple rename of "hashclose()" to "finalize_hashfile()" in
preparation for functional changes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-02 14:27:30 -07:00
Stefan Beller a80d72db2a object-store: move packed_git and packed_git_mru to object store
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.

[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:46 -07:00
Jonathan Tan ffb2c0fe5c index-pack: support checking objects but not links
The index-pack command currently supports the
--check-self-contained-and-connected argument, for internal use only,
that instructs it to only check for broken links and not broken objects.
For partial clones, we need the inverse, so add a --fsck-objects
argument that checks for broken objects and not broken links, also for
internal use only.

This will be used by fetch-pack in a subsequent patch.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 10:16:24 -07:00
Junio C Hamano 99321e327b Merge branch 'nd/object-allocation-comments'
Code doc update.

* nd/object-allocation-comments:
  object.h: realign object flag allocation comment
  object.h: update flag allocation comment
2018-03-14 12:01:06 -07:00
brian m. carlson b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
brian m. carlson abef9020e3 sha1_file: convert sha1_object_info* to object_id
Convert sha1_object_info and sha1_object_info_extended to take pointers
to struct object_id and rename them to use "oid" instead of "sha1" in
their names.  Update the declaration and definition and apply the
following semantic patch, plus the standard object_id transforms:

@@
expression E1, E2;
@@
- sha1_object_info(E1.hash, E2)
+ oid_object_info(&E1, E2)

@@
expression E1, E2;
@@
- sha1_object_info(E1->hash, E2)
+ oid_object_info(E1, E2)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1.hash, E2, E3)
+ oid_object_info_extended(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- sha1_object_info_extended(E1->hash, E2, E3)
+ oid_object_info_extended(E1, E2, E3)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson ef7b5195f1 streaming: convert open_istream to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson 17e65451e3 sha1_file: convert check_sha1_signature to struct object_id
Convert this function to take a pointer to struct object_id and rename
it check_object_signature.  Introduce temporaries to convert the return
values of lookup_replace_object and lookup_replace_object_extended into
struct object_id.

The temporaries are needed because in order to convert
lookup_replace_object, open_istream needs to be converted, and
open_istream needs check_sha1_signature to be converted, causing a loop
of dependencies.  The temporaries will be removed in a future patch.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:49 -07:00
brian m. carlson af8caf33d5 builtin/index-pack: convert struct ref_delta_entry to object_id
Convert this struct to use a member of type object_id.  Convert various
static functions as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:48 -07:00
Junio C Hamano 169c9c0169 Merge branch 'bw/c-plus-plus'
Avoid using identifiers that clash with C++ keywords.  Even though
it is not a goal to compile Git with C++ compilers, changes like
this help use of code analysis tools that targets C++ on our
codebase.

* bw/c-plus-plus: (37 commits)
  replace: rename 'new' variables
  trailer: rename 'template' variables
  tempfile: rename 'template' variables
  wrapper: rename 'template' variables
  environment: rename 'namespace' variables
  diff: rename 'template' variables
  environment: rename 'template' variables
  init-db: rename 'template' variables
  unpack-trees: rename 'new' variables
  trailer: rename 'new' variables
  submodule: rename 'new' variables
  split-index: rename 'new' variables
  remote: rename 'new' variables
  ref-filter: rename 'new' variables
  read-cache: rename 'new' variables
  line-log: rename 'new' variables
  imap-send: rename 'new' variables
  http: rename 'new' variables
  entry: rename 'new' variables
  diffcore-delta: rename 'new' variables
  ...
2018-03-06 14:54:07 -08:00
Nguyễn Thái Ngọc Duy 95308d64ce object.h: update flag allocation comment
Since the "flags" is shared, it's a good idea to keep track of who
uses what bit. When we need to use more flags in library code, we can
be sure it won't be re-used for another purpose by some caller.

While at there, fix the location of "5" (should be in a different
column than "4" two lines down)

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-06 11:41:21 -08:00
Junio C Hamano 0fd90daba8 Merge branch 'bc/hash-algo'
More abstraction of hash function from the codepath.

* bc/hash-algo:
  hash: update obsolete reference to SHA1_HEADER
  bulk-checkin: abstract SHA-1 usage
  csum-file: abstract uses of SHA-1
  csum-file: rename sha1file to hashfile
  read-cache: abstract away uses of SHA-1
  pack-write: switch various SHA-1 values to abstract forms
  pack-check: convert various uses of SHA-1 to abstract forms
  fast-import: switch various uses of SHA-1 to the_hash_algo
  sha1_file: switch uses of SHA-1 to the_hash_algo
  builtin/unpack-objects: switch uses of SHA-1 to the_hash_algo
  builtin/index-pack: improve hash function abstraction
  hash: create union for hash context allocation
  hash: move SHA-1 macros to hash.h
2018-02-15 14:55:47 -08:00
Junio C Hamano 8be8342b4c Merge branch 'po/object-id'
Conversion from uchar[20] to struct object_id continues.

* po/object-id:
  sha1_file: rename hash_sha1_file_literally
  sha1_file: convert write_loose_object to object_id
  sha1_file: convert force_object_loose to object_id
  sha1_file: convert write_sha1_file to object_id
  notes: convert write_notes_tree to object_id
  notes: convert combine_notes_* to object_id
  commit: convert commit_tree* to object_id
  match-trees: convert splice_tree to object_id
  cache: clear whole hash buffer with oidclr
  sha1_file: convert hash_sha1_file to object_id
  dir: convert struct sha1_stat to use object_id
  sha1_file: convert pretend_sha1_file to object_id
2018-02-15 14:55:43 -08:00
Brandon Williams debca9d2fe object: rename function 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano f3d618d2bf Merge branch 'jh/fsck-promisors'
In preparation for implementing narrow/partial clone, the machinery
for checking object connectivity used by gc and fsck has been
taught that a missing object is OK when it is referenced by a
packfile specially marked as coming from trusted repository that
promises to make them available on-demand and lazily.

* jh/fsck-promisors:
  gc: do not repack promisor packfiles
  rev-list: support termination at promisor objects
  sha1_file: support lazily fetching missing objects
  introduce fetch-object: fetch one promisor object
  index-pack: refactor writing of .keep files
  fsck: support promisor objects as CLI argument
  fsck: support referenced promisor objects
  fsck: support refs pointing to promisor objects
  fsck: introduce partialclone extension
  extension.partialclone: introduce partial clone extension
2018-02-13 13:39:03 -08:00
brian m. carlson 98a3beab6a csum-file: rename sha1file to hashfile
Rename struct sha1file to struct hashfile, along with all of its related
functions.

The transformation in this commit was made by global search-and-replace.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
brian m. carlson 454253f059 builtin/index-pack: improve hash function abstraction
Convert several uses of unsigned char [20] to struct object_id and
convert various hard-coded constants and uses of SHA-1 functions to use
the_hash_algo.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-02 11:28:41 -08:00
Patryk Obara f070faccc1 sha1_file: convert hash_sha1_file to object_id
Convert the declaration and definition of hash_sha1_file to use
struct object_id and adjust all function calls.

Rename this function to hash_object_file.

Signed-off-by: Patryk Obara <patryk.obara@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-30 10:42:36 -08:00
Christian Couder 72885a6d51 index-pack: use skip_to_optional_arg()
Let's simplify index-pack option parsing using
skip_to_optional_arg().

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-11 16:10:12 -08:00
Jonathan Tan 8b4c0103a9 sha1_file: support lazily fetching missing objects
Teach sha1_file to fetch objects from the remote configured in
extensions.partialclone whenever an object is requested but missing.

The fetching of objects can be suppressed through a global variable.
This is used by fsck and index-pack.

However, by default, such fetching is not suppressed. This is meant as a
temporary measure to ensure that all Git commands work in such a
situation. Future patches will update some commands to either tolerate
missing objects (without fetching them) or be more efficient in fetching
them.

In order to determine the code changes in sha1_file.c necessary, I
investigated the following:
 (1) functions in sha1_file.c that take in a hash, without the user
     regarding how the object is stored (loose or packed)
 (2) functions in packfile.c (because I need to check callers that know
     about the loose/packed distinction and operate on both differently,
     and ensure that they can handle the concept of objects that are
     neither loose nor packed)

(1) is handled by the modification to sha1_object_info_extended().

For (2), I looked at for_each_packed_object and others.  For
for_each_packed_object, the callers either already work or are fixed in
this patch:
 - reachable - only to find recent objects
 - builtin/fsck - already knows about missing objects
 - builtin/cat-file - warning message added in this commit

Callers of the other functions do not need to be changed:
 - parse_pack_index
   - http - indirectly from http_get_info_packs
   - find_pack_entry_one
     - this searches a single pack that is provided as an argument; the
       caller already knows (through other means) that the sought object
       is in a specific pack
 - find_sha1_pack
   - fast-import - appears to be an optimization to not store a file if
     it is already in a pack
   - http-walker - to search through a struct alt_base
   - http-push - to search through remote packs
 - has_sha1_pack
   - builtin/fsck - already knows about promisor objects
   - builtin/count-objects - informational purposes only (check if loose
     object is also packed)
   - builtin/prune-packed - check if object to be pruned is packed (if
     not, don't prune it)
   - revision - used to exclude packed objects if requested by user
   - diff - just for optimization

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Jonathan Tan 88e2f9ed8e introduce fetch-object: fetch one promisor object
Introduce fetch-object, providing the ability to fetch one object from a
promisor remote.

This uses fetch-pack. To do this, the transport mechanism has been
updated with 2 flags, "from-promisor" to indicate that the resulting
pack comes from a promisor remote (and thus should be annotated as such
by index-pack), and "no-dependents" to indicate that only the objects
themselves need to be fetched (but fetching additional objects is
nevertheless safe).

Whenever "no-dependents" is used, fetch-pack will refrain from using any
object flags, because it is most likely invoked as part of a dynamic
object fetch by another Git command (which may itself use object flags).
An alternative to this is to leave fetch-pack alone, and instead update
the allocation of flags so that fetch-pack's flags never overlap with
any others, but this will end up shrinking the number of flags available
to nearly every other Git command (that is, every Git command that
accesses objects), so the approach in this commit was used instead.

This will be tested in a subsequent commit.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Jonathan Tan 8e29c7c3af index-pack: refactor writing of .keep files
In a subsequent commit, index-pack will be taught to write ".promisor"
files which are similar to the ".keep" files it knows how to write.
Refactor the writing of ".keep" files, so that the implementation of
writing ".promisor" files becomes easier.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:46:05 -08:00
Derrick Stolee 19716b21a4 cleanup: fix possible overflow errors in binary search
A common mistake when writing binary search is to allow possible
integer overflow by using the simple average:

	mid = (min + max) / 2;

Instead, use the overflow-safe version:

	mid = min + (max - min) / 2;

This translation is safe since the operation occurs inside a loop
conditioned on "min < max". The included changes were found using
the following git grep:

	git grep '/ *2;' '*.c'

Making this cleanup will prevent future review friction when a new
binary search is contructed based on existing code.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-10 08:57:24 +09:00
Jonathan Tan 4f39cd821d pack: move pack name-related functions
Currently, sha1_file.c and cache.h contain many functions, both related
to and unrelated to packfiles. This makes both files very large and
causes an unclear separation of concerns.

Create a new file, packfile.c, to hold all packfile-related functions
currently in sha1_file.c. It has a corresponding header packfile.h.

In this commit, the pack name-related functions are moved. Subsequent
commits will move the other functions.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:06 -07:00
Junio C Hamano 00b7cf2379 Merge branch 'jt/unify-object-info'
Code clean-ups.

* jt/unify-object-info:
  sha1_file: refactor has_sha1_file_with_flags
  sha1_file: do not access pack if unneeded
  sha1_file: teach sha1_object_info_extended more flags
  sha1_file: refactor read_object
  sha1_file: move delta base cache code up
  sha1_file: rename LOOKUP_REPLACE_OBJECT
  sha1_file: rename LOOKUP_UNKNOWN_OBJECT
  sha1_file: teach packed_object_info about typename
2017-07-05 13:32:57 -07:00
Jonathan Tan e83e71c5e1 sha1_file: refactor has_sha1_file_with_flags
has_sha1_file_with_flags() implements many mechanisms in common with
sha1_object_info_extended(). Make has_sha1_file_with_flags() a
convenience function for sha1_object_info_extended() instead.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-26 10:28:58 -07:00
Junio C Hamano 50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Junio C Hamano f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Ævar Arnfjörð Bjarmason 6a83d90207 coccinelle: make use of the "type" FREE_AND_NULL() rule
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-16 12:44:03 -07:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
brian m. carlson c251c83df2 object: convert parse_object* to take struct object_id
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:

@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)

@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)

@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)

@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
brian m. carlson 3aca1fc6c9 Convert lookup_blob to struct object_id
Convert lookup_blob to take a pointer to struct object_id.

The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:

@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)

@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson 3e9309815d Convert remaining callers of lookup_blob to object_id
All but a few callers of lookup_blob have been converted to struct
object_id.  Introduce a temporary, which will be removed later, into
parse_object to ease the transition, and convert the remaining callers
so that we can update lookup_blob to take struct object_id *.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson e6a492b7be pack: convert struct pack_idx_entry to struct object_id
Convert struct pack_idx_entry to use struct object_id by changing the
definition and applying the following semantic patch, plus the standard
object_id transforms:

@@
struct pack_idx_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct pack_idx_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
Junio C Hamano dfe46c5ce6 Merge branch 'jk/loose-object-info-report-error'
Update error handling for codepath that deals with corrupt loose
objects.

* jk/loose-object-info-report-error:
  index-pack: detect local corruption in collision check
  sha1_loose_object_info: return error for corrupted objects
2017-04-16 23:29:30 -07:00
Jeff King 51054177b3 index-pack: detect local corruption in collision check
When we notice that we have a local copy of an incoming
object, we compare the two objects to make sure we haven't
found a collision. Before we get to the actual object
bytes, though, we compare the type and size from
sha1_object_info().

If our local object is corrupted, then the type will be
OBJ_BAD, which obviously will not match the incoming type,
and we'll report "SHA1 COLLISION FOUND" (with capital
letters and everything). This is confusing, as the problem
is not a collision but rather local corruption. We should
report that instead (just like we do if reading the rest of
the object content fails a few lines later).

Note that we _could_ just ignore the error and mark it as a
non-collision. That would let you "git fetch" to replace a
corrupted object. But it's not a very reliable method for
repairing a repository. The earlier want/have negotiation
tries to get the other side to omit objects we already have,
and it would not realize that we are "missing" this
corrupted object. So we're better off complaining loudly
when we see corruption, and letting the user take more
drastic measures to repair (like making a full clone
elsewhere and copying the pack into place).

Note that the test sets transfer.unpackLimit in the
receiving repository so that we use index-pack (which is
what does the collision check). Normally for such a small
push we'd use unpack-objects, which would simply try to
write the loose object, and discard the new one when we see
that there's already an old one.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-01 10:48:11 -07:00
Jeff King 5b1ef2cef4 replace unchecked snprintf calls with heap buffers
We'd prefer to avoid unchecked snprintf calls because
truncation can lead to unexpected results.

These are all cases where truncation shouldn't ever happen,
because the input to snprintf is fixed in size. That makes
them candidates for xsnprintf(), but it's simpler still to
just use the heap, and then nobody has to wonder if "100" is
big enough.

We'll use xstrfmt() where possible, and a strbuf when we need
the resulting size or to reuse the same buffer in a loop.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-30 14:59:50 -07:00
Jeff King 594fa9998c odb_mkstemp: write filename into strbuf
The odb_mkstemp() function expects the caller to provide a
fixed buffer to write the resulting tempfile name into. But
it creates the template using snprintf without checking the
return value. This means we could silently truncate the
filename.

In practice, it's unlikely that the truncation would end in
the template-pattern that mkstemp needs to open the file. So
we'd probably end up failing either way, unless the path was
specially crafted.

The simplest fix would be to notice the truncation and die.
However, we can observe that most callers immediately
xstrdup() the result anyway. So instead, let's switch to
using a strbuf, which is easier for them (and isn't a big
deal for the other 2 callers, who can just strbuf_release
when they're done with it).

Note that many of the callers used static buffers, but this
was purely to avoid putting a large buffer on the stack. We
never passed the static buffers out of the function, so
there's no complicated memory handling we need to change.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Jeff King 892e723afd do not check odb_mkstemp return value for errors
The odb_mkstemp function does not return an error; it dies
on failure instead. But many of its callers compare the
resulting descriptor against -1 and die themselves.

Mostly this is just pointless, but it does raise a question
when looking at the callers: if they show the results of the
"template" buffer after a failure, what's in it? The answer
is: it doesn't matter, because it cannot happen.

So let's make that clear by removing the bogus error checks.
In bitmap_writer_finish(), we can drop the error-handling
code entirely. In the other two cases, it's shared with the
open() in another code path; we can just move the
error-check next to that open() call.

And while we're at it, let's flesh out the function's
docstring a bit to make the error behavior clear.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-28 15:28:04 -07:00
Jeff King f20754802a index-pack: make pointer-alias fallbacks safer
The final() function accepts a NULL value for certain
parameters, and falls back to writing into a reusable "name"
buffer, and then either:

  1. For "keep_name", requiring all uses to do "keep_name ?
     keep_name : name.buf". This is awkward, and it's easy
     to accidentally look at the maybe-NULL keep_name.

  2. For "final_index_name" and "final_pack_name", aliasing
     those pointers to the "name" buffer. This is easier to
     use, but the aliased pointers become invalid after the
     buffer is reused (this isn't a bug now, but it's a
     potential pitfall).

One way to make this safer would be to introduce an extra
pointer to do the aliasing, and have its lifetime match the
validity of the "name" buffer. But it's still easy to
accidentally use the wrong name (i.e., to use
"final_pack_name" instead of the aliased pointer).

Instead, let's use three separate buffers that will remain
valid through the function. That makes it safe to alias the
pointers and use them consistently. The extra allocations
shouldn't matter, as this function is not performance
sensitive.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:33:43 -07:00
Jeff King ba47a3088f replace snprintf with odb_pack_name()
In several places we write the name of the pack filename
into a fixed-size buffer using snprintf(), but do not check
the return value.  As a result, a very long object directory
could cause us to quietly truncate the pack filename
(potentially leading to a corrupted repository, as a newly
written packfile could be missing its .pack extension).

We can use odb_pack_name() to do this with a strbuf (and
shorten the code, as well).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:26:18 -07:00
Jeff King eaeefc3276 odb_pack_keep(): stop generating keepfile name
The odb_pack_keep() function generates the name of a .keep
file and opens it. This has two problems:

  1. It requires a fixed-size buffer to create the filename
     and doesn't notice when the result is truncated.

  2. Of the two callers, one sometimes wants to open a
     filename it already has, which makes things awkward (it
     has to do so manually, and skips the leading-directory
     creation).

Instead, let's have odb_pack_keep() just open the file.
Generating the name isn't hard, and a future patch will
switch callers over to odb_pack_name() anyway.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-16 11:17:00 -07:00
Jeff King 29401e1575 index-pack: skip collision check when not in repository
You can run "git index-pack path/to/foo.pack" outside of a
repository to generate an index file, or just to verify the
contents. There's no point in doing a collision check, since
we obviously do not have any objects to collide with.

The current code will blindly look in .git/objects based on
the result of setup_git_env(). That effectively gives us the
right answer (since we won't find any objects), but it's a
waste of time, and it conflicts with our desire to
eventually get rid of the "fallback to .git" behavior of
setup_git_env().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16 13:57:19 -08:00
Jeff King 7176a31444 index-pack: complain when --stdin is used outside of a repo
The index-pack builtin is marked as RUN_SETUP_GENTLY,
because it's perfectly fine to index a pack in the
filesystem outside of any repository. However, --stdin mode
will write the result to the object database, which does not
make sense outside of a repository. Doing so creates a bogus
".git" directory with nothing in it except the newly-created
pack and its index.

Instead, let's flag this as an error and abort.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-16 09:29:43 -08:00
René Scharfe 1b5294de40 use QSORT, part 2
Convert two more qsort(3) calls to QSORT to reduce code size and for
better safety and consistency.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 20:40:23 -07:00
René Scharfe 9ed0d8d6e6 use QSORT
Apply the semantic patch contrib/coccinelle/qsort.cocci to the code
base, replacing calls of qsort(3) with QSORT.  The resulting code is
shorter and supports empty arrays with NULL pointers.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-29 15:42:18 -07:00
Jeff King 411481be6f index-pack: add --max-input-size=<size> option
When receiving a pack-file, it can be useful to abort the
`git index-pack`, if the pack-file is too big.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-24 12:31:05 -07:00
Junio C Hamano a58a8e3f71 Merge branch 'jk/push-progress'
"git push" and "git clone" learned to give better progress meters
to the end user who is waiting on the terminal.

* jk/push-progress:
  receive-pack: send keepalives during quiet periods
  receive-pack: turn on connectivity progress
  receive-pack: relay connectivity errors to sideband
  receive-pack: turn on index-pack resolving progress
  index-pack: add flag for showing delta-resolution progress
  clone: use a real progress meter for connectivity check
  check_connected: add progress flag
  check_connected: relay errors to alternate descriptor
  check_everything_connected: use a struct with named options
  check_everything_connected: convert to argv_array
  rev-list: add optional progress reporting
  check_everything_connected: always pass --quiet to rev-list
2016-08-03 15:10:28 -07:00
Jeff King 83558686ce receive-pack: send keepalives during quiet periods
After a client has sent us the complete pack, we may spend
some time processing the data and running hooks. If the
client asked us to be quiet, receive-pack won't send any
progress data during the index-pack or connectivity-check
steps. And hooks may or may not produce their own progress
output. In these cases, the network connection is totally
silent from both ends.

Git itself doesn't care about this (it will wait forever),
but other parts of the system (e.g., firewalls,
load-balancers, etc) might hang up the connection. So we'd
like to send some sort of keepalive to let the network and
the client side know that we're still alive and processing.

We can use the same trick we did in 05e9515 (upload-pack:
send keepalive packets during pack computation, 2013-09-08).
Namely, we will send an empty sideband data packet every `N`
seconds that we do not relay any stderr data over the
sideband channel. As with 05e9515, this means that we won't
bother sending keepalives when there's actual progress data,
but will kick in when it has been disabled (or if there is a
lull in the progress data).

The concept is simple, but the details are subtle enough
that they need discussing here.

Before the client sends us the pack, we don't want to do any
keepalives. We'll have sent our ref advertisement, and we're
waiting for them to send us the pack (and tell us that they
support sidebands at all).

While we're receiving the pack from the client (or waiting
for it to start), there's no need for keepalives; it's up to
them to keep the connection active by sending data.
Moreover, it would be wrong for us to do so. When we are the
server in the smart-http protocol, we must treat our
connection as half-duplex. So any keepalives we send while
receiving the pack would potentially be buffered by the
webserver. Not only does this make them useless (since they
would not be delivered in a timely manner), but it could
actually cause a deadlock if we fill up the buffer with
keepalives. (It wouldn't be wrong to send keepalives in this
phase for a full-duplex connection like ssh; it's simply
pointless, as it is the client's responsibility to speak).

As soon as we've gotten all of the pack data, then the
client is waiting for us to speak, and we should start
keepalives immediately. From here until the end of the
connection, we send one any time we are not otherwise
sending data.

But there's a catch. Receive-pack doesn't know the moment
we've gotten all the data. It passes the descriptor to
index-pack, who reads all of the data, and then starts
resolving the deltas. We have to communicate that back.

To make this work, we instruct the sideband muxer to enable
keepalives in three phases:

  1. In the beginning, not at all.

  2. While reading from index-pack, wait for a signal
     indicating end-of-input, and then start them.

  3. Afterwards, always.

The signal from index-pack in phase 2 has to come over the
stderr channel which the muxer is reading. We can't use an
extra pipe because the portable run-command interface only
gives us stderr and stdout.

Stdout is already used to pass the .keep filename back to
receive-pack. We could also send a signal there, but then we
would find out about it in the main thread. And the
keepalive needs to be done by the async muxer thread (since
it's the one writing sideband data back to the client). And
we can't reliably signal the async thread from the main
thread, because the async code sometimes uses threads and
sometimes uses forked processes.

Therefore the signal must come over the stderr channel,
where it may be interspersed with other random
human-readable messages from index-pack. This patch makes
the signal a single NUL byte.  This is easy to parse, should
not appear in any normal stderr output, and we don't have to
worry about any timing issues (like seeing half the signal
bytes in one read(), and half in a subsequent one).

This is a bit ugly, but it's simple to code and should work
reliably.

Another option would be to stop using an async thread for
muxing entirely, and just poll() both stderr and stdout of
index-pack from the main thread. This would work for
index-pack (because we aren't doing anything useful in the
main thread while it runs anyway). But it would make the
connectivity check and the hook muxers much more
complicated, as they need to simultaneously feed the
sub-programs while reading their stderr.

The index-pack phase is the only one that needs this
signaling, so it could simply behave differently than the
other two. That would mean having two separate
implementations of copy_to_sideband (and the keepalive
code), though. And it still doesn't get rid of the
signaling; it just means we can write a nicer message like
"END_OF_INPUT" or something on stdout, since we don't have
to worry about separating it from the stderr cruft.

One final note: this signaling trick is only done with
index-pack, not with unpack-objects. There's no point in
doing it for the latter, because by definition it only kicks
in for a small number of objects, where keepalives are not
as useful (and this conveniently lets us avoid duplicating
the implementation).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-20 12:11:11 -07:00