1
0
mirror of https://github.com/git/git.git synced 2024-09-23 19:01:14 +02:00
Commit Graph

53018 Commits

Author SHA1 Message Date
Jean-Noël Avila
27c929edd6 i18n: fix mistakes in translated strings
Fix typos and convert a question which does not expect to be replied
to a simple advice.

Signed-off-by: Jean-Noël Avila <jn.avila@free.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 14:29:12 -07:00
Derrick Stolee
d915114a3b config: fix commit-graph related config docs
The core.commitGraph config setting was accidentally removed from
the config documentation. In that same patch, the config setting
that writes a commit-graph during garbage collection was incorrectly
written to the doc as "gc.commitGraph" instead of "gc.writeCommitGraph".

Reported-by: Szeder Gábor <szeder.dev@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:20:54 -07:00
Jeff King
66e83d9b41 append_signoff: use size_t for string offsets
The append_signoff() function takes an "int" to specify the
number of bytes to ignore. Most callers just pass 0, and the
remainder use ignore_non_trailer() to skip over cruft.
That function also returns an int, and uses them internally.

On systems where size_t is larger than an int (i.e., most
64-bit systems), dealing with a ridiculously large commit
message could end up overflowing an int, producing
surprising results (e.g., returning a negative offset, which
would cause us to look outside the original string).

Let's consistently use size_t for these offsets through this
whole stack. As a bonus, this makes the meaning of
"ignore_footer" as an offset (and not a boolean) more clear.
But while we're here, let's also document the interface.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
ffce7f590f sequencer: ignore "---" divider when parsing trailers
When the sequencer code appends a signoff or cherry-pick
origin, it uses the default trailer-parsing options, which
treat "---" as the end of the commit message. As a result,
it may be fooled by a commit message that contains that
string and fail to find the existing trailer block. Even
more confusing, the actual append code does not know about
"---", and always appends to the end of the string. This can
lead to bizarre results. E.g., appending a signoff to a
commit message like this:

  subject

  body
  ---
  these dashes confuse the parser!

  Signed-off-by: A

results in output with a final block like:

  Signed-off-by: A

  Signed-off-by: A

The parser thinks the final line of the message is "body",
and ignores everything else, claiming there are no trailers.
So we output an extra newline separator (wrong) and add a
duplicate signoff (also wrong).

Since we know we are feeding a pure commit message, we can
simply tell the parser to ignore the "---" divider.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
e5fba5d558 pretty, ref-filter: format %(trailers) with no_divider option
In both of these cases we know that we are feeding the
trailer-parsing code a pure commit message. We should tell
it so, which avoids false positives for a commit message
that contains a "---" line.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
1688c9a489 interpret-trailers: allow suppressing "---" divider
Even with the newly-tightened "---" parser, it's still
possible for a commit message to trigger a false positive if
it contains something like "--- foo". If the caller knows
that it has only a single commit message, it can now tell us
with the "--no-divider" option, eliminating any false
positives.

If we were designing this from scratch, I'd probably make
this the default. But we've advertised the "---" behavior in
the documentation since interpret-trailers has existed.
Since it's meant to be scripted, breaking that would be a
bad idea.

Note that the logic is in the underlying trailer.c code,
which is used elsewhere. The default there will keep the
current behavior, but many callers will benefit from setting
this new option. That's left for future patches.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
c188668e38 interpret-trailers: tighten check for "---" patch boundary
The interpret-trailers command accepts not only raw commit
messages, but it also can manipulate trailers in
format-patch output. That means it must find the "---"
boundary separating the commit message from the patch.
However, it does so by looking for any line starting with
"---", regardless of whether there is further content.

This is overly lax compared to the parsing done in
mailinfo.c's patchbreak(), and may cause false positives
(e.g., t/perf output tables uses dashes; if you cut and
paste them into your commit message, it fools the parser).

We could try to reuse patchbreak() here, but it actually has
several heuristics that are not of interest to us (e.g.,
matching "diff -" without a three-dash separator or even a
CVS "Index:" line). We're not interested in taking in
whatever random cruft people may send, but rather handling
git-formatted patches.

Note that the existing documentation was written in a loose
way, so technically we are changing the behavior from what
it said. But this should implement the original intent in a
more accurate way.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
00a21f5cbd trailer: pass process_trailer_opts to trailer_info_get()
Most of the trailer code has an "opts" struct which is
filled in by the caller. We don't pass it down to
trailer_info_get(), which does the initial parsing, because
there hasn't yet been a need to do so.

Let's start passing it down in preparation for adding new
options. Note that there's a single caller which doesn't
otherwise have such an options struct. Since it's just one
caller (that we'd have to modify anyway), let's not bother
with any special treatment like accepting a NULL options
struct, and just have it allocate one with the defaults.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
a3b636e215 trailer: use size_t for iterating trailer list
We store the length of the trailers list in a size_t. So on
a 64-bit system with a 32-bit int, in the unlikely case that
we manage to actually allocate a list with 2^31 entries,
we'd loop forever trying to iterate over it (our "int" would
wrap to negative before exceeding info->trailer_nr).

This probably doesn't matter in practice. Each entry is at
least a pointer plus a non-empty string, so even without
malloc overhead or the memory to hold the original string
we're parsing from, you'd need to allocate tens of
gigabytes. But it's easy enough to do it right.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
Jeff King
0d2db00e24 trailer: use size_t for string offsets
Many of the string-parsing functions inside trailer.c return
integer offsets into the string (e.g., to point to the end
of the trailer block). Several of these use an "int" to
return or store the offsets. On a system where "size_t" is
much larger than "int" (e.g., most 64-bit ones), it's easy
to feed a gigantic commit message that results in a negative
offset. This can result in us reading memory before the
string (if the int is used as an index) or far after (if
it's implicitly cast to a size_t by passing to a strbuf
function).

Let's fix this by using size_t for all string offsets. Note
that several of the functions need ssize_t, since they use
"-1" as a sentinel value. The interactions here can be
pretty subtle. E.g., end_of_title in find_trailer_start()
does not itself need to be signed, but it is compared to the
result of last_line(), which is. That promotes the latter to
unsigned, and the ">=" does not behave as you might expect.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 10:08:51 -07:00
SZEDER Gábor
7afb0d6777 t/lib-rebase.sh: support explicit 'pick' commands in 'fake_editor.sh'
The verbose output of the test 'reword without issues functions as
intended' in 't3423-rebase-reword.sh', added in a9279c6785 (sequencer:
do not squash 'reword' commits when we hit conflicts, 2018-06-19),
contains the following error output:

  sed: -e expression #1, char 2: extra characters after command

This error comes from within the 'fake-editor.sh' script created by
'lib-rebase.sh's set_fake_editor() function, and the root cause is the
FAKE_LINES="pick 1 reword 2" variable in the test in question, in
particular the "pick" word.  'fake-editor.sh' assumes 'pick' to be the
default rebase command and doesn't support an explicit 'pick' command
in FAKE_LINES.  As a result, 'pick' will be used instead of a line
number when assembling the following 'sed' script:

  sed -n picks/^pick/pick/p

which triggers the aforementioned error.

Luckily, this didn't affect the test's correctness: the erroring 'sed'
command doesn't write anything to the todo script, and processing the
rest of FAKE_LINES generates the desired todo script, as if that
'pick' command were not there at all.

The minimal fix would be to remove the 'pick' word from FAKE_LINES,
but that would leave us susceptible to similar issues in the future.

Instead, teach the fake-editor script to recognize an explicit 'pick'
command, which is still a fairly trivial change.

In the future we might want to consider reinforcing this fake editor
script with an &&-chain and stricter parsing of the FAKE_LINES
variable (e.g. to error out when encountering unknown rebase commands
or commands and line numbers in the wrong order).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 09:23:34 -07:00
Jeff King
183a638b7d hashcmp: assert constant hash size
Prior to 509f6f62a4 (cache: update object ID functions for
the_hash_algo, 2018-07-16), hashcmp() called memcmp() with a
constant size of 20 bytes. Some compilers were able to turn
that into a few quad-word comparisons, which is faster than
actually calling memcmp().

In 509f6f62a4, we started using the_hash_algo->rawsz
instead. Even though this will always be 20, the compiler
doesn't know that while inlining hashcmp() and ends up just
generating a call to memcmp().

Eventually we'll have to deal with multiple hash sizes, but
for the upcoming v2.19, we can restore some of the original
performance by asserting on the size. That gives the
compiler enough information to know that the memcmp will
always be called with a length of 20, and it performs the
same optimization.

Here are numbers for p0001.2 run against linux.git on a few
versions. This is using -O2 with gcc 8.2.0.

  Test     v2.18.0             v2.19.0-rc0               HEAD
  ------------------------------------------------------------------------------
  0001.2:  34.24(33.81+0.43)   34.83(34.42+0.40) +1.7%   33.90(33.47+0.42) -1.0%

You can see that v2.19 is a little slower than v2.18. This
commit ended up slightly faster than v2.18, but there's a
fair bit of run-to-run noise (the generated code in the two
cases is basically the same). This patch does seem to be
consistently 1-2% faster than v2.19.

I tried changing hashcpy(), which was also touched by
509f6f62a4, in the same way, but couldn't measure any
speedup. Which makes sense, at least for this workload. A
traversal of the whole commit graph requires looking up
every entry of every tree via lookup_object(). That's many
multiples of the numbers of objects in the repository (most
of the lookups just return "yes, we already saw that
object").

[jn: verified using "make object.s" that the memcmp call goes away.]

Reported-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Jeff King <peff@peff.net
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-23 06:20:58 -07:00
Jeff King
a12cbe23ef rev-list: make empty --stdin not an error
When we originally did the series that contains 7ba826290a
(revision: add rev_input_given flag, 2017-08-02) the intent
was that "git rev-list --stdin </dev/null" would similarly
become a successful noop. However, an attempt at the time to
do that did not work[1]. The problem is that rev_input_given
serves two roles:

 - it tells rev-list.c that it should not error out

 - it tells revision.c that it should not have the "default"
   ref kick (e.g., "HEAD" in "git log")

We want to trigger the former, but not the latter. This is
technically possible with a single flag, if we set the flag
only after revision.c's revs->def check. But this introduces
a rather subtle ordering dependency.

Instead, let's keep two flags: one to denote when we got
actual input (which triggers both roles) and one for when we
read stdin (which triggers only the first).

This does mean a caller interested in the first role has to
check both flags, but there's only one such caller. And any
future callers might want to make the distinction anyway
(e.g., if they care less about erroring out, and more about
whether revision.c soaked up our stdin).

In fact, we already keep such a flag internally in
revision.c for this purpose, so this is really just exposing
that to the caller (and the old function-local flag can go
away in favor of our new one).

[1] https://public-inbox.org/git/20170802223416.gwiezhbuxbdmbjzx@sigill.intra.peff.net/

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 14:44:50 -07:00
SZEDER Gábor
2745817028 t3420-rebase-autostash: don't try to grep non-existing files
Several tests in 't3420-rebase-autostash.sh' start various rebase
processes that are expected to fail because of merge conflicts.  These
tests then run '! grep' to ensure that the autostash feature did its
job, and the dirty contents of a file is gone.  However, due to the
test repo's history and the choice of upstream branch that file
shouldn't exist in the conflicted state at all.  Consequently, this
'grep' doesn't fail as expected, because it can't find the dirty
content, but it fails because it can't open the file.

Tighten this check by using 'test_path_is_missing' instead, thereby
avoiding unexpected errors from 'grep' as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 11:52:51 -07:00
SZEDER Gábor
79b04f9b60 t3903-stash: don't try to grep non-existing file
The test 'store updates stash ref and reflog' in 't3903-stash.sh'
creates a stash from a new file, runs 'git reset --hard' to throw away
any modifications to the work tree, and then runs '! grep' to ensure
that the staged contents are gone.  Since the file didn't exist
before, it shouldn't exist after 'git reset' either.  Consequently,
this 'grep' doesn't fail as expected, because it can't find the staged
content, but it fails because it can't open the file.

Tighten this check by using 'test_path_is_missing' instead, thereby
avoiding an unexpected error from 'grep' as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 11:52:35 -07:00
Junio C Hamano
29d9e3e2c4 Merge branch 'nd/pack-deltify-regression-fix'
In a recent update in 2.18 era, "git pack-objects" started
producing a larger than necessary packfiles by missing
opportunities to use large deltas.

* nd/pack-deltify-regression-fix:
  pack-objects: fix performance issues on packing large deltas
2018-08-22 11:17:05 -07:00
SZEDER Gábor
b89b4a660c t6018-rev-list-glob: fix 'empty stdin' test
Prior to d3c6751b18 (tests: make use of the test_must_be_empty
function, 2018-07-27), in the test 'rev-list should succeed with empty
output on empty stdin' in 't6018-rev-list-glob' the empty 'expect'
file served dual purpose: besides specifying the expected output, as
usual, it also served as empty input for 'git rev-list --stdin'.

Then d3c6751b18 came along, and, as part of the conversion to
'test_must_be_empty', removed this empty 'expect' file, not realizing
its secondary purpose.  Redirecting stdin from the now non-existing
file failed the test, but since this test expects failure in the first
place, this issue went unnoticed.

Redirect 'git rev-list's stdin explicitly from /dev/null to provide
empty input.  (Strictly speaking we don't need this redirection,
because the test script's stdin is already redirected from /dev/null
anyway, but I think it's better to be explicit about it.)

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 11:02:07 -07:00
SZEDER Gábor
c8b35b95e1 t4051-diff-function-context: read the right file
The test ' context does not include preceding empty lines' in the
block of tests 'change with long common tail and no context' in
't4051-diff-function-context.sh' tries to read the file
'long_common_tail.diff.diff', but that file doesn't exist as its name
contains one more '.diff' suffixes than necessary.

Despite this error the test still succeeded without checking what it's
supposed to, because this erroneous read is done on the line:

  test "$(first_context_line <long_common_tail.diff.diff)" != " "

which means that:

  - the command substitution hides the error, so it won't fail the
    test, and

  - the result of the command substitution is the empty string, which
    is, of course, not equal to a single space character, so the
    condition is fulfilled, and the test succeeds.

As a minimal fix, fix the name of the file to be read.

In the future we might want to reorganize this test script (1) to use
'test_cmp' instead of 'test's and command substitutions to catch
failing commands and to provide helpful error messages, and (2) to
specify what the expected result actually _is_ instead of what it
isn't.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 09:14:24 -07:00
SZEDER Gábor
30612cb670 t0020-crlf: check the right file
In the test 'checkout with autocrlf=input' in 't0020-crlf.sh', one of
the 'has_cr' checks looks at the non-existing file 'two' instead of
'dir/two'.  The test still succeeds, without actually checking what it
was supposed to, because this check is expected to fail anyway.

As a minimal fix, fix the name of the file to be checked.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 09:08:08 -07:00
SZEDER Gábor
15da753709 t7501-commit: drop silly command substitution
The test '--dry-run with conflicts fixed from a merge' in
't7501-commit.sh', added in 8dc874b2ee (wt-status.c: set commitable
bit if there is a meaningful merge., 2016-02-15), runs the following
unnecessary and downright bogus command substitution:

  ! $(git merge --no-commit commit-1) &&

I.e. after 'git merge ...' is executed and expectedly fails, the test
attempts to execute its output:

  Merging:
  80f2ea2 commit 2
  virtual commit-1
  found 1 common ancestor:
  e60d113 Initial commit
  Auto-merging test-file
  CONFLICT (content): Merge conflict in test-file
  Automatic merge failed; fix conflicts and then commit the result.

as a command, which most likely fails, because there is no such
command as "Merging:".  Then '!' negates the failed exit status, the
test continues, and eventually succeeds.

Remove this command substitution and use 'test_must_fail' to ensure
that 'git merge' fails.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-22 08:39:31 -07:00
Derrick Stolee
1820703045 commit: use timestamp_t for author_date_slab
The author_date_slab is used to store the author date of a commit
when walking with the --author-date flag in rev-list or log. This
was added as an 'unsigned long' in

	81c6b38b "log: --author-date-order"

Since 'unsigned long' is ambiguous in its bit-ness across platforms
(64-bit in Linux, 32-bit in Windows, for example), most references
to the author dates in commit.c were converted to timestamp_t in

	dddbad72 "timestamp_t: a new data type for timestamps"

However, the slab definition was missed, leading to a mismatch in
the data types in Windows. This would not reveal itself as a bug
unless someone authors a commit after February 2106, but commits
can store anything as their author date.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 14:08:18 -07:00
Jeff King
7a76f5c611 SubmittingPatches: mention doc-diff
We already advise people to make sure their documentation
formats correctly. Let's point them at the doc-diff script,
which can help with that.

Let's also put a brief note in the script about its purpose,
since that otherwise can only be found in the original
commit message. Along with the existing -h/usage text,
that's hopefully enough for developers to make use of it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 12:54:33 -07:00
Jeff King
6a1e32d532 pack-objects: reuse on-disk deltas for thin "have" objects
When we serve a fetch, we pass the "wants" and "haves" from
the fetch negotiation to pack-objects. That tells us not
only which objects we need to send, but we also use the
boundary commits as "preferred bases": their trees and blobs
are candidates for delta bases, both for reusing on-disk
deltas and for finding new ones.

However, this misses some opportunities. Modulo some special
cases like shallow or partial clones, we know that every
object reachable from the "haves" could be a preferred base.
We don't use all of them for two reasons:

  1. It's expensive to traverse the whole history and
     enumerate all of the objects the other side has.

  2. The delta search is expensive, so we want to keep the
     number of candidate bases sane. The boundary commits
     are the most likely to work.

When we have reachability bitmaps, though, reason 1 no
longer applies. We can efficiently compute the set of
reachable objects on the other side (and in fact already did
so as part of the bitmap set-difference to get the list of
interesting objects). And using this set conveniently
covers the shallow and partial cases, since we have to
disable the use of bitmaps for those anyway.

The second reason argues against using these bases in the
search for new deltas. But there's one case where we can use
this information for free: when we have an existing on-disk
delta that we're considering reusing, we can do so if we
know the other side has the base object. This in fact saves
time during the delta search, because it's one less delta we
have to compute.

And that's exactly what this patch does: when we're
considering whether to reuse an on-disk delta, if bitmaps
tell us the other side has the object (and we're making a
thin-pack), then we reuse it.

Here are the results on p5311 using linux.git, which
simulates a client fetching after `N` days since their last
fetch:

 Test                         origin              HEAD
 --------------------------------------------------------------------------
 5311.3: server   (1 days)    0.27(0.27+0.04)     0.12(0.09+0.03) -55.6%
 5311.4: size     (1 days)               0.9M              237.0K -73.7%
 5311.5: client   (1 days)    0.04(0.05+0.00)     0.10(0.10+0.00) +150.0%
 5311.7: server   (2 days)    0.34(0.42+0.04)     0.13(0.10+0.03) -61.8%
 5311.8: size     (2 days)               1.5M              347.7K -76.5%
 5311.9: client   (2 days)    0.07(0.08+0.00)     0.16(0.15+0.01) +128.6%
 5311.11: server   (4 days)   0.56(0.77+0.08)     0.13(0.10+0.02) -76.8%
 5311.12: size     (4 days)              2.8M              566.6K -79.8%
 5311.13: client   (4 days)   0.13(0.15+0.00)     0.34(0.31+0.02) +161.5%
 5311.15: server   (8 days)   0.97(1.39+0.11)     0.30(0.25+0.05) -69.1%
 5311.16: size     (8 days)              4.3M                1.0M -76.0%
 5311.17: client   (8 days)   0.20(0.22+0.01)     0.53(0.52+0.01) +165.0%
 5311.19: server  (16 days)   1.52(2.51+0.12)     0.30(0.26+0.03) -80.3%
 5311.20: size    (16 days)              8.0M                2.0M -74.5%
 5311.21: client  (16 days)   0.40(0.47+0.03)     1.01(0.98+0.04) +152.5%
 5311.23: server  (32 days)   2.40(4.44+0.20)     0.31(0.26+0.04) -87.1%
 5311.24: size    (32 days)             14.1M                4.1M -70.9%
 5311.25: client  (32 days)   0.70(0.90+0.03)     1.81(1.75+0.06) +158.6%
 5311.27: server  (64 days)   11.76(26.57+0.29)   0.55(0.50+0.08) -95.3%
 5311.28: size    (64 days)             89.4M               47.4M -47.0%
 5311.29: client  (64 days)   5.71(9.31+0.27)     15.20(15.20+0.32) +166.2%
 5311.31: server (128 days)   16.15(36.87+0.40)   0.91(0.82+0.14) -94.4%
 5311.32: size   (128 days)            134.8M              100.4M -25.5%
 5311.33: client (128 days)   9.42(16.86+0.49)    25.34(25.80+0.46) +169.0%

In all cases we save CPU time on the server (sometimes
significant) and the resulting pack is smaller. We do spend
more CPU time on the client side, because it has to
reconstruct more deltas. But that's the right tradeoff to
make, since clients tend to outnumber servers. It just means
the thin pack mechanism is doing its job.

From the user's perspective, the end-to-end time of the
operation will generally be faster. E.g., in the 128-day
case, we saved 15s on the server at a cost of 16s on the
client. Since the resulting pack is 34MB smaller, this is a
net win if the network speed is less than 270Mbit/s. And
that's actually the worst case. The 64-day case saves just
over 11s at a cost of just under 11s. So it's a slight win
at any network speed, and the 40MB saved is pure bonus. That
trend continues for the smaller fetches.

The implementation itself is mostly straightforward, with
the new logic going into check_object(). But there are two
tricky bits.

The first is that check_object() needs access to the
relevant information (the thin flag and bitmap result). We
can do this by pushing these into program-lifetime globals.

The second is that the rest of the code assumes that any
reused delta will point to another "struct object_entry" as
its base. But of course the case we are interested in here
is the one where don't have such an entry!

I looked at a number of options that didn't quite work:

 - we could use a flag to signal a reused delta, but it's
   not a single bit. We have to actually store the oid of
   the base, which is normally done by pointing to the
   existing object_entry. And we'd have to modify all the
   code which looks at deltas.

 - we could add the reused bases to the end of the existing
   object_entry array. While this does create some extra
   work as later stages consider the extra entries, it's
   actually not too bad (we're not sending them, so they
   don't cost much in the delta search, and at most we'd
   have 2*N of them).

   But there's a more subtle problem. Adding to the existing
   array means we might need to grow it with realloc, which
   could move the earlier entries around. While many of the
   references to other entries are done by integer index,
   some (including ones on the stack) use pointers, which
   would become invalidated.

   This isn't insurmountable, but it would require quite a
   bit of refactoring (and it's hard to know that you've got
   it all, since it may work _most_ of the time and then
   fail subtly based on memory allocation patterns).

 - we could allocate a new one-off entry for the base. In
   fact, this is what an earlier version of this patch did.
   However, since the refactoring brought in by ad635e82d6
   (Merge branch 'nd/pack-objects-pack-struct', 2018-05-23),
   the delta_idx code requires that both entries be in the
   main packing list.

So taking all of those options into account, what I ended up
with is a separate list of "external bases" that are not
part of the main packing list. Each delta entry that points
to an external base has a single-bit flag to do so; we have a
little breathing room in the bitfield section of
object_entry.

This lets us limit the change primarily to the oe_delta()
and oe_set_delta_ext() functions. And as a bonus, most of
the rest of the code does not consider these dummy entries
at all, saving both runtime CPU and code complexity.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 12:45:49 -07:00
Jeff King
30cdc33fba pack-bitmap: save "have" bitmap from walk
When we do a bitmap walk, we save the result, which
represents (WANTs & ~HAVEs); i.e., every object we care
about visiting in our walk. However, we throw away the
haves bitmap, which can sometimes be useful, too. Save it
and provide an access function so code which has performed a
walk can query it.

A few notes on the accessor interface:

 - the bitmap code calls these "haves" because it grew out
   of the want/have negotiation for fetches. But really,
   these are simply the objects that would be flagged
   UNINTERESTING in a regular traversal. Let's use that
   more universal nomenclature for the external module
   interface. We may want to change the internal naming
   inside the bitmap code, but that's outside the scope of
   this patch.

 - it still uses a bare "sha1" rather than "oid". That's
   true of all of the bitmap code. And in this particular
   instance, our caller in pack-objects is dealing with the
   bare sha1 that comes from a packed REF_DELTA (we're
   pointing directly to the mmap'd pack on disk). That's
   something we'll have to deal with as we transition to a
   new hash, but we can wait and see how the caller ends up
   being fixed and adjust this interface accordingly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 12:33:39 -07:00
Jeff King
69d846f053 test-tool.h: include git-compat-util.h
The test-tool programs include "test-tool.h" as their first
include, which breaks our CodingGuideline of "the first
include must be git-compat-util.h or an equivalent".

Rather than change them all, let's instead make test-tool.h
one of those equivalents, just like we do for builtin.h
(which many of the actual git builtins include first).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 12:11:40 -07:00
SZEDER Gábor
1c5e94f459 tests: use 'test_must_be_empty' instead of 'test_cmp <empty> <out>'
Using 'test_must_be_empty' is shorter and more idiomatic than

  >empty &&
  test_cmp empty out

as it saves the creation of an empty file.  Furthermore, sometimes the
expected empty file doesn't have such a descriptive name like 'empty',
and its creation is far away from the place where it's finally used
for comparison (e.g. in 't7600-merge.sh', where two expected empty
files are created in the 'setup' test, but are used only about 500
lines later).

These cases were found by instrumenting 'test_cmp' to error out the
test script when it's used to compare empty files, and then converted
manually.

Note that even after this patch there still remain a lot of cases
where we use 'test_cmp' to check empty files:

  - Sometimes the expected output is not hard-coded in the test, but
    'test_cmp' is used to ensure that two similar git commands produce
    the same output, and that output happens to be empty, e.g. the
    test 'submodule update --merge  - ignores --merge  for new
    submodules' in 't7406-submodule-update.sh'.

  - Repetitive common tasks, including preparing the expected results
    and running 'test_cmp', are often extracted into a helper
    function, and some of this helper's callsites expect no output.

  - For the same reason as above, the whole 'test_expect_success'
    block is within a helper function, e.g. in 't3070-wildmatch.sh'.

  - Or 'test_cmp' is invoked in a loop, e.g. the test 'cvs update
    (-p)' in 't9400-git-cvsserver-server.sh'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:48:36 -07:00
SZEDER Gábor
ec21ac8c18 tests: use 'test_must_be_empty' instead of 'test_cmp /dev/null <out>'
Using 'test_must_be_empty' is more idiomatic than 'test_cmp /dev/null
out', and its message on error is perhaps a bit more to the point.

This patch was basically created by running:

  sed -i -e 's%test_cmp /dev/null%test_must_be_empty%' t[0-9]*.sh

with the exception of the change in 'should not fail in an empty repo'
in 't7401-submodule-summary.sh', where it was 'test_cmp output
/dev/null'.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:48:34 -07:00
SZEDER Gábor
f0dc593a95 tests: use 'test_must_be_empty' instead of 'test ! -s'
Using 'test_must_be_empty' is preferable to 'test ! -s', because it
gives a helpful error message if the given file is unexpectedly no
empty, while the latter remains completely silent.  Furthermore, it
also catches cases when the given file unexpectedly does not exist at
all.

This patch was created by:

  sed -i -e 's/test ! -s/test_must_be_empty/' t[0-9]*.sh

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:48:31 -07:00
SZEDER Gábor
ec10b018e7 tests: use 'test_must_be_empty' instead of '! test -s'
Using 'test_must_be_empty' is preferable to '! test -s', because it
gives a helpful error message if the given file is unexpectedly not
empty, while the latter remains completely silent.  Furthermore, it
also catches cases when the given file unexpectedly does not exist at
all.

This patch was basically created by:

  sed -i -e 's/! test -s/test_must_be_empty/' t[0-9]*.sh

with the following notable exceptions:

  - The '! test -s' check in '.gitmodules ignore=dirty suppresses
    submodules with untracked content' in 't7508-status.sh' is left
    as-is, because it's bogus and, therefore, it's subject of a
    dedicated patch.

  - The '! test -s' checks in 't9131-git-svn-empty-symlink.sh' and
    't9135-git-svn-moved-branch-empty-file.sh' are immediately
    preceeded by a 'test -f' to ensure that the files exist in the
    first place.  'test_must_be_empty' ensures that as well, so those
    'test -f' commands are removed as well.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:48:29 -07:00
René Scharfe
bbc072f5d8 parseopt: group literal string alternatives in argument help
This formally clarifies that the "--option=" part is the same for all
alternatives.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:35:54 -07:00
René Scharfe
446e63ccf5 remote: improve argument help for add --mirror
Group the possible values using a pair of parentheses and don't mark
them for translation, as they are literal strings that have to be used
as-is in any locale.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:33:21 -07:00
René Scharfe
168f32eb10 checkout-index: improve argument help for --stage
Spell out all alternatives and avoid using a numerical range operator,
as it is not mentioned in CodingGuidelines and the resulting string is
still concise.  Wrap them in parentheses to document clearly that the
"--stage=" part is common among them.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:32:44 -07:00
Nguyễn Thái Ngọc Duy
eb90ea79c5 generate-cmdlist.sh: collect config from all config.txt files
This script uses Documentation/config.txt as input for "git help
--config" and "git config" completion but it misses the fact that
config.txt includes other txt files. Include all *config.txt as input
when scanning for config keys. This could produce false positives, but
as long as we stick to the blah-config.txt naming convention, we
should be ok.

While at there, move diff.* from config.txt to diff-config.txt where
all other diff config keys are.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-21 11:28:11 -07:00
Jiang Xin
dba9f13c6a l10n: git.pot: v2.19.0 round 1 (382 new, 30 removed)
Generate po/git.pot from v2.19.0-rc0 for git v2.19.0 l10n round 1.

Signed-off-by: Jiang Xin <worldhello.net@gmail.com>
2018-08-21 08:28:47 +08:00
Jiang Xin
6b7719ee3f Merge branch 'maint' of git://github.com/git-l10n/git-po
* 'maint' of git://github.com/git-l10n/git-po:
  l10n: de.po: translate 108 new messages
  l10n: zh_CN: review for git 2.18.0
  l10n: sv.po: Update Swedish translation(3608t0f0u)
2018-08-21 08:22:04 +08:00
Derrick Stolee
6a22d52126 pack-objects: consider packs in multi-pack-index
When running 'git pack-objects --local', we want to avoid packing
objects that are in an alternate. Currently, we check for these
objects using the packed_git_mru list, which excludes the pack-files
covered by a multi-pack-index.

Add a new iteration over the multi-pack-indexes to find these
copies and mark them as unwanted.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Derrick Stolee
e9ab2ed7de midx: test a few commands that use get_all_packs
The new get_all_packs() method exposed the packfiles coverede by
a multi-pack-index. Before, the 'git cat-file --batch' and
'git count-objects' commands would skip objects in an environment
with a multi-pack-index.

Further, a reachability bitmap would be ignored if its pack-file
was covered by a multi-pack-index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Derrick Stolee
454ea2e4d7 treewide: use get_all_packs
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Derrick Stolee
0bff5269d3 packfile: add all_packs list
If a repo contains a multi-pack-index, then the packed_git list
does not contain the packfiles that are covered by the multi-pack-index.
This is important for doing object lookups, abbreviations, and
approximating object count. However, there are many operations that
really want to iterate over all packfiles.

Create a new 'all_packs' linked list that contains this list, starting
with the packfiles in the multi-pack-index and then continuing along
the packed_git linked list.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Derrick Stolee
29e2016b8f midx: fix bug that skips midx with alternates
The logic for constructing the linked list of multi-pack-indexes
in the object store is incorrect. If the local object store has
a multi-pack-index, but an alternate does not, then the list is
dropped.

Add tests that would have revealed this bug.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Derrick Stolee
fe86c3beb5 midx: stop reporting garbage
When prepare_packed_git is called with the report_garbage method
initialized, we report unexpected files in the objects directory
as garbage. Stop reporting the multi-pack-index and the pack-files
it covers as garbage.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Derrick Stolee
c39b02ae0a midx: mark bad packed objects
When an object fails to decompress from a pack-file, we mark the object
as 'bad' so we can retry with a different copy of the object (if such a
copy exists).

Before now, the multi-pack-index did not update the bad objects list for
the pack-files it contains, and we did not check the bad objects list
when reading an object. Now, do both.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Derrick Stolee
2cf489a3bf multi-pack-index: store local property
A pack-file is 'local' if it is stored within the usual object
directory. If it is stored in an alternate, it is non-local.

Pack-files are stored using a 'pack_local' member in the packed_git
struct. Add a similar 'local' member to the multi_pack_index struct
and 'local' parameters to the methods that load and prepare multi-
pack-indexes.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Derrick Stolee
6d68e6a461 multi-pack-index: provide more helpful usage info
The multi-pack-index builtin has a very simple command-line
interface. Instead of simply reporting usage, give the user a
hint to why the arguments failed.

Reported-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:39 -07:00
Junio C Hamano
c00ba2233e Sync 'ds/multi-pack-index' to v2.19.0-rc0
* ds/multi-pack-index: (23 commits)
  midx: clear midx on repack
  packfile: skip loading index if in multi-pack-index
  midx: prevent duplicate packfile loads
  midx: use midx in approximate_object_count
  midx: use existing midx when writing new one
  midx: use midx in abbreviation calculations
  midx: read objects from multi-pack-index
  config: create core.multiPackIndex setting
  midx: write object offsets
  midx: write object id fanout chunk
  midx: write object ids in a chunk
  midx: sort and deduplicate objects from packfiles
  midx: read pack names into array
  multi-pack-index: write pack names in chunk
  multi-pack-index: read packfile list
  packfile: generalize pack directory list
  t5319: expand test data
  multi-pack-index: load into memory
  midx: write header information to lockfile
  multi-pack-index: add 'write' verb
  ...
2018-08-20 15:29:54 -07:00
Stefan Beller
2543a64187 range-diff: indent special lines as context
The range-diff coloring is a bit fuzzy when it comes to special lines of
a diff, such as indicating new and old files with +++ and ---, as it
would pickup the first character and interpret it for its coloring, which
seems annoying as in regular diffs, these lines are colored bold via
DIFF_METAINFO.

By indenting these lines by a white space, they will be treated as context
which is much more useful, an example [1] on the range diff series itself:

[...]
    + diff --git a/Documentation/git-range-diff.txt b/Documentation/git-range-diff.txt
    + new file mode 100644
    + --- /dev/null
    + +++ b/Documentation/git-range-diff.txt
    +@@
    ++git-range-diff(1)
[...]
    +
      diff --git a/Makefile b/Makefile
      --- a/Makefile
      +++ b/Makefile
[...]

The first lines that introduce the new file for the man page will have the
'+' sign colored and the rest of the line will be bold.

The later lines that indicate a change to the Makefile will be treated as
context both in the outer and inner diff, such that those lines stay
regular color.

[1] ./git-range-diff pr-1/dscho/branch-diff-v3...pr-1/dscho/branch-diff-v4
    These tags are found at https://github.com/gitgitgadget/git

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:33:28 -07:00
Stefan Beller
8d5ccb59de range-diff: make use of different output indicators
This change itself only changes the internal communication and should
have no visible effect to the user. We instruct the diff code that
produces the inner diffs to use other markers instead of the
usual markers for new, old and context lines.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:33:26 -07:00
Stefan Beller
7648b79eee diff.c: add --output-indicator-{new, old, context}
This will prove useful in range-diff in a later patch as we will be able to
differentiate between adding a new file (that line is starting with +++
and then the file name) and regular new lines.

It could also be useful for experimentation in new patch formats, i.e.
we could teach git to emit moved lines with lines other than +/-.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:33:24 -07:00
Jeff King
198b349da8 t/perf: add perf tests for fetches from a bitmapped server
A server with bitmapped packs can serve a clone very
quickly. However, fetches are not necessarily made any
faster, because we spend a lot less time in object traversal
(which is what bitmaps help with) and more time finding
deltas (because we may have to throw out on-disk deltas if
the client does not have the base).

As a first step to making this faster, this patch introduces
a new perf script to measure fetches into a repo of various
ages from a fully-bitmapped server.

We separately measure the work done by the server (in
pack-objects) and that done by the client (in index-pack).
Furthermore, we measure the size of the resulting pack.

Breaking it down like this (instead of just doing a regular
"git fetch") lets us see how much each side benefits from
any changes. And since we know the pack size, if we estimate
the network speed, then one could calculate a complete
wall-clock time for the operation (though the script does
not do this automatically).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:04:47 -07:00
Jeff King
22bec79d1a t/perf: add infrastructure for measuring sizes
The main objective of scripts in the perf framework is to
run "test_perf", which measures the time it takes to run
some operation. However, it can also be interesting to see
the change in the output size of certain operations.

This patch introduces test_size, which records a single
numeric output from the test and shows it in the aggregated
output (with pretty printing and relative size comparison).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 14:04:47 -07:00