1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-24 06:06:14 +02:00
Commit Graph

131 Commits

Author SHA1 Message Date
Junio C Hamano 39fe402d67 Merge branch 'tb/refs-exclusion-and-packed-refs'
Enumerating refs in the packed-refs file, while excluding refs that
match certain patterns, has been optimized.

* tb/refs-exclusion-and-packed-refs:
  ls-refs.c: avoid enumerating hidden refs where possible
  upload-pack.c: avoid enumerating hidden refs where possible
  builtin/receive-pack.c: avoid enumerating hidden references
  refs.h: implement `hidden_refs_to_excludes()`
  refs.h: let `for_each_namespaced_ref()` take excluded patterns
  revision.h: store hidden refs in a `strvec`
  refs/packed-backend.c: add trace2 counters for jump list
  refs/packed-backend.c: implement jump lists to avoid excluded pattern(s)
  refs/packed-backend.c: refactor `find_reference_location()`
  refs: plumb `exclude_patterns` argument throughout
  builtin/for-each-ref.c: add `--exclude` option
  ref-filter.c: parameterize match functions over patterns
  ref-filter: add `ref_filter_clear()`
  ref-filter: clear reachable list pointers after freeing
  ref-filter.h: provide `REF_FILTER_INIT`
  refs.c: rename `ref_filter`
2023-07-21 13:47:26 -07:00
Taylor Blau 8255dd8a3d builtin/for-each-ref.c: add `--exclude` option
When using `for-each-ref`, it is sometimes convenient for the caller to
be able to exclude certain parts of the references.

For example, if there are many `refs/__hidden__/*` references, the
caller may want to emit all references *except* the hidden ones.
Currently, the only way to do this is to post-process the output, like:

    $ git for-each-ref --format='%(refname)' | grep -v '^refs/hidden/'

Which is do-able, but requires processing a potentially large quantity
of references.

Teach `git for-each-ref` a new `--exclude=<pattern>` option, which
excludes references from the results if they match one or more excluded
patterns.

This patch provides a naive implementation where the `ref_filter` still
sees all references (including ones that it will discard) and is left to
check whether each reference matches any excluded pattern(s) before
emitting them.

By culling out references we know the caller doesn't care about, we can
avoid allocating memory for their storage, as well as spending time
sorting the output (among other things). Even the naive implementation
provides a significant speed-up on a modified copy of linux.git (that
has a hidden ref pointing at each commit):

    $ hyperfine \
      'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"' \
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/'
    Benchmark 1: git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"
      Time (mean ± σ):     820.1 ms ±   2.0 ms    [User: 703.7 ms, System: 152.0 ms]
      Range (min … max):   817.7 ms … 823.3 ms    10 runs

    Benchmark 2: git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/
      Time (mean ± σ):     106.6 ms ±   1.1 ms    [User: 99.4 ms, System: 7.1 ms]
      Range (min … max):   104.7 ms … 109.1 ms    27 runs

    Summary
      'git.compile for-each-ref --format="%(objectname) %(refname)" --exclude refs/pull/' ran
        7.69 ± 0.08 times faster than 'git.compile for-each-ref --format="%(objectname) %(refname)" | grep -vE "[0-9a-f]{40} refs/pull/"'

Subsequent patches will improve on this by avoiding visiting excluded
sections of the `packed-refs` file in certain cases.

Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-07-10 14:48:55 -07:00
Kousik Sanagavarapu 26c9c03f0a ref-filter: add new "signature" atom
Duplicate the code for outputting the signature and its other
parameters for commits and tags in ref-filter from pretty. In the
future, this will help in getting rid of the current duplicate
implementations of such logic everywhere, when ref-filter can do
everything that pretty is doing.

The new atom "signature" and its friends are equivalent to the existing
pretty formats as follows:

	%(signature) = %GG
	%(signature:grade) = %G?
	%(siganture:signer) = %GS
	%(signature:key) = %GK
	%(signature:fingerprint) = %GF
	%(signature:primarykeyfingerprint) = %GP
	%(signature:trustlevel) = %GT

Co-authored-by: Hariom Verma <hariom18599@gmail.com>
Co-authored-by: Jaydeep Das <jaydeepjd.8914@gmail.com>
Co-authored-by: Nsengiyumva Wilberforce <nsengiyumvawilberforce@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Kousik Sanagavarapu <five231003@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-06 09:32:15 +09:00
Junio C Hamano b64894c206 Merge branch 'ow/ref-filter-omit-empty'
"git branch --format=..." and "git format-patch --format=..."
learns "--omit-empty" to hide refs that whose formatting result
becomes an empty string from the output.

* ow/ref-filter-omit-empty:
  branch, for-each-ref, tag: add option to omit empty lines
2023-04-21 15:35:05 -07:00
Øystein Walle aabfdc9514 branch, for-each-ref, tag: add option to omit empty lines
If the given format string expands to the empty string, a newline is
still printed. This makes using the output linewise more tedious. For
example, git update-ref --stdin does not accept empty lines.

Add options to "git branch", "git for-each-ref", and "git tag" to
not print these empty lines.  The default behavior remains the same.

Signed-off-by: Øystein Walle <oystwa@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-13 08:07:45 -07:00
Derrick Stolee b2c51b7590 for-each-ref: explicitly test no matches
The for-each-ref builtin can take a list of ref patterns, but if none
match, it still succeeds (but with no output). Add an explicit test that
demonstrates that behavior.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 12:17:32 -07:00
Derrick Stolee b73dec5530 for-each-ref: add --stdin option
When a user wishes to input a large list of patterns to 'git
for-each-ref' (likely a long list of exact refs) there are frequently
system limits on the number of command-line arguments.

Add a new --stdin option to instead read the patterns from standard
input. Add tests that check that any unrecognized arguments are
considered an error when --stdin is provided. Also, an empty pattern
list is interpreted as the complete ref set.

When reading from stdin, we populate the filter.name_patterns array
dynamically as opposed to pointing to the 'argv' array directly. This is
simple when using a strvec, as it is NULL-terminated in the same way. We
then free the memory directly from the strvec.

Helped-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 12:17:32 -07:00
Junio C Hamano abf2bb895b Merge branch 'jk/hash-object-fsck'
"git hash-object" now checks that the resulting object is well
formed with the same code as "git fsck".

* jk/hash-object-fsck:
  fsck: do not assume NUL-termination of buffers
  hash-object: use fsck for object checks
  fsck: provide a function to fsck buffer without object struct
  t: use hash-object --literally when created malformed objects
  t7030: stop using invalid tag name
  t1006: stop using 0-padded timestamps
  t1007: modernize malformed object tests
2023-01-30 14:24:22 -08:00
Jeff King 34959d80db t: use hash-object --literally when created malformed objects
Many test scripts use hash-object to create malformed objects to see how
we handle the results in various commands. In some cases we already have
to use "hash-object --literally", because it does some rudimentary
quality checks. But let's use "--literally" more consistently to
future-proof these tests against hash-object learning to be more
careful.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-01-18 12:59:44 -08:00
Jeff King 1955ef10ed ref-filter: truncate atom names in error messages
If you pass a bogus argument to %(refname), you may end up with a
message like this:

  $ git for-each-ref --format='%(refname:foo)'
  fatal: unrecognized %(refname:foo) argument: foo

which is confusing. It should just say:

  fatal: unrecognized %(refname) argument: foo

which is clearer, and is consistent with most other atom parsers. Those
other parsers do not have the same problem because they pass the atom
name from a string literal in the parser function. But because the
parser for %(refname) also handles %(upstream) and %(push), it instead
uses atom->name, which includes the arguments. The oid atom parser which
handles %(tree), %(parent), etc suffers from the same problem.

It seems like the cleanest fix would be for atom->name to be _just_ the
name, since there's already a separate "args" field. But since that
field is also used for other things, we can't change it easily (e.g.,
it's how we find things in the used_atoms array, and clearly %(refname)
and %(refname:short) are not the same thing).

Instead, we'll teach our error_bad_arg() function to stop at the first
":". This is a little hacky, as we're effectively re-parsing the name,
but the format is simple enough to do this as a one-liner, and this
localizes the change to the error-reporting code.

We'll give the same treatment to err_no_arg(). None of its callers use
this atom->name trick, but it's worth future-proofing it while we're
here.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:14:04 +09:00
Jeff King dda4fc1a84 ref-filter: factor out "unrecognized %(foo) arg" errors
Atom parsers that take arguments generally have a catch-all for "this
arg is not recognized". Most of them use the same printf template, which
is good, because it makes life easier for translators. Let's pull this
template into a helper function, which makes the code in the parsers
shorter and avoids any possibility of differences.

As with the previous commit, we'll pick an arbitrary atom to make sure
the test suite covers this code.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:14:00 +09:00
Jeff King a33d0fae76 ref-filter: factor out "%(foo) does not take arguments" errors
Many atom parsers give the same error message, differing only in the
name of the atom. If we use "%s does not take arguments", that should
make life easier for translators, as they only need to translate one
string. And in doing so, we can easily pull it into a helper function to
make sure they are all using the exact same string.

I've added a basic test here for %(HEAD), just to make sure this code is
exercised at all in the test suite. We could cover each such atom, but
the effort-to-reward ratio of trying to maintain an exhaustive list
doesn't seem worth it.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-15 09:13:56 +09:00
Jeff King 8e1c5fcf28 ref-filter: fix parsing of signatures with CRLF and no body
This commit fixes a bug when parsing tags that have CRLF line endings, a
signature, and no body, like this (the "^M" are marking the CRs):

  this is the subject^M
  -----BEGIN PGP SIGNATURE-----^M
  ^M
  ...some stuff...^M
  -----END PGP SIGNATURE-----^M

When trying to find the start of the body, we look for a blank line
separating the subject and body. In this case, there isn't one. But we
search for it using strstr(), which will find the blank line in the
signature.

In the non-CRLF code path, we check whether the line we found is past
the start of the signature, and if so, put the body pointer at the start
of the signature (effectively making the body empty). But the CRLF code
path doesn't catch the same case, and we end up with the body pointer in
the middle of the signature field. This has two visible problems:

  - printing %(contents:subject) will show part of the signature, too,
    since the subject length is computed as (body - subject)

  - the length of the body is (sig - body), which makes it negative.
    Asking for %(contents:body) causes us to cast this to a very large
    size_t when we feed it to xmemdupz(), which then complains about
    trying to allocate too much memory.

These are essentially the same bugs fixed in the previous commit, except
that they happen when there is a CRLF blank line in the signature,
rather than no blank line at all. Both are caused by the refactoring in
9f75ce3d8f (ref-filter: handle CRLF at end-of-line more gracefully,
2020-10-29).

We can fix this by doing the same "sigstart" check that we do in the
non-CRLF case. And rather than repeat ourselves, we can just use
short-circuiting OR to collapse both cases into a single conditional.
I.e., rather than:

  if (strstr("\n\n"))
    ...found blank, check if it's in signature...
  else if (strstr("\r\n\r\n"))
    ...found blank, check if it's in signature...
  else
    ...no blank line found...

we can collapse this to:

  if (strstr("\n\n")) ||
      strstr("\r\n\r\n")))
    ...found blank, check if it's in signature...
  else
    ...no blank line found...

The tests show the problem and the fix. Though it wasn't broken, I
included contents:signature here to make sure it still behaves as
expected, but note the shell hackery needed to make it work. A
less-clever option would be to skip using test_atom and just "append_cr
>expected" ourselves.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-02 21:36:04 -04:00
Jeff King b01e1c7ef0 ref-filter: fix parsing of signatures without blank lines
When ref-filter is asked to show %(content:subject), etc, we end up in
find_subpos() to parse out the three major parts: the subject, the body,
and the signature (if any).

When searching for the blank line between the subject and body, if we
don't find anything, we try to treat the whole message as the subject,
with no body. But our idea of "the whole message" needs to take into
account the signature, too. Since 9f75ce3d8f (ref-filter: handle CRLF at
end-of-line more gracefully, 2020-10-29), the code instead goes all the
way to the end of the buffer, which produces confusing output.

Here's an example. If we have a tag message like this:

  this is the subject
  -----BEGIN SSH SIGNATURE-----
  ...some stuff...
  -----END SSH SIGNATURE-----

then the current parser will put the start of the body at the end of the
whole buffer. This produces two buggy outcomes:

  - since the subject length is computed as (body - subject), showing
    %(contents:subject) will print both the subject and the signature,
    rather than just the single line

  - since the body length is computed as (sig - body), and the body now
    starts _after_ the signature, we end up with a negative length!
    Fortunately we never access out-of-bounds memory, because the
    negative length is fed to xmemdupz(), which casts it to a size_t,
    and xmalloc() bails trying to allocate an absurdly large value.

    In theory it would be possible for somebody making a malicious tag
    to wrap it around to a more reasonable value, but it would require a
    tag on the order of 2^63 bytes. And even if they did, all they get
    is an out of bounds string read. So the security implications are
    probably not interesting.

We can fix both by correctly putting the start of the body at the same
index as the start of the signature (effectively making the body empty).

Note that this is a real issue with signatures generated with gpg.format
set to "ssh", which would look like the example above. In the new tests
here I use a hard-coded tag message, for a few reasons:

  - regardless of what the ssh-signing code produces now or in the
    future, we should be testing this particular case

  - skipping the actual signature makes the tests simpler to write (and
    allows them to run on more systems)

  - t6300 has helpers for working with gpg signatures; for the purposes
    of this bug, "BEGIN PGP" is just as good a demonstration, and this
    simplifies the tests

Curiously, the same issue doesn't happen with real gpg signatures (and
there are even existing tests in t6300 with cover this). Those have a
blank line between the header and the content, like:

  this is the subject
  -----BEGIN PGP SIGNATURE-----

  ...some stuff...
  -----END PGP SIGNATURE-----

Because we search for the subject/body separator line with a strstr(),
we find the blank line in the signature, even though it's outside of
what we'd consider the body. But that puts us unto a separate code path,
which realizes that we're now in the signature and adjusts the line back
to "sigstart". So this patch is basically just making the "no line found
at all" case match that. And note that "sigstart" is always defined (if
there is no signature, it points to the end of the buffer as you'd
expect).

Reported-by: Martin Englund <martin@englund.nu>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
2022-11-02 21:36:04 -04:00
Junio C Hamano 4f4b18497a Merge branch 'es/test-chain-lint'
Broken &&-chains in the test scripts have been corrected.

* es/test-chain-lint:
  t6000-t9999: detect and signal failure within loop
  t5000-t5999: detect and signal failure within loop
  t4000-t4999: detect and signal failure within loop
  t0000-t3999: detect and signal failure within loop
  tests: simplify by dropping unnecessary `for` loops
  tests: apply modern idiom for exiting loop upon failure
  tests: apply modern idiom for signaling test failure
  tests: fix broken &&-chains in `{...}` groups
  tests: fix broken &&-chains in `$(...)` command substitutions
  tests: fix broken &&-chains in compound statements
  tests: use test_write_lines() to generate line-oriented output
  tests: simplify construction of large blocks of text
  t9107: use shell parameter expansion to avoid breaking &&-chain
  t6300: make `%(raw:size) --shell` test more robust
  t5516: drop unnecessary subshell and command invocation
  t4202: clarify intent by creating expected content less cleverly
  t1020: avoid aborting entire test script when one test fails
  t1010: fix unnoticed failure on Windows
  t/lib-pager: use sane_unset() to avoid breaking &&-chain
2022-01-03 16:24:15 -08:00
Eric Sunshine 0c51d6b4ae t6000-t9999: detect and signal failure within loop
Failures within `for` and `while` loops can go unnoticed if not detected
and signaled manually since the loop itself does not abort when a
contained command fails, nor will a failure necessarily be detected when
the loop finishes since the loop returns the exit code of the last
command it ran on the final iteration, which may not be the command
which failed. Therefore, detect and signal failures manually within
loops using the idiom `|| return 1` (or `|| exit 1` within subshells).

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 10:29:48 -08:00
Eric Sunshine efe47c83b2 t6300: make `%(raw:size) --shell` test more robust
This test populates its `expect` file solely by appending content but
fails to ensure that the file starts out empty. The test succeeds only
because no earlier test populated a file of the exact same name, however
this is an accident waiting to happen. Make the test more robust by
ensuring that it contains exactly the intended content.

While at it, simplify the implementation via a straightforward `sed`
application and by avoiding dropping out of the single-quote context
within the test body (thus eliminating a hard-to-digest combination of
apostrophes and backslashes).

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-13 10:29:48 -08:00
Johannes Schindelin 9f3547837e tests: set GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME only when needed
A couple of test scripts have actually been adapted to accommodate for a
configurable default branch name, but they still overrode it via the
`GIT_TEST_*` variable. Let's drop that override where possible.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-05 11:34:28 -08:00
Junio C Hamano 98e7ab6d42 for-each-ref: delay parsing of --sort=<atom> options
The for-each-ref family of commands invoke parsers immediately when
it sees each --sort=<atom> option, and die before even seeing the
other options on the command line when the <atom> is unrecognised.

Instead, accumulate them in a string list, and have them parsed into
a ref_sorting structure after the command line parsing is done.  As
a consequence, "git branch --sort=bogus -h" used to fail to give the
brief help, which arguably may have been a feature, now does so,
which is more consistent with how other options work.

The patch is smaller than the actual extent of the "damage" to the
codebase, thanks to the fact that the original code consistently
used OPT_REF_SORT() macro to handle command line options.  We only
needed to replace the variable used for the list, and implementation
of the callback function used in the macro.

The old rule was for the users of the API to:

 - Declare ref_sorting and ref_sorting_tail variables;

 - OPT_REF_SORT() macro will instantiate ref_sorting instance (which
   may barf and die) and append it to the tail;

 - Append to the tail each ref_sorting read from the configuration
   by parsing in the config callback (which may barf and die);

 - See if ref_sorting is null and use ref_sorting_default() instead.

Now the rule is not all that different but is simpler:

 - Declare ref_sorting_options string list.

 - OPT_REF_SORT() macro will append it to the string list;

 - Append to the string list the sort key read from the
   configuration;

 - call ref_sorting_options() to turn the string list to ref_sorting
   structure (which also deals with the default value).

As side effects, this change also cleans up a few issues:

 - 95be717c (parse_opt_ref_sorting: always use with NONEG flag,
   2019-03-20) muses that "git for-each-ref --no-sort" should simply
   clear the sort keys accumulated so far; it now does.

 - The implementation detail of "struct ref_sorting" and the helper
   function parse_ref_sorting() can now be private to the ref-filter
   API implementation.

 - If you set branch.sort to a bogus value, the any "git branch"
   invocation, not only the listing mode, would abort with the
   original code; now it doesn't

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-20 14:33:07 -07:00
Junio C Hamano 85246a7054 Merge branch 'dd/t6300-wo-gpg-fix'
Test fix.

* dd/t6300-wo-gpg-fix:
  t6300: check for cat-file exit status code
  t6300: don't run cat-file on non-existent object
2021-09-08 13:30:31 -07:00
Đoàn Trần Công Danh 1549577338 t6300: check for cat-file exit status code
In test_atom(), we're piping the output of cat-file to tail(1),
thus, losing its exit status.

Let's use a temporary file to preserve git exit status code.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 11:45:54 -07:00
Đoàn Trần Công Danh 597fa8cb43 t6300: don't run cat-file on non-existent object
In t6300, some tests are guarded behind some prerequisites.
Thus, objects created by those tests ain't available if those
prerequisites are unsatistified.  Attempting to run "cat-file"
on those objects will run into failure.

In fact, running t6300 in an environment without gpg(1),
we'll see those warnings:

	fatal: Not a valid object name refs/tags/signed-empty
	fatal: Not a valid object name refs/tags/signed-short
	fatal: Not a valid object name refs/tags/signed-long

Let's put those commands into the real tests, in order to:

* skip their execution if prerequisites aren't satistified.
* check their exit status code

The expected value for objects with type: commit needs to be
computed outside the test because we can't rely on "$3" there.
Furthermore, to prevent the accidental usage of that computed
expected value, BUG out on unknown object's type.

Signed-off-by: Đoàn Trần Công Danh <congdanhqx@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-25 11:45:53 -07:00
ZheNing Hu b9dee075eb ref-filter: add %(rest) atom
%(rest) is a atom used for cat-file batch mode, which can split
the input lines at the first whitespace boundary, all characters
before that whitespace are considered to be the object name;
characters after that first run of whitespace (i.e., the "rest"
of the line) are output in place of the %(rest) atom.

In order to let "cat-file --batch=%(rest)" use the ref-filter
interface, add %(rest) atom for ref-filter.

Introduce the reject_atom() to reject the atom %(rest) for
"git for-each-ref", "git branch", "git tag" and "git verify-tag".

Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Suggected-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:01:26 -07:00
ZheNing Hu 7121c4d4e2 ref-filter: --format=%(raw) support --perl
Because the perl language can handle binary data correctly,
add the function perl_quote_buf_with_len(), which can specify
the length of the data and prevent the data from being truncated
at '\0' to help `--format="%(raw)"` support `--perl`.

Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:01:25 -07:00
ZheNing Hu bd0708c7eb ref-filter: add %(raw) atom
Add new formatting option `%(raw)`, which will print the raw
object data without any changes. It will help further to migrate
all cat-file formatting logic from cat-file to ref-filter.

The raw data of blob, tree objects may contain '\0', but most of
the logic in `ref-filter` depends on the output of the atom being
text (specifically, no embedded NULs in it).

E.g. `quote_formatting()` use `strbuf_addstr()` or `*._quote_buf()`
add the data to the buffer. The raw data of a tree object is
`100644 one\0...`, only the `100644 one` will be added to the buffer,
which is incorrect.

Therefore, we need to find a way to record the length of the
atom_value's member `s`. Although strbuf can already record the
string and its length, if we want to replace the type of atom_value's
member `s` with strbuf, many places in ref-filter that are filled
with dynamically allocated mermory in `v->s` are not easy to replace.
At the same time, we need to check if `v->s == NULL` in
populate_value(), and strbuf cannot easily distinguish NULL and empty
strings, but c-style "const char *" can do it. So add a new member in
`struct atom_value`: `s_size`, which can record raw object size, it
can help us add raw object data to the buffer or compare two buffers
which contain raw object data.

Note that `--format=%(raw)` cannot be used with `--python`, `--shell`,
`--tcl`, and `--perl` because if the binary raw data is passed to a
variable in such languages, these may not support arbitrary binary data
in their string variable type.

Reviewed-by: Jacob Keller <jacob.keller@gmail.com>
Mentored-by: Christian Couder <christian.couder@gmail.com>
Mentored-by: Hariom Verma <hariom18599@gmail.com>
Helped-by: Bagas Sanjaya <bagasdotme@gmail.com>
Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Helped-by: Felipe Contreras <felipe.contreras@gmail.com>
Helped-by: Phillip Wood <phillip.wood@dunelm.org.uk>
Helped-by: Junio C Hamano <gitster@pobox.com>
Based-on-patch-by: Olga Telezhnaya <olyatelezhnaya@gmail.com>
Signed-off-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-26 12:01:25 -07:00
Junio C Hamano 522010b573 Merge branch 'ab/detox-gettext-tests'
Test clean-up.

* ab/detox-gettext-tests:
  tests: remove all uses of test_i18cmp
2021-04-20 17:23:36 -07:00
Junio C Hamano f63add4aa8 Merge branch 'jk/ref-filter-segfault-fix'
A NULL-dereference bug has been corrected in an error codepath in
"git for-each-ref", "git branch --list" etc.

* jk/ref-filter-segfault-fix:
  ref-filter: fix NULL check for parse object failure
2021-04-13 15:28:50 -07:00
Ævar Arnfjörð Bjarmason feeb03bce6 tests: remove all uses of test_i18cmp
Finish the removal I started in 1108cea7f8 (tests: remove most uses
of test_i18ncmp, 2021-02-11). At that time the function wasn't removed
due to disruption with in-flight changes, remove the occurrences that
have landed since then.

As of writing this there are no test_i18ncmp uses between "master" and
"seen", so let's also remove the function to finally put it to rest.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-13 14:41:24 -07:00
Jeff King c685450880 ref-filter: fix NULL check for parse object failure
After we run parse_object_buffer() to get an object's contents, we try
to check that the return value wasn't NULL. However, since our "struct
object" is a pointer-to-pointer, and we assign like:

  *obj = parse_object_buffer(...);

it's not correct to check:

  if (!obj)

That will always be true, since our double pointer will continue to
point to the single pointer (which is itself NULL). This is a regression
that was introduced by aa46a0da30 (ref-filter: use oid_object_info() to
get object, 2018-07-17); since that commit we'll segfault on a parse
failure, as we try to look at the NULL object pointer.

There are many ways a parse could fail, but most of them are hard to set
up in the tests (it's easy to make a bogus object, but update-ref will
refuse to point to it). The test here uses a tag which points to a wrong
object type. A parse of just the broken tag object will succeed, but
seeing both tag objects in the same process will lead to a parse error
(since we'll see the pointed-to object as both types).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-01 12:54:21 -07:00
Hariom Verma ee82a487f6 ref-filter: use pretty.c logic for trailers
Now, ref-filter is using pretty.c logic for setting trailer options.

New to ref-filter:
  :key=<K> - only show trailers with specified key.
  :valueonly[=val] - only show the value part.
  :separator=<SEP> - inserted between trailer lines.
  :key_value_separator=<SEP> - inserted between key and value in trailer lines

Enhancement to existing options(now can take value and its optional):
  :only[=val]
  :unfold[=val]

'val' can be: true, on, yes or false, off, no.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 16:48:38 -08:00
Hariom Verma 727331dce1 t6300: use function to test trailer options
Add a function to test trailer options. This will make tests look cleaner,
as well as will make it easier to add new tests for trailers in the future.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-15 16:48:38 -08:00
Junio C Hamano 27d7c8599b Merge branch 'js/default-branch-name-tests-final-stretch'
Prepare tests not to be affected by the name of the default branch
"git init" creates.

* js/default-branch-name-tests-final-stretch: (28 commits)
  tests: drop prereq `PREPARE_FOR_MAIN_BRANCH` where no longer needed
  t99*: adjust the references to the default branch name "main"
  tests(git-p4): transition to the default branch name `main`
  t9[5-7]*: adjust the references to the default branch name "main"
  t9[0-4]*: adjust the references to the default branch name "main"
  t8*: adjust the references to the default branch name "main"
  t7[5-9]*: adjust the references to the default branch name "main"
  t7[0-4]*: adjust the references to the default branch name "main"
  t6[4-9]*: adjust the references to the default branch name "main"
  t64*: preemptively adjust alignment to prepare for `master` -> `main`
  t6[0-3]*: adjust the references to the default branch name "main"
  t5[6-9]*: adjust the references to the default branch name "main"
  t55[4-9]*: adjust the references to the default branch name "main"
  t55[23]*: adjust the references to the default branch name "main"
  t551*: adjust the references to the default branch name "main"
  t550*: adjust the references to the default branch name "main"
  t5503: prepare aligned comment for replacing `master` with `main`
  t5[0-4]*: adjust the references to the default branch name "main"
  t5323: prepare centered comment for `master` -> `main`
  t4*: adjust the references to the default branch name "main"
  ...
2021-01-25 14:19:18 -08:00
Johannes Schindelin 8f19c9fd43 t6300: avoid using the default name of the initial branch
Our test suite currently only passes when `git init` uses the name
`master` for the initial branch. This would stop us from changing the
default branch name.

Let's adjust t6300 so that it does not rely on any specific default
branch name. This trick is done by (force-)renaming the initial branch
to the name `main` in the `setup` and the `:remotename and :remoteref`
test cases, and then replacing all mentions of `master` and `MASTER`
with `main` and `MAIN`, respectively.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-07 10:29:25 -08:00
Johannes Schindelin 334afbc76f tests: mark tests relying on the current default for `init.defaultBranch`
In addition to the manual adjustment to let the `linux-gcc` CI job run
the test suite with `master` and then with `main`, this patch makes sure
that GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME is set in all test scripts
that currently rely on the initial branch name being `master by default.

To determine which test scripts to mark up, the first step was to
force-set the default branch name to `master` in

- all test scripts that contain the keyword `master`,

- t4211, which expects `t/t4211/history.export` with a hard-coded ref to
  initialize the default branch,

- t5560 because it sources `t/t556x_common` which uses `master`,

- t8002 and t8012 because both source `t/annotate-tests.sh` which also
  uses `master`)

This trick was performed by this command:

	$ sed -i '/^ *\. \.\/\(test-lib\|lib-\(bash\|cvs\|git-svn\)\|gitweb-lib\)\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' $(git grep -l master t/t[0-9]*.sh) \
	t/t4211*.sh t/t5560*.sh t/t8002*.sh t/t8012*.sh

After that, careful, manual inspection revealed that some of the test
scripts containing the needle `master` do not actually rely on a
specific default branch name: either they mention `master` only in a
comment, or they initialize that branch specificially, or they do not
actually refer to the current default branch. Therefore, the
aforementioned modification was undone in those test scripts thusly:

	$ git checkout HEAD -- \
		t/t0027-auto-crlf.sh t/t0060-path-utils.sh \
		t/t1011-read-tree-sparse-checkout.sh \
		t/t1305-config-include.sh t/t1309-early-config.sh \
		t/t1402-check-ref-format.sh t/t1450-fsck.sh \
		t/t2024-checkout-dwim.sh \
		t/t2106-update-index-assume-unchanged.sh \
		t/t3040-subprojects-basic.sh t/t3301-notes.sh \
		t/t3308-notes-merge.sh t/t3423-rebase-reword.sh \
		t/t3436-rebase-more-options.sh \
		t/t4015-diff-whitespace.sh t/t4257-am-interactive.sh \
		t/t5323-pack-redundant.sh t/t5401-update-hooks.sh \
		t/t5511-refspec.sh t/t5526-fetch-submodules.sh \
		t/t5529-push-errors.sh t/t5530-upload-pack-error.sh \
		t/t5548-push-porcelain.sh \
		t/t5552-skipping-fetch-negotiator.sh \
		t/t5572-pull-submodule.sh t/t5608-clone-2gb.sh \
		t/t5614-clone-submodules-shallow.sh \
		t/t7508-status.sh t/t7606-merge-custom.sh \
		t/t9302-fast-import-unpack-limit.sh

We excluded one set of test scripts in these commands, though: the range
of `git p4` tests. The reason? `git p4` stores the (foreign) remote
branch in the branch called `p4/master`, which is obviously not the
default branch. Manual analysis revealed that only five of these tests
actually require a specific default branch name to pass; They were
modified thusly:

	$ sed -i '/^ *\. \.\/lib-git-p4\.sh$/i\
	GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=master\
	export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME\
	' t/t980[0167]*.sh t/t9811*.sh

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-19 15:44:17 -08:00
Junio C Hamano c25fba986b Merge branch 'hv/ref-filter-misc'
The "--format=" option to the "for-each-ref" command and friends
learned a few more tricks, e.g. the ":short" suffix that applies to
"objectname" now also can be used for "parent", "tree", etc.

* hv/ref-filter-misc:
  ref-filter: add `sanitize` option for 'subject' atom
  pretty: refactor `format_sanitized_subject()`
  ref-filter: add `short` modifier to 'parent' atom
  ref-filter: add `short` modifier to 'tree' atom
  ref-filter: rename `objectname` related functions and fields
  ref-filter: modify error messages in `grab_objectname()`
  ref-filter: refactor `grab_objectname()`
  ref-filter: support different email formats
2020-09-09 13:53:07 -07:00
Junio C Hamano e17723842b Merge branch 'hv/ref-filter-trailers-atom-parsing-fix'
The parser for "git for-each-ref --format=..." was too loose when
parsing the "%(trailers...)" atom, and forgot that "trailers" and
"trailers:<modifiers>" are the only two allowed forms, which has
been corrected.

* hv/ref-filter-trailers-atom-parsing-fix:
  ref-filter: 'contents:trailers' show error if `:` is missing
  t6300: unify %(trailers) and %(contents:trailers) tests
2020-08-31 15:49:47 -07:00
Hariom Verma 905f0a4e64 ref-filter: add `sanitize` option for 'subject' atom
Currently, subject does not take any arguments. This commit introduce
`sanitize` formatting option to 'subject' atom.

`subject:sanitize` - print sanitized subject line, suitable for a filename.

e.g.
%(subject): "the subject line"
%(subject:sanitize): "the-subject-line"

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 13:52:51 -07:00
Hariom Verma 26bc0aaf99 ref-filter: add `short` modifier to 'parent' atom
Sometimes while using 'parent' atom, user might want to see abbrev hash
instead of full 40 character hash.

Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` option for printing 'parent' hash.

Let's introduce `short` option to 'parent' atom.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 13:52:50 -07:00
Hariom Verma 837adb102f ref-filter: add `short` modifier to 'tree' atom
Sometimes while using 'tree' atom, user might want to see abbrev hash
instead of full 40 character hash.

Just like 'objectname', it might be convenient for users to have the
`:short` and `:short=<length>` option for printing 'tree' hash.

Let's introduce `short` option to 'tree' atom.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 13:52:50 -07:00
Hariom Verma b82445dc27 ref-filter: support different email formats
Currently, ref-filter only supports printing email with angle brackets.

Let's add support for two more email options.
- trim : for email without angle brackets.
- localpart : for the part before the @ sign out of trimmed email

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-28 13:52:50 -07:00
Hariom Verma 2c22e102f8 ref-filter: 'contents:trailers' show error if `:` is missing
The 'contents' atom does not show any error if used with 'trailers'
atom and colon is missing before trailers arguments.

e.g %(contents:trailersonly) works, while it shouldn't.

It is definitely not an expected behavior.

Let's fix this bug.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 14:46:22 -07:00
Hariom Verma a8e0f50edc t6300: unify %(trailers) and %(contents:trailers) tests
Currently, there are different tests for testing %(trailers) and
%(contents:trailers) causing redundant copy.

Its time to get rid of duplicate code.

Mentored-by: Christian Couder <chriscool@tuxfamily.org>
Mentored-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Hariom Verma <hariom18599@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-21 12:13:26 -07:00
Alban Gruin 3db796c1c0 t6300: fix issues related to %(contents:size)
b6839fda68 (ref-filter: add support for %(contents:size), 2020-07-16)
added a new format for ref-filter, and added a function to generate
tests for this new feature in t6300.  Unfortunately, it tries to run
`test_expect_sucess' instead of `test_expect_success', and writes
$expect to `expected', but tries to read `expect'.  Those two issues
were probably unnoticed because the script only printed errors, but did
not crash.  This fixes these issues.

Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-31 13:26:19 -07:00
Christian Couder b6839fda68 ref-filter: add support for %(contents:size)
It's useful and efficient to be able to get the size of the
contents directly without having to pipe through `wc -c`.

Also the result of the following:

`git for-each-ref --format='%(contents)' refs/heads/my-branch | wc -c`

is off by one as `git for-each-ref` appends a newline character
after the contents, which can be seen by comparing its output
with the output from `git cat-file`.

As with %(contents), %(contents:size) is silently ignored, if a
ref points to something other than a commit or a tag:

```
$ git update-ref refs/mytrees/first HEAD^{tree}
$ git for-each-ref --format='%(contents)' refs/mytrees/first

$ git for-each-ref --format='%(contents:size)' refs/mytrees/first

```

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 10:46:55 -07:00
Christian Couder 6e2ef8eb06 t6300: test refs pointing to tree and blob
Adding tests for refs pointing to tree and blob shows that
we care about testing both positive ("see, my shiny new toy
does work") and negative ("and it won't do nonsensical
things when given an input it is not designed to work with")
cases.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-10 13:15:44 -07:00
Junio C Hamano 6de1630898 Merge branch 'jk/for-each-ref-multi-key-sort-fix'
"git branch" and other "for-each-ref" variants accepted multiple
--sort=<key> options in the increasing order of precedence, but it
had a few breakages around "--ignore-case" handling, and tie-breaking
with the refname, which have been fixed.

* jk/for-each-ref-multi-key-sort-fix:
  ref-filter: apply fallback refname sort only after all user sorts
  ref-filter: apply --ignore-case to all sorting keys
2020-05-08 14:25:04 -07:00
Jeff King 7c5045fc18 ref-filter: apply fallback refname sort only after all user sorts
Commit 9e468334b4 (ref-filter: fallback on alphabetical comparison,
2015-10-30) taught ref-filter's sort to fallback to comparing refnames.
But it did it at the wrong level, overriding the comparison result for a
single "--sort" key from the user, rather than after all sort keys have
been exhausted.

This worked correctly for a single "--sort" option, but not for multiple
ones. We'd break any ties in the first key with the refname and never
evaluate the second key at all.

To make matters even more interesting, we only applied this fallback
sometimes! For a field like "taggeremail" which requires a string
comparison, we'd truly return the result of strcmp(), even if it was 0.
But for numerical "value" fields like "taggerdate", we did apply the
fallback. And that's why our multiple-sort test missed this: it uses
taggeremail as the main comparison.

So let's start by adding a much more rigorous test. We'll have a set of
commits expressing every combination of two tagger emails, dates, and
refnames. Then we can confirm that our sort is applied with the correct
precedence, and we'll be hitting both the string and value comparators.

That does show the bug, and the fix is simple: moving the fallback to
the outer compare_refs() function, after all ref_sorting keys have been
exhausted.

Note that in the outer function we don't have an "ignore_case" flag, as
it's part of each individual ref_sorting element. It's debatable what
such a fallback should do, since we didn't use the user's keys to match.
But until now we have been trying to respect that flag, so the
least-invasive thing is to try to continue to do so. Since all callers
in the current code either set the flag for all keys or for none, we can
just pull the flag from the first key. In a hypothetical world where the
user really can flip the case-insensitivity of keys separately, we may
want to extend the code to distinguish that case from a blanket
"--ignore-case".

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04 13:44:46 -07:00
Jeff King 76f9e569ad ref-filter: apply --ignore-case to all sorting keys
All of the ref-filter users (for-each-ref, branch, and tag) take an
--ignore-case option which makes filtering and sorting case-insensitive.
However, this option was applied only to the first element of the
ref_sorting list. So:

  git for-each-ref --ignore-case --sort=refname

would do what you expect, but:

  git for-each-ref --ignore-case --sort=refname --sort=taggername

would sort the primary key (taggername) case-insensitively, but sort the
refname case-sensitively. We have two options here:

  - teach callers to set ignore_case on the whole list

  - replace the ref_sorting list with a struct that contains both the
    list of sorting keys, as well as options that apply to _all_
    keys

I went with the first one here, as it gives more flexibility if we later
want to let the users set the flag per-key (presumably through some
special syntax when defining the key; for now it's all or nothing
through --ignore-case).

The new test covers this by sorting on both tagger and subject
case-insensitively, which should compare "a" and "A" identically, but
still sort them before "b" and "B". We'll break ties by sorting on the
refname to give ourselves a stable output (this is actually supposed to
be done automatically, but there's another bug which will be fixed in
the next commit).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-05-04 13:41:20 -07:00
brian m. carlson 8bd5a2906e t6300: make hash algorithm independent
One of the git for-each-ref tests asks to sort by object ID.  However,
when sorted, the order of the refs differs between SHA-1 and SHA-256.
Sort the expected output so that the test passes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:24 -08:00
brian m. carlson 1f5f8f3e85 t6300: abstract away SHA-1-specific constants
Adjust the test so that it computes variables for object IDs instead of
using hard-coded hashes.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:24 -08:00