1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-13 06:26:09 +02:00
Commit Graph

35 Commits

Author SHA1 Message Date
Patrick Steinhardt b0c42a53c9 list-objects: implement object type filter
While it already is possible to filter objects by some criteria in
git-rev-list(1), it is not yet possible to filter out only a specific
type of objects. This makes some filters less useful. The `blob:limit`
filter for example filters blobs such that only those which are smaller
than the given limit are returned. But it is unfit to ask only for these
smallish blobs, given that git-rev-list(1) will continue to print tags,
commits and trees.

Now that we have the infrastructure in place to also filter tags and
commits, we can improve this situation by implementing a new filter
which selects objects based on their type. Above query can thus
trivially be implemented with the following command:

    $ git rev-list --objects --filter=object:type=blob \
        --filter=blob:limit=200

Furthermore, this filter allows to optimize for certain other cases: if
for example only tags or commits have been selected, there is no need to
walk down trees.

The new filter is not yet supported in bitmaps. This is going to be
implemented in a subsequent commit.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-04-19 14:09:11 -07:00
Martin Ågren 5a923bb1f0 list-objects-filter-options: fix function name in BUG
Fix the function name we give in the BUG message. It's "config", not
"choice".

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-16 14:28:25 -08:00
Jonathan Tan 23547c4051 fetch: do not override partial clone filter
When a fetch with the --filter argument is made, the configured default
filter is set even if one already exists. This change was made in
5e46139376 ("builtin/fetch: remove unique promisor remote limitation",
2019-06-25) - in particular, changing from:

 * If this is the FIRST partial-fetch request, we enable partial
 * on this repo and remember the given filter-spec as the default
 * for subsequent fetches to this remote.

to:

 * If this is a partial-fetch request, we enable partial on
 * this repo if not already enabled and remember the given
 * filter-spec as the default for subsequent fetches to this
 * remote.

(The given filter-spec is "remembered" even if there is already an
existing one.)

This is problematic whenever a lazy fetch is made, because lazy fetches
are made using "git fetch --filter=blob:none", but this will also happen
if the user invokes "git fetch --filter=<filter>" manually. Therefore,
restore the behavior prior to 5e46139376, which writes a filter-spec
only if the current fetch request is the first partial-fetch one (for
that remote).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-28 16:11:59 -07:00
Junio C Hamano 73a9255166 Merge branch 'tb/upload-pack-filters'
The component to respond to "git fetch" request is made more
configurable to selectively allow or reject object filtering
specification used for partial cloning.

* tb/upload-pack-filters:
  t5616: use test_i18ngrep for upload-pack errors
  upload-pack.c: introduce 'uploadpackfilter.tree.maxDepth'
  upload-pack.c: allow banning certain object filter(s)
  list_objects_filter_options: introduce 'list_object_filter_config_name'
2020-08-11 18:04:13 -07:00
Taylor Blau b9ea214795 list_objects_filter_options: introduce 'list_object_filter_config_name'
In a subsequent commit, we will add configuration options that are
specific to each kind of object filter, in which case it is handy to
have a function that translates between 'enum
list_objects_filter_choice' and an appropriate configuration-friendly
string.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-03 18:03:24 -07:00
Jeff King dbbcd44fb4 strvec: rename files from argv-array to strvec
This requires updating #include lines across the code-base, but that's
all fairly mechanical, and was done with:

  git ls-files '*.c' '*.h' |
  xargs perl -i -pe 's/argv-array.h/strvec.h/'

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 15:02:17 -07:00
Xin Li 16af5f1abb repository: add a helper function to perform repository format upgrade
In version 1 of repository format, "extensions" gained special meaning
and it is safer to avoid upgrading when there are pre-existing
extensions.

Make list-objects-filter to use the helper function instead of setting
repository version directly as a prerequisite of exposing the upgrade
capability.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Junio C Hamano ad8f0368b4 Merge branch 'jk/partial-clone-sparse-blob'
The name of the blob object that stores the filter specification
for sparse cloning/fetching was interpreted in a wrong place in the
code, causing Git to abort.

* jk/partial-clone-sparse-blob:
  list-objects-filter: use empty string instead of NULL for sparse "base"
  list-objects-filter: give a more specific error sparse parsing error
  list-objects-filter: delay parsing of sparse oid
  t5616: test cloning/fetching with sparse:oid=<oid> filter
2019-10-07 11:32:54 +09:00
Junio C Hamano 627b826834 Merge branch 'md/list-objects-filter-combo'
The list-objects-filter API (used to create a sparse/lazy clone)
learned to take a combined filter specification.

* md/list-objects-filter-combo:
  list-objects-filter-options: make parser void
  list-objects-filter-options: clean up use of ALLOC_GROW
  list-objects-filter-options: allow mult. --filter
  strbuf: give URL-encoding API a char predicate fn
  list-objects-filter-options: make filter_spec a string_list
  list-objects-filter-options: move error check up
  list-objects-filter: implement composite filters
  list-objects-filter-options: always supply *errbuf
  list-objects-filter: put omits set in filter struct
  list-objects-filter: encapsulate filter components
2019-09-18 11:50:09 -07:00
Jeff King 4c96a77594 list-objects-filter: delay parsing of sparse oid
The list-objects-filter code has two steps to its initialization:

  1. parse_list_objects_filter() makes sure the spec is a filter we know
     about and is syntactically correct. This step is done by "rev-list"
     or "upload-pack" that is going to apply a filter, but also by "git
     clone" or "git fetch" before they send the spec across the wire.

  2. list_objects_filter__init() runs the type-specific initialization
     (using function pointers established in step 1). This happens at
     the start of traverse_commit_list_filtered(), when we're about to
     actually use the filter.

It's a good idea to parse as much as we can in step 1, in order to catch
problems early (e.g., a blob size limit that isn't a number). But one
thing we _shouldn't_ do is resolve any oids at that step (e.g., for
sparse-file contents specified by oid). In the case of a fetch, the oid
has to be resolved on the remote side.

The current code does resolve the oid during the parse phase, but
ignores any error (which we must do, because we might just be sending
the spec across the wire). This leads to two bugs:

  - if we're not in a repository (e.g., because it's git-clone parsing
    the spec), then we trigger a BUG() trying to resolve the name

  - if we did hit the error case, we still have to notice that later and
    bail. The code path in rev-list handles this, but the one in
    upload-pack does not, leading to a segfault.

We can fix both by moving the oid resolution into the sparse-oid init
function. At that point we know we have a repository (because we're
about to traverse), and handling the error there fixes the segfault.

As a bonus, we can drop the NULL sparse_oid_value check in rev-list,
since this is now handled in the sparse-oid-filter init function.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-16 12:47:37 -07:00
Matthew DeVore 90d21f9ebf list-objects-filter-options: make parser void
This function always returns 0, so make it return void instead.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore 5a133e8a7f list-objects-filter-options: clean up use of ALLOC_GROW
Introduce a new macro ALLOC_GROW_BY which automatically zeros the added
array elements and takes care of updating the nr value. Use the macro in
code introduced earlier in this patchset.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore 489fc9ee71 list-objects-filter-options: allow mult. --filter
Allow combining of multiple filters by simply repeating the --filter
flag. Before this patch, the user had to combine them in a single flag
somewhat awkwardly (e.g. --filter=combine:FOO+BAR), including
URL-encoding the individual filters.

To make this work, in the --filter flag parsing callback, rather than
error out when we detect that the filter_options struct is already
populated, we modify it in-place to contain the added sub-filter. The
existing sub-filter becomes the lhs of the combined filter, and the
next sub-filter becomes the rhs. We also have to URL-encode the LHS and
RHS sub-filters.

We can simplify the operation if the LHS is already a combine: filter.
In that case, we just append the URL-encoded RHS sub-filter to the LHS
spec to get the new spec.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Jeff King <peff@peff.net>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore cf9ceb5a12 list-objects-filter-options: make filter_spec a string_list
Make the filter_spec string a string_list rather than a raw C string.
The list of strings must be concatted together to make a complete
filter_spec. A future patch will use this capability to build "combine:"
filter specs gradually.

A strbuf would seem to be a more natural choice for this object, but it
unfortunately requires initialization besides just zero'ing out the
memory.  This results in all container structs, and all containers of
those structs, etc., to also require initialization. Initializing them
all would be more cumbersome that simply using a string_list, which
behaves properly when its contents are zero'd.

For the purposes of code simplification, change behavior in how filter
specs are conveyed over the protocol: do not normalize the tree:<depth>
filter specs since there should be no server in existence that supports
tree:# but not tree:#k etc.

Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore f56f764279 list-objects-filter-options: move error check up
Move the check that filter_options->choice is set to higher in the call
stack. This can only be set when the gentle parse function is called
from one of the two call sites.

This is important because in an upcoming patch this may or may not be an
error, and whether it is an error is only known to the
parse_list_objects_filter function.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore e987df5fe6 list-objects-filter: implement composite filters
Allow combining filters such that only objects accepted by all filters
are shown. The motivation for this is to allow getting directory
listings without also fetching blobs. This can be done by combining
blob:none with tree:<depth>. There are massive repositories that have
larger-than-expected trees - even if you include only a single commit.

A combined filter supports any number of subfilters, and is written in
the following form:

	combine:<filter 1>+<filter 2>+<filter 3>

Certain non-alphanumeric characters in each filter must be
URL-encoded.

For now, combined filters must be specified in this form. In a
subsequent commit, rev-list will support multiple --filter arguments
which will have the same effect as specifying one filter argument
starting with "combine:". The documentation will be updated in that
commit, as the URL-encoding scheme is in general not meant to be used
directly by the user, and it is better to describe the URL-encoding
feature in terms of the repeated flag.

Helped-by: Emily Shaffer <emilyshaffer@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
Helped-by: Jonathan Tan <jonathantanmy@google.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Matthew DeVore 842b00516a list-objects-filter-options: always supply *errbuf
Making errbuf an optional argument complicates error reporting. Fix this
by making all callers supply an errbuf, even if they may ignore it. This
will be important in follow-up patches where the filter-spec parsing has
more pitfalls and possible errors.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-28 08:41:53 -07:00
Christian Couder fa3d1b63e8 promisor-remote: parse remote.*.partialclonefilter
This makes it possible to specify a different partial clone
filter for each promisor remote.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Christian Couder b14ed5adaf Use promisor_remote_get_direct() and has_promisor_remote()
Instead of using the repository_format_partial_clone global
and fetch_objects() directly, let's use has_promisor_remote()
and promisor_remote_get_direct().

This way all the configured promisor remotes will be taken
into account, not only the one specified by
extensions.partialClone.

Also when cloning or fetching using a partial clone filter,
remote.origin.promisor will be set to "true" instead of
setting extensions.partialClone to "origin". This makes it
possible to use many promisor remote just by fetching from
them.

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-25 14:05:37 -07:00
Junio C Hamano ca02d3669f Merge branch 'md/list-objects-filter-parse-msgfix'
Make an end-user facing message localizable.

* md/list-objects-filter-parse-msgfix:
  list-objects-filter-options: error is localizeable
2019-06-21 11:24:10 -07:00
Matthew DeVore 5c03bc8b1d list-objects-filter-options: error is localizeable
The "invalid filter-spec" message is user-facing and not a BUG, so make
it localizeable.

For reference, the message appears in this context:

	$ git rev-list --filter=blob:nonse --objects HEAD
	fatal: invalid filter-spec 'blob:nonse'

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-31 11:30:10 -07:00
Christian Couder e693237e2b list-objects-filter: disable 'sparse:path' filters
If someone wants to use as a filter a sparse file that is in the
repository, something like "--filter=sparse:oid=<ref>:<path>"
already works.

So 'sparse:path' is only interesting if the sparse file is not in
the repository. In this case though the current implementation has
a big security issue, as it makes it possible to ask the server to
read any file, like for example /etc/password, and to explore the
filesystem, as well as individual lines of files.

If someone is interested in using a sparse file that is not in the
repository as a filter, then at the minimum a config option, such
as "uploadpack.sparsePathFilter", should be implemented first to
restrict the directory from which the files specified by
'sparse:path' can be read.

For now though, let's just disable 'sparse:path' filters.

Helped-by: Matthew DeVore <matvore@google.com>
Helped-by: Jeff Hostetler <git@jeffhostetler.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-29 11:05:34 -07:00
Nguyễn Thái Ngọc Duy 5a59a2301f completion: add more parameter value completion
This adds value completion for a couple more paramters. To make it
easier to maintain these hard coded lists, add a comment at the original
list/code to remind people to update git-completion.bash too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-20 12:31:56 -08:00
Junio C Hamano 7589e63648 Merge branch 'nd/the-index-final'
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.

* nd/the-index-final:
  cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
  read-cache.c: remove the_* from index_has_changes()
  merge-recursive.c: remove implicit dependency on the_repository
  merge-recursive.c: remove implicit dependency on the_index
  sha1-name.c: remove implicit dependency on the_index
  read-cache.c: replace update_index_if_able with repo_&
  read-cache.c: kill read_index()
  checkout: avoid the_index when possible
  repository.c: replace hold_locked_index() with repo_hold_locked_index()
  notes-utils.c: remove the_repository references
  grep: use grep_opt->repo instead of explict repo argument
2019-02-06 22:05:23 -08:00
Josh Steadmon 87c2d9d310 filter-options: expand scaled numbers
When communicating with a remote server or a subprocess, use
expanded numbers rather than numbers with scaling suffix in the
object filter spec (e.g.  "limit:blob=1k" becomes
"limit:blob=1024").

Update the protocol docs to note that clients should always perform this
expansion, to allow for more compatibility between server
implementations.

Signed-off-by: Josh Steadmon <steadmon@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:42:31 -08:00
Matthew DeVore c813a7c35f list-objects-filter: teach tree:# how to handle >0
Implement positive values for <depth> in the tree:<depth> filter. The
exact semantics are described in Documentation/rev-list-options.txt.

The long-term goal at the end of this is to allow a partial clone to
eagerly fetch an entire directory of files by fetching a tree and
specifying <depth>=1. This, for instance, would make a build operation
fast and convenient. It is fast because the partial clone does not need
to fetch each file individually, and convenient because the user does
not need to supply a sparse-checkout specification.

Another way of considering this feature is as a way to reduce
round-trips, since the client can get any number of levels of
directories in a single request, rather than wait for each level of tree
objects to come back, whose entries are used to construct a new request.

Signed-off-by: Matthew DeVore <matvore@google.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 15:39:34 -08:00
Nguyễn Thái Ngọc Duy 3a7a698e93 sha1-name.c: remove implicit dependency on the_index
This kills the_index dependency in get_oid_with_context() but for
get_oid() and friends, they still assume the_repository (which also
means the_index).

Unfortunately the widespread use of get_oid() will make it hard to
make the conversion now. We probably will add repo_get_oid() at some
point and limit the use of get_oid() in builtin/ instead of forcing
all get_oid() call sites to carry struct repository.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-14 12:13:04 -08:00
Matthew DeVore bc5975d24f list-objects-filter: implement filter tree:0
Teach list-objects the "tree:0" filter which allows for filtering
out all tree and blob objects (unless other objects are explicitly
specified by the user). The purpose of this patch is to allow smaller
partial clones.

The name of this filter - tree:0 - does not explicitly specify that
it also filters out all blobs, but this should not cause much confusion
because blobs are not at all useful without the trees that refer to
them.

I also considered only:commits as a name, but this is inaccurate because
it suggests that annotated tags are omitted, but actually they are
included.

The name "tree:0" allows later filtering based on depth, i.e. "tree:1"
would filter out all but the root tree and blobs. In order to avoid
confusion between 0 and capital O, the documentation was worded in a
somewhat round-about way that also hints at this future improvement to
the feature.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:55:00 +09:00
Matthew DeVore cc0b05a4cc list-objects-filter-options: do not over-strbuf_init
The function gently_parse_list_objects_filter is either called with
errbuf=STRBUF_INIT or errbuf=NULL, but that function calls strbuf_init
when errbuf is not NULL. strbuf_init is only necessary if errbuf
contains garbage, and risks a memory leak if errbuf already has a
non-STRBUF_INIT state. It should be the caller's responsibility to make
sure errbuf is not garbage, since garbage content is easily avoidable
with STRBUF_INIT.

Signed-off-by: Matthew DeVore <matvore@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-07 08:55:00 +09:00
Jonathan Tan cac1137dc4 list-objects: check if filter is NULL before using
In partial_clone_get_default_filter_spec(), the
core_partial_clone_filter_default variable may be NULL; ensure that it
is not NULL before using it.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-12 10:46:56 -07:00
Jeff Hostetler aa57b871da fetch: inherit filter-spec from partial clone
Teach (partial) fetch to inherit the filter-spec used by
the partial clone.  Extend --no-filter to override this
inheritance.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:52 -08:00
Jeff Hostetler 1e1e39b308 partial-clone: define partial clone settings in config
Create get and set routines for "partial clone" config settings.
These will be used in a future commit by clone and fetch to
remember the promisor remote and the default filter-spec.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:58:51 -08:00
Jeff Hostetler 4875c9791e list-objects-filter-options: support --no-filter
Teach opt_parse_list_objects_filter() to take --no-filter
option and to free the contents of struct filter_options.
This command line argument will be automatically inherited
by commands using OPT_PARSE_LIST_OBJECTS_FILTER(); this
includes pack-objects.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:44:36 -08:00
Christian Couder 1dde5fa2b2 list-objects-filter-options: fix 'keword' typo in comment
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 09:44:35 -08:00
Jeff Hostetler 25ec7bcac0 list-objects: filter objects in traverse_commit_list
Create traverse_commit_list_filtered() and add filtering
interface to allow certain objects to be omitted from the
traversal.

Update traverse_commit_list() to be a wrapper for the above
with a null filter to minimize the number of callers that
needed to be changed.

Object filtering will be used in a future commit by rev-list
and pack-objects for partial clone and fetch to omit unwanted
objects from the result.

traverse_bitmap_commit_list() does not work with filtering.
If a packfile bitmap is present, it will not be used.  It
should be possible to extend such support in the future (at
least to simple filters that do not require object pathnames),
but that is beyond the scope of this patch series.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Reviewed-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-11-22 14:11:57 +09:00