1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-19 02:26:08 +02:00
Commit Graph

1979 Commits

Author SHA1 Message Date
Derrick Stolee 58300f4743 sparse-index: add index.sparse config option
When enabled, this config option signals that index writes should
attempt to use sparse-directory entries.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:47 -07:00
Derrick Stolee 6e773527b6 sparse-index: convert from full to sparse
If we have a full index, then we can convert it to a sparse index by
replacing directories outside of the sparse cone with sparse directory
entries. The convert_to_sparse() method does this, when the situation is
appropriate.

For now, we avoid converting the index to a sparse index if:

 1. the index is split.
 2. the index is already sparse.
 3. sparse-checkout is disabled.
 4. sparse-checkout does not use cone mode.

Finally, we currently limit the conversion to when the
GIT_TEST_SPARSE_INDEX environment variable is enabled. A mode using Git
config will be added in a later change.

The trickiest thing about this conversion is that we might not be able
to mark a directory as a sparse directory just because it is outside the
sparse cone. There might be unmerged files within that directory, so we
need to look for those. Also, if there is some strange reason why a file
is not marked with CE_SKIP_WORKTREE, then we should give up on
converting that directory. There is still hope that some of its
subdirectories might be able to convert to sparse, so we keep looking
deeper.

The conversion process is assisted by the cache-tree extension. This is
calculated from the full index if it does not already exist. We then
abandon the cache-tree as it no longer applies to the newly-sparse
index. Thus, this cache-tree will be recalculated in every
sparse-full-sparse round-trip until we integrate the cache-tree
extension with the sparse index.

Some Git commands use the index after writing it. For example, 'git add'
will update the index, then write it to disk, then read its entries to
report information. To keep the in-memory index in a full state after
writing, we re-expand it to a full one after the write. This is wasteful
for commands that only write the index and do not read from it again,
but that is only the case until we make those commands "sparse aware."

We can compare the behavior of the sparse-index in
t1092-sparse-checkout-compability.sh by using GIT_TEST_SPARSE_INDEX=1
when operating on the 'sparse-index' repo. We can also compare the two
sparse repos directly, such as comparing their indexes (when expanded to
full in the case of the 'sparse-index' repo). We also verify that the
index is actually populated with sparse directory entries.

The 'checkout and reset (mixed)' test is marked for failure when
comparing a sparse repo to a full repo, but we can compare the two
sparse-checkout cases directly to ensure that we are not changing the
behavior when using a sparse index.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:47 -07:00
Derrick Stolee 836e25c51b sparse-checkout: hold pattern list in index
As we modify the sparse-checkout definition, we perform index operations
on a pattern_list that only exists in-memory. This allows easy backing
out in case the index update fails.

However, if the index write itself cares about the sparse-checkout
pattern set, we need access to that in-memory copy. Place a pointer to
a 'struct pattern_list' in the index so we can access this on-demand.
This will be used in the next change which uses the sparse-checkout
definition to filter out directories that are outside the sparse cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:46 -07:00
Derrick Stolee 4300f8442a sparse-index: implement ensure_full_index()
We will mark an in-memory index_state as having sparse directory entries
with the sparse_index bit. These currently cannot exist, but we will add
a mechanism for collapsing a full index to a sparse one in a later
change. That will happen at write time, so we must first allow parsing
the format before writing it.

Commands or methods that require a full index in order to operate can
call ensure_full_index() to expand that index in-memory. This requires
parsing trees using that index's repository.

Sparse directory entries have a specific 'ce_mode' value. The macro
S_ISSPARSEDIR(ce->ce_mode) can check if a cache_entry 'ce' has this type.
This ce_mode is not possible with the existing index formats, so we don't
also verify all properties of a sparse-directory entry, which are:

 1. ce->ce_mode == 0040000
 2. ce->flags & CE_SKIP_WORKTREE is true
 3. ce->name[ce->namelen - 1] == '/' (ends in dir separator)
 4. ce->oid references a tree object.

These are all semi-enforced in ensure_full_index() to some extent. Any
deviation will cause a warning at minimum or a failure in the worst
case.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-30 12:57:45 -07:00
Ævar Arnfjörð Bjarmason eefadd18e1 tree.c API: move read_tree() into builtin/ls-files.c
Since the read_tree() API was added around the same time as
read_tree_recursive() in 94537c78a8 (Move "read_tree()" to
"tree.c"[...], 2005-04-22) and b12ec373b8 ([PATCH] Teach read-tree
about commit objects, 2005-04-20) things have gradually migrated over
to the read_tree_recursive() version.

Now builtin/ls-files.c is the last user of this code, let's move all
the relevant code there. This allows for subsequent simplification of
it, and an eventual move to read_tree_recursive().

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-03-20 16:09:25 -07:00
Junio C Hamano 2f794620f5 Merge branch 'ds/more-index-cleanups'
Cleaning various codepaths up.

* ds/more-index-cleanups:
  t1092: test interesting sparse-checkout scenarios
  test-lib: test_region looks for trace2 regions
  sparse-checkout: load sparse-checkout patterns
  name-hash: use trace2 regions for init
  repository: add repo reference to index_state
  fsmonitor: de-duplicate BUG()s around dirty bits
  cache-tree: extract subtree_pos()
  cache-tree: simplify verify_cache() prototype
  cache-tree: clean up cache_tree_update()
2021-02-10 14:48:33 -08:00
Junio C Hamano 294e949fa2 Merge branch 'ps/config-env-pairs'
Introduce two new ways to feed configuration variable-value pairs
via environment variables, and tweak the way GIT_CONFIG_PARAMETERS
encodes variable/value pairs to make it more robust.

* ps/config-env-pairs:
  config: allow specifying config entries via envvar pairs
  environment: make `getenv_safe()` a public function
  config: store "git -c" variables using more robust format
  config: parse more robust format in GIT_CONFIG_PARAMETERS
  config: extract function to parse config pairs
  quote: make sq_dequote_step() a public function
  config: add new way to pass config via `--config-env`
  git: add `--super-prefix` to usage string
2021-01-25 14:19:19 -08:00
Derrick Stolee 1fd9ae517c repository: add repo reference to index_state
It will be helpful to add behavior to index operations that might
trigger an object lookup. Since each index belongs to a specific
repository, add a 'repo' pointer to struct index_state that allows
access to this repository.

Add a BUG() statement if the repo already has an index, and the index
already has a repo, but somehow the index points to a different repo.

This will prevent future changes from needing to pass an additional
'struct repository *repo' parameter and instead rely only on the 'struct
index_state *istate' parameter.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-23 17:14:07 -08:00
Junio C Hamano 9ba366f12b Merge branch 'bc/rev-parse-path-format'
"git rev-parse" can be explicitly told to give output as absolute
or relative path with the `--path-format=(absolute|relative)` option.

* bc/rev-parse-path-format:
  rev-parse: add option for absolute or relative path formatting
  abspath: add a function to resolve paths with missing components
2021-01-15 15:20:28 -08:00
Patrick Steinhardt d8d77153ea config: allow specifying config entries via envvar pairs
While we currently have the `GIT_CONFIG_PARAMETERS` environment variable
which can be used to pass runtime configuration data to git processes,
it's an internal implementation detail and not supposed to be used by
end users.

Next to being for internal use only, this way of passing config entries
has a major downside: the config keys need to be parsed as they contain
both key and value in a single variable. As such, it is left to the user
to escape any potentially harmful characters in the value, which is
quite hard to do if values are controlled by a third party.

This commit thus adds a new way of adding config entries via the
environment which gets rid of this shortcoming. If the user passes the
`GIT_CONFIG_COUNT=$n` environment variable, Git will parse environment
variable pairs `GIT_CONFIG_KEY_$i` and `GIT_CONFIG_VALUE_$i` for each
`i` in `[0,n)`.

While the same can be achieved with `git -c <name>=<value>`, one may
wish to not do so for potentially sensitive information. E.g. if one
wants to set `http.extraHeader` to contain an authentication token,
doing so via `-c` would trivially leak those credentials via e.g. ps(1),
which typically also shows command arguments.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-01-15 13:03:45 -08:00
Junio C Hamano e5ace7167a Merge branch 'jk/oid-array-cleanup'
Code clean-up.

* jk/oid-array-cleanup:
  commit-graph: use size_t for array allocation and indexing
  commit-graph: replace packed_oid_list with oid_array
  commit-graph: drop count_distinct_commits() function
  oid-array: provide a for-loop iterator
  oid-array: make sort function public
  cache.h: move hash/oid functions to hash.h
  t0064: make duplicate tests more robust
  t0064: drop sha1 mention from filename
  oid-array.h: drop sha1 mention from header guard
2020-12-17 15:06:40 -08:00
brian m. carlson be6e0daee7 abspath: add a function to resolve paths with missing components
Currently, we have a function to resolve paths, strbuf_realpath.  This
function canonicalizes paths like realpath(3), but permits a trailing
component to be absent from the file system.  In other words, this is
the behavior of the GNU realpath(1) without any arguments.

In the future, we'll need this same behavior, except that we want to
allow for any number of missing trailing components, which is the
behavior of GNU realpath(1) with the -m option.  This is useful because
we'll want to canonicalize a path that may point to a not yet present
path under the .git directory.  For example, a user may want to know
where an arbitrary ref would be stored if it existed in the file system.

Let's refactor strbuf_realpath to move most of the code to an internal
function and then pass it two flags to control its behavior.  We'll add
a strbuf_realpath_forgiving function that has our new behavior, and
leave strbuf_realpath with the older, stricter behavior.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-12 23:35:47 -08:00
Junio C Hamano 9b3b4adb3f Merge branch 'mt/do-not-use-scld-in-working-tree'
"git apply" adjusted the permission bits of working-tree files and
directories according core.sharedRepository setting by mistake and
for a long time, which has been corrected.

* mt/do-not-use-scld-in-working-tree:
  apply: don't use core.sharedRepository to create working tree files
2020-12-08 15:11:20 -08:00
Jeff King 3fa6f2aa57 cache.h: move hash/oid functions to hash.h
We define git_hash_algo and object_id in hash.h, but most of the utility
functions are declared in the main cache.h. Let's move them to hash.h
along with their struct definitions. This cleans up cache.h a bit, but
also avoids circular dependencies when other headers need to know about
these functions (e.g., if oid-array.h were to have an inline that used
oideq(), it couldn't include cache.h because it is itself included by
cache.h).

No including C files should be affected, because hash.h is always
included in cache.h already.

We do have to mention repository.h at the top of hash.h, though, since
we depend on the_repository in some of our inline functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-04 13:55:14 -08:00
Matheus Tavares eb3c027e17 apply: don't use core.sharedRepository to create working tree files
core.sharedRepository defines which permissions Git should set when
creating files in $GIT_DIR, so that the repository may be shared with
other users. But (in its current form) the setting shouldn't affect how
files are created in the working tree. This is not respected by apply
and am (which uses apply), when creating leading directories:

$ cat d.patch
 diff --git a/d/f b/d/f
 new file mode 100644
 index 0000000..e69de29

Apply without the setting:
$ umask 0077
$ git apply d.patch
$ ls -ld d
 drwx------

Apply with the setting:
$ umask 0077
$ git -c core.sharedRepository=0770 apply d.patch
$ ls -ld d
 drwxrws---

Only the leading directories are affected. That's because they are
created with safe_create_leading_directories(), which calls
adjust_shared_perm() to set the directories' permissions based on
core.sharedRepository. To fix that, let's introduce a variant of this
function that ignores the setting, and use it in apply. Also add a
regression test and a note in the function documentation about the use
of each variant according to the destination (working tree or git
dir).

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-12-02 14:35:51 -08:00
Han-Wen Nienhuys a76b138daa move sleep_millisec to git-compat-util.h
The sleep function is defined in wrapper.c, so it makes more sense to be a in
system compatibility header.

Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-24 17:40:16 -08:00
brian m. carlson 47ac970309 builtin/clone: avoid failure with GIT_DEFAULT_HASH
If a user is cloning a SHA-1 repository with GIT_DEFAULT_HASH set to
"sha256", then we can end up with a repository where the repository
format version is 0 but the extensions.objectformat key is set to
"sha256".  This is both wrong (the user has a SHA-1 repository) and
nonfunctional (because the extension cannot be used in a v0 repository).

This happens because in a clone, we initially set up the repository, and
then change its algorithm based on what the remote side tells us it's
using.  We've initially set up the repository as SHA-256 in this case,
and then later on reset the repository version without clearing the
extension.

We could just always set the extension in this case, but that would mean
that our SHA-1 repositories weren't compatible with older Git versions,
even though there's no reason why they shouldn't be.  And we also don't
want to initialize the repository as SHA-1 initially, since that means
if we're cloning an empty repository, we'll have failed to honor the
GIT_DEFAULT_HASH variable and will end up with a SHA-1 repository, not a
SHA-256 repository.

Neither of those are appealing, so let's tell the repository
initialization code if we're doing a reinit like this, and if so, to
clear the extension if we're using SHA-1.  This makes sure we produce a
valid and functional repository and doesn't break any of our other use
cases.

Reported-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-22 09:22:32 -07:00
Junio C Hamano 0df670bc0b Merge branch 'jt/interpret-branch-name-fallback'
"git status" has trouble showing where it came from by interpreting
reflog entries that recordcertain events, e.g. "checkout @{u}", and
gives a hard/fatal error.  Even though it inherently is impossible
to give a correct answer because the reflog entries lose some
information (e.g. "@{u}" does not record what branch the user was
on hence which branch 'the upstream' needs to be computed, and even
if the record were available, the relationship between branches may
have changed), at least hide the error to allow "status" show its
output.

* jt/interpret-branch-name-fallback:
  wt-status: tolerate dangling marks
  refs: move dwim_ref() to header file
  sha1-name: replace unsigned int with option struct
2020-09-09 13:53:09 -07:00
Jonathan Tan f24c30e0b6 wt-status: tolerate dangling marks
When a user checks out the upstream branch of HEAD, the upstream branch
not being a local branch, and then runs "git status", like this:

  git clone $URL client
  cd client
  git checkout @{u}
  git status

no status is printed, but instead an error message:

  fatal: HEAD does not point to a branch

(This error message when running "git branch" persists even after
checking out other things - it only stops after checking out a branch.)

This is because "git status" reads the reflog when determining the "HEAD
detached" message, and thus attempts to DWIM "@{u}", but that doesn't
work because HEAD no longer points to a branch.

Therefore, when calculating the status of a worktree, tolerate dangling
marks. This is done by adding an additional parameter to
dwim_ref() and repo_dwim_ref().

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02 14:39:25 -07:00
Jonathan Tan a4f66a7876 sha1-name: replace unsigned int with option struct
In preparation for a future patch adding a boolean parameter to
repo_interpret_branch_name(), which might be easily confused with an
existing unsigned int parameter, refactor repo_interpret_branch_name()
to take an option struct instead of the unsigned int parameter.

The static function interpret_branch_mark() is also updated to take the
option struct in preparation for that future patch, since it will also
make use of the to-be-introduced boolean parameter.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-02 14:39:17 -07:00
Junio C Hamano 0d9a8e33f9 Merge branch 'jk/leakfix'
Code clean-up.

* jk/leakfix:
  submodule--helper: fix leak of core.worktree value
  config: fix leak in git_config_get_expiry_in_days()
  config: drop git_config_get_string_const()
  config: fix leaks from git_config_get_string_const()
  checkout: fix leak of non-existent branch names
  submodule--helper: use strbuf_release() to free strbufs
  clear_pattern_list(): clear embedded hashmaps
2020-08-27 14:04:49 -07:00
Jeff King 9a53219f69 config: drop git_config_get_string_const()
As evidenced by the leak fixes in the previous commit, the "const" in
git_config_get_string_const() clearly misleads people into thinking that
it does not allocate a copy of the string. We can fix this by renaming
it, but it's easier still to just drop it. Of the four remaining
callers:

  - The one in git_config_parse_expiry() still needs to allocate, since
    that's what its callers expect. We can just use the non-const
    version and cast our pointer. Slightly ugly, but the damage is
    contained in one spot.

  - The two in apply are writing to global "const char *" variables, and
    need to continue allocating. We often mark these as const because we
    assign default string literals to them. But in this case we don't do
    that, so we can just declare them as real "char *" pointers and use
    the non-const version.

  - The call in checkout doesn't actually need a copy; it can just use
    the non-allocating "tmp" version of the function.

The function is also mentioned in the MyFirstContribution document. We
can swap that call out for the non-allocating "tmp" variant, which fits
well in the example given.

We'll drop the "configset" and "repo" variants, as well (which are
unused).

Note that this frees up the "const" name, so we could rename the "tmp"
variant back to that. But let's give some time for topics in flight to
adapt to the new code before doing so (if we do it too soon, the
function semantics will change but the compiler won't alert us).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-08-17 15:35:47 -07:00
Junio C Hamano c28a2d0c12 Merge branch 'jk/reject-newer-extensions-in-v0' into master
With the base fix to 2.27 regresion, any new extensions in a v0
repository would still be silently honored, which is not quite
right.  Instead, complain and die loudly.

* jk/reject-newer-extensions-in-v0:
  verify_repository_format(): complain about new extensions in v0 repo
2020-07-30 13:20:32 -07:00
Junio C Hamano d13b7f2198 Merge branch 'jn/v0-with-extensions-fix' into master
In 2.28-rc0, we corrected a bug that some repository extensions are
honored by mistake even in a version 0 repositories (these
configuration variables in extensions.* namespace were supposed to
have special meaning in repositories whose version numbers are 1 or
higher), but this was a bit too big a change.

* jn/v0-with-extensions-fix:
  repository: allow repository format upgrade with extensions
  Revert "check_repository_format_gently(): refuse extensions for old repositories"
2020-07-16 17:58:42 -07:00
Jeff King ec91ffca04 verify_repository_format(): complain about new extensions in v0 repo
We made the mistake in the past of respecting extensions.* even when the
repository format version was set to 0. This is bad because forgetting
to bump the repository version means that older versions of Git (which
do not know about our extensions) won't complain. I.e., it's not a
problem in itself, but it means your repository is in a state which does
not give you the protection you think you're getting from older
versions.

For compatibility reasons, we are stuck with that decision for existing
extensions. However, we'd prefer not to extend the damage further. We
can do that by catching any newly-added extensions and complaining about
the repository format.

Note that this is a pretty heavy hammer: we'll refuse to work with the
repository at all. A lesser option would be to ignore (possibly with a
warning) any new extensions. But because of the way the extensions are
handled, that puts the burden on each new extension that is added to
remember to "undo" itself (because they are handled before we know
for sure whether we are in a v1 repo or not, since we don't insist on a
particular ordering of config entries).

So one option would be to rewrite that handling to record any new
extensions (and their values) during the config parse, and then only
after proceed to handle new ones only if we're in a v1 repository. But
I'm not sure if it's worth the trouble:

  - ignoring extensions is likely to end up with broken results anyway
    (e.g., ignoring a proposed objectformat extension means parsing any
    object data is likely to encounter errors)

  - this is a sign that whatever tool wrote the extension field is
    broken. We may be better off notifying immediately and forcefully so
    that such tools don't even appear to work accidentally.

The only downside is that fixing the situation is a little tricky,
because programs like "git config" won't want to work with the
repository. But:

  git config --file=.git/config core.repositoryformatversion 1

should still suffice.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 10:39:45 -07:00
Jonathan Nieder 62f2eca606 repository: allow repository format upgrade with extensions
Now that we officially permit repository extensions in repository
format v0, permit upgrading a repository with extensions from v0 to v1
as well.

For example, this means a repository where the user has set
"extensions.preciousObjects" can use "git fetch --filter=blob:none
origin" to upgrade the repository to use v1 and the partial clone
extension.

To avoid mistakes, continue to forbid repository format upgrades in v0
repositories with an unrecognized extension.  This way, a v0 user
using a misspelled extension field gets a chance to correct the
mistake before updating to the less forgiving v1 format.

While we're here, make the error message for failure to upgrade the
repository format a bit shorter, and present it as an error, not a
warning.

Reported-by: Huan Huan Chen <huanhuanchen@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 09:36:39 -07:00
Junio C Hamano 11cbda2add Merge branch 'js/default-branch-name'
The name of the primary branch in existing repositories, and the
default name used for the first branch in newly created
repositories, is made configurable, so that we can eventually wean
ourselves off of the hardcoded 'master'.

* js/default-branch-name:
  contrib: subtree: adjust test to change in fmt-merge-msg
  testsvn: respect `init.defaultBranch`
  remote: use the configured default branch name when appropriate
  clone: use configured default branch name when appropriate
  init: allow setting the default for the initial branch name via the config
  init: allow specifying the initial branch name for the new repository
  docs: add missing diamond brackets
  submodule: fall back to remote's HEAD for missing remote.<name>.branch
  send-pack/transport-helper: avoid mentioning a particular branch
  fmt-merge-msg: stop treating `master` specially
2020-07-06 22:09:17 -07:00
Johannes Schindelin 32ba12dab2 init: allow specifying the initial branch name for the new repository
There is a growing number of projects and companies desiring to change
the main branch name of their repositories (see e.g.
https://twitter.com/mislav/status/1270388510684598272 for background on
this).

To change that branch name for new repositories, currently the only way
to do that automatically is by copying all of Git's template directory,
then hard-coding the desired default branch name into the `.git/HEAD`
file, and then configuring `init.templateDir` to point to those copied
template files.

To make this process much less cumbersome, let's introduce a new option:
`--initial-branch=<branch-name>`.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-24 09:14:21 -07:00
Xin Li 16af5f1abb repository: add a helper function to perform repository format upgrade
In version 1 of repository format, "extensions" gained special meaning
and it is safer to avoid upgrading when there are pre-existing
extensions.

Make list-objects-filter to use the helper function instead of setting
repository version directly as a prerequisite of exposing the upgrade
capability.

Signed-off-by: Xin Li <delphij@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-06-05 10:13:30 -07:00
Junio C Hamano a768f866e9 Merge branch 'jk/oid-array-cleanups'
Code cleanup.

* jk/oid-array-cleanups:
  oidset: stop referring to sha1-array
  ref-filter: stop referring to "sha1 array"
  bisect: stop referring to sha1_array
  test-tool: rename sha1-array to oid-array
  oid_array: rename source file from sha1-array
  oid_array: use size_t for iteration
  oid_array: use size_t for count and allocation
2020-04-22 13:42:49 -07:00
Jeff King fe299ec5ae oid_array: rename source file from sha1-array
We renamed the actual data structure in 910650d2f8 (Rename sha1_array to
oid_array, 2017-03-31), but the file is still called sha1-array. Besides
being slightly confusing, it makes it more annoying to grep for leftover
occurrences of "sha1" in various files, because the header is included
in so many places.

Let's complete the transition by renaming the source and header files
(and fixing up a few comment references).

I kept the "-" in the name, as that seems to be our style; cf.
fc1395f4a4 (sha1_file.c: rename to use dash in file name, 2018-04-10).
We also have oidmap.h and oidset.h without any punctuation, but those
are "struct oidmap" and "struct oidset" in the code. We _could_ make
this "oidarray" to match, but somehow it looks uglier to me because of
the length of "array" (plus it would be a very invasive patch for little
gain).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-30 10:59:08 -07:00
Junio C Hamano 4e4baee3f4 Merge branch 'bc/filter-process'
Provide more information (e.g. the object of the tree-ish in which
the blob being converted appears, in addition to its path, which
has already been given) to smudge/clean conversion filters.

* bc/filter-process:
  t0021: test filter metadata for additional cases
  builtin/reset: compute checkout metadata for reset
  builtin/rebase: compute checkout metadata for rebases
  builtin/clone: compute checkout metadata for clones
  builtin/checkout: compute checkout metadata for checkouts
  convert: provide additional metadata to filters
  convert: permit passing additional metadata to filter processes
  builtin/checkout: pass branch info down to checkout_worktree
2020-03-26 17:11:20 -07:00
Junio C Hamano f8cb64e3d4 Merge branch 'bc/sha-256-part-1-of-4'
SHA-256 transition continues.

* bc/sha-256-part-1-of-4: (22 commits)
  fast-import: add options for rewriting submodules
  fast-import: add a generic function to iterate over marks
  fast-import: make find_marks work on any mark set
  fast-import: add helper function for inserting mark object entries
  fast-import: permit reading multiple marks files
  commit: use expected signature header for SHA-256
  worktree: allow repository version 1
  init-db: move writing repo version into a function
  builtin/init-db: add environment variable for new repo hash
  builtin/init-db: allow specifying hash algorithm on command line
  setup: allow check_repository_format to read repository format
  t/helper: make repository tests hash independent
  t/helper: initialize repository if necessary
  t/helper/test-dump-split-index: initialize git repository
  t6300: make hash algorithm independent
  t6300: abstract away SHA-1-specific constants
  t: use hash-specific lookup tables to define test constants
  repository: require a build flag to use SHA-256
  hex: add functions to parse hex object IDs in any algorithm
  hex: introduce parsing variants taking hash algorithms
  ...
2020-03-26 17:11:20 -07:00
brian m. carlson c397aac02f convert: provide additional metadata to filters
Now that we have the codebase wired up to pass any additional metadata
to filters, let's collect the additional metadata that we'd like to
pass.

The two main places we pass this metadata are checkouts and archives.
In these two situations, reading HEAD isn't a valid option, since HEAD
isn't updated for checkouts until after the working tree is written and
archives can accept an arbitrary tree.  In other situations, HEAD will
usually reflect the refname of the branch in current use.

We pass a smaller amount of data in other cases, such as git cat-file,
where we can really only logically know about the blob.

This commit updates only the parts of the checkout code where we don't
use unpack_trees.  That function and callers of it will be handled in a
future commit.

In the archive code, we leak a small amount of memory, since nothing we
pass in the archiver argument structure is freed.

Signed-off-by: brian m. carlson <bk2204@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-16 11:37:02 -07:00
Alexandr Miloslavskiy 4530a85b4c real_path_if_valid(): remove unsafe API
This commit continues the work started with previous commit.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10 11:41:40 -07:00
Alexandr Miloslavskiy 3d7747e318 real_path: remove unsafe API
Returning a shared buffer invites very subtle bugs due to reentrancy or
multi-threading, as demonstrated by the previous patch.

There was an unfinished effort to abolish this [1].

Let's finally rid of `real_path()`, using `strbuf_realpath()` instead.

This patch uses a local `strbuf` for most places where `real_path()` was
previously called.

However, two places return the value of `real_path()` to the caller. For
them, a `static` local `strbuf` was added, effectively pushing the
problem one level higher:
    read_gitfile_gently()
    get_superproject_working_tree()

[1] https://lore.kernel.org/git/1480964316-99305-1-git-send-email-bmwill@google.com/

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-10 11:41:40 -07:00
Alexandr Miloslavskiy 0915a5b4cd set_git_dir: fix crash when used with real_path()
`real_path()` returns result from a shared buffer, inviting subtle
reentrance bugs. One of these bugs occur when invoked this way:
    set_git_dir(real_path(git_dir))

In this case, `real_path()` has reentrance:
    real_path
    read_gitfile_gently
    repo_set_gitdir
    setup_git_env
    set_git_dir_1
    set_git_dir

Later, `set_git_dir()` uses its now-dead parameter:
    !is_absolute_path(path)

Fix this by using a dedicated `strbuf` to hold `strbuf_realpath()`.

Signed-off-by: Alexandr Miloslavskiy <alexandr.miloslavskiy@syntevo.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-03-06 14:45:51 -08:00
brian m. carlson efa7ae36c1 init-db: move writing repo version into a function
When we perform a clone, we won't know the remote side's hash algorithm
until we've read the heads.  Consequently, we'll need to rewrite the
repository format version and hash algorithm once we know what the
remote side has.  Move the code that does this into its own function so
that we can call it from clone in the future.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:30 -08:00
brian m. carlson 8b8f7189df builtin/init-db: allow specifying hash algorithm on command line
Allow the user to specify the hash algorithm on the command line by
using the --object-format option to git init.  Validate that the user is
not attempting to reinitialize a repository with a different hash
algorithm.  Ensure that if we are writing a non-SHA-1 repository that we
set the repository version to 1 and write the objectFormat extension.

Restrict this option to work only when ENABLE_SHA256 is set until the
codebase is in a situation to fully support this.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:27 -08:00
brian m. carlson cfe3917c85 setup: allow check_repository_format to read repository format
In some cases, we will want to not only check the repository format, but
extract the information that we've gained.  To do so, allow
check_repository_format to take a pointer to struct repository_format.
Allow passing NULL for this argument if we're not interested in the
information, and pass NULL for all existing callers.  A future patch
will make use of this information.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:27 -08:00
brian m. carlson 61e2a70ff2 hex: add functions to parse hex object IDs in any algorithm
There are some places where we need to parse a hex object ID in any
algorithm without knowing beforehand which algorithm is in use. An
example is when parsing fast-import marks.

Add a get_oid_hex_any to parse an object ID and return the algorithm it
belongs to, and additionally add parse_oid_hex_any which is the
equivalent change for parse_oid_hex. If the object is not parseable, we
return GIT_HASH_UNKNOWN.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:21 -08:00
brian m. carlson dadacf10dc hex: introduce parsing variants taking hash algorithms
Introduce variants of get_oid_hex and parse_oid_hex that parse an
arbitrary hash algorithm, implementing internal functions to avoid
duplication.  These functions can be used in the transport code to parse
refs properly.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-02-24 09:33:21 -08:00
Junio C Hamano 78e67cda42 Merge branch 'mt/use-passed-repo-more-in-funcs'
Some codepaths were given a repository instance as a parameter to
work in the repository, but passed the_repository instance to its
callees, which has been cleaned up (somewhat).

* mt/use-passed-repo-more-in-funcs:
  sha1-file: allow check_object_signature() to handle any repo
  sha1-file: pass git_hash_algo to hash_object_file()
  sha1-file: pass git_hash_algo to write_object_file_prepare()
  streaming: allow open_istream() to handle any repo
  pack-check: use given repo's hash_algo at verify_packfile()
  cache-tree: use given repo's hash_algo at verify_one()
  diff: make diff_populate_filespec() honor its repo argument
2020-02-14 12:54:22 -08:00
Matheus Tavares b98d188581 sha1-file: allow check_object_signature() to handle any repo
Some callers of check_object_signature() can work on arbitrary
repositories, but the repo does not get passed to this function.
Instead, the_repository is always used internally. To fix possible
inconsistencies, allow the function to receive a struct repository and
make those callers pass on the repo being handled.

Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-31 10:45:39 -08:00
Kevin Willford 56c6910028 fsmonitor: change last update timestamp on the index_state to opaque token
Some file system monitors might not use or take a timestamp for processing
and in the case of watchman could have race conditions with using a
timestamp. Watchman uses something called a clockid that is used for race
free queries to it. The clockid for watchman is simply a string.

Change the fsmonitor_last_update from being a uint64_t to a char pointer
so that any arbitrary data can be stored in it and passed back to the
fsmonitor.

Signed-off-by: Kevin Willford <Kevin.Willford@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-13 14:58:43 -08:00
Junio C Hamano 0f1930cd1b Merge branch 'ds/sparse-cone'
Code cleanup.

* ds/sparse-cone:
  Documentation/git-sparse-checkout.txt: fix a typo
  sparse-checkout: use extern for global variables
2020-01-06 14:17:51 -08:00
Derrick Stolee 44143583b7 sparse-checkout: use extern for global variables
When the core.sparseCheckoutCone config setting was added in
879321eb0b ("sparse-checkout: add 'cone' mode" 2019-11-21), the
variables storing the config values for core.sparseCheckout and
core.sparseCheckoutCone were rearranged in cache.h, but in doing
so the "extern" keyword was dropped.

While we are tending to drop the "extern" keyword for function
declarations, it is still necessary for global variables used
across multiple *.c files. The impact of not having the extern
keyword may be unpredictable.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-01-02 10:18:42 -08:00
Junio C Hamano bd72a08d6c Merge branch 'ds/sparse-cone'
Management of sparsely checked-out working tree has gained a
dedicated "sparse-checkout" command.

* ds/sparse-cone: (21 commits)
  sparse-checkout: improve OS ls compatibility
  sparse-checkout: respect core.ignoreCase in cone mode
  sparse-checkout: check for dirty status
  sparse-checkout: update working directory in-process for 'init'
  sparse-checkout: cone mode should not interact with .gitignore
  sparse-checkout: write using lockfile
  sparse-checkout: use in-process update for disable subcommand
  sparse-checkout: update working directory in-process
  sparse-checkout: sanitize for nested folders
  unpack-trees: add progress to clear_ce_flags()
  unpack-trees: hash less in cone mode
  sparse-checkout: init and set in cone mode
  sparse-checkout: use hashmaps for cone patterns
  sparse-checkout: add 'cone' mode
  trace2: add region in clear_ce_flags
  sparse-checkout: create 'disable' subcommand
  sparse-checkout: add '--stdin' option to set subcommand
  sparse-checkout: 'set' subcommand
  clone: add --sparse mode
  sparse-checkout: create 'init' subcommand
  ...
2019-12-25 11:21:58 -08:00
Junio C Hamano 26c816a67d Merge branch 'hw/doc-in-header'
* hw/doc-in-header:
  trace2: move doc to trace2.h
  submodule-config: move doc to submodule-config.h
  tree-walk: move doc to tree-walk.h
  trace: move doc to trace.h
  run-command: move doc to run-command.h
  parse-options: add link to doc file in parse-options.h
  credential: move doc to credential.h
  argv-array: move doc to argv-array.h
  cache: move doc to cache.h
  sigchain: move doc to sigchain.h
  pathspec: move doc to pathspec.h
  revision: move doc to revision.h
  attr: move doc to attr.h
  refs: move doc to refs.h
  remote: move doc to remote.h and refspec.h
  sha1-array: move doc to sha1-array.h
  merge: move doc to ll-merge.h
  graph: move doc to graph.h and graph.c
  dir: move doc to dir.h
  diff: move doc to diff.h and diffcore.h
2019-12-16 13:08:39 -08:00
Derrick Stolee 4dcd4def3c unpack-trees: add progress to clear_ce_flags()
When a large repository has many sparse-checkout patterns, the
process for updating the skip-worktree bits can take long enough
that a user gets confused why nothing is happening. Update the
clear_ce_flags() method to write progress.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-22 16:11:44 +09:00