1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-04 22:56:34 +02:00
git/setup.c

2297 lines
63 KiB
C
Raw Normal View History

#include "git-compat-util.h"
#include "abspath.h"
#include "copy.h"
#include "environment.h"
#include "exec-cmd.h"
#include "gettext.h"
#include "object-name.h"
#include "refs.h"
#include "repository.h"
#include "config.h"
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
#include "dir.h"
#include "setup.h"
#include "string-list.h"
set_work_tree: use chdir_notify When we change to the top of the working tree, we manually re-adjust $GIT_DIR and call set_git_dir() again, in order to update any relative git-dir we'd compute earlier. Instead of the work-tree code having to know to call the git-dir code, let's use the new chdir_notify interface. There are two spots that need updating, with a few subtleties in each: 1. the set_git_dir() code needs to chdir_notify_register() so it can be told when to update its path. Technically we could push this down into repo_set_gitdir(), so that even repository structs besides the_repository could benefit from this. But that opens up a lot of complications: - we'd still need to touch set_git_dir(), because it does some other setup (like setting $GIT_DIR in the environment) - submodules using other repository structs get cleaned up, which means we'd need to remove them from the chdir_notify list - it's unlikely to fix any bugs, since we shouldn't generally chdir() in the middle of working on a submodule 2. setup_work_tree now needs to call chdir_notify(), and can lose its manual set_git_dir() call. Note that at first glance it looks like this undoes the absolute-to-relative optimization added by 044bbbcb63 (Make git_dir a path relative to work_tree in setup_work_tree(), 2008-06-19). But for the most part that optimization was just _undoing_ the relative-to-absolute conversion which the function was doing earlier (and which is now gone). It is true that if you already have an absolute git_dir that the setup_work_tree() function will no longer make it relative as a side effect. But: - we generally do have relative git-dir's due to the way the discovery code works - if we really care about making git-dir's relative when possible, then we should be relativizing them earlier (e.g., when we see an absolute $GIT_DIR we could turn it relative, whether we are going to chdir into a worktree or not). That would cover all cases, including ones that 044bbbcb63 did not. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-30 20:35:08 +02:00
#include "chdir-notify.h"
#include "path.h"
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
#include "quote.h"
#include "trace2.h"
#include "worktree.h"
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
static int inside_git_dir = -1;
static int inside_work_tree = -1;
setup_git_directory: delay core.bare/core.worktree errors If both core.bare and core.worktree are set, we complain about the bogus config and die. Dying is good, because it avoids commands running and doing damage in a potentially incorrect setup. But dying _there_ is bad, because it means that commands which do not even care about the work tree cannot run. This can make repairing the situation harder: [setup] $ git config core.bare true $ git config core.worktree /some/path [OK, expected.] $ git status fatal: core.bare and core.worktree do not make sense [Hrm...] $ git config --unset core.worktree fatal: core.bare and core.worktree do not make sense [Nope...] $ git config --edit fatal: core.bare and core.worktree do not make sense [Gaaah.] $ git help config fatal: core.bare and core.worktree do not make sense Instead, let's issue a warning about the bogus config when we notice it (i.e., for all commands), but only die when the command tries to use the work tree (by calling setup_work_tree). So we now get: $ git status warning: core.bare and core.worktree do not make sense fatal: unable to set up work tree using invalid config $ git config --unset core.worktree warning: core.bare and core.worktree do not make sense We have to update t1510 to accomodate this; it uses symbolic-ref to check whether the configuration works or not, but of course that command does not use the working tree. Instead, we switch it to use `git status`, as it requires a work-tree, does not need any special setup, and is read-only (so a failure will not adversely affect further tests). In addition, we add a new test that checks the desired behavior (i.e., that running "git config" with the bogus config does in fact work). Reported-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-29 08:49:10 +02:00
static int work_tree_config_is_bogus;
setup.c: create `safe.bareRepository` There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 23:28:01 +02:00
enum allowed_bare_repo {
ALLOWED_BARE_REPO_EXPLICIT = 0,
ALLOWED_BARE_REPO_ALL,
};
setup: make startup_info available everywhere Commit a60645f (setup: remember whether repository was found, 2010-08-05) introduced the startup_info structure, which records some parts of the setup_git_directory() process (notably, whether we actually found a repository or not). One of the uses of this data is for functions to behave appropriately based on whether we are in a repo. But the startup_info struct is just a pointer to storage provided by the main program, and the only program that sets it up is the git.c wrapper. Thus builtins have access to startup_info, but externally linked programs do not. Worse, library code which is accessible from both has to be careful about accessing startup_info. This can be used to trigger a die("BUG") via get_sha1(): $ git fast-import <<-\EOF tag foo from HEAD:./whatever EOF fatal: BUG: startup_info struct is not initialized. Obviously that's fairly nonsensical input to feed to fast-import, but we should never hit a die("BUG"). And there may be other ways to trigger it if other non-builtins resolve sha1s. So let's point the storage for startup_info to a static variable in setup.c, making it available to all users of the library code. We _could_ turn startup_info into a regular extern struct, but doing so would mean tweaking all of the existing use sites. So let's leave the pointer indirection in place. We can, however, drop any checks for NULL, as they will always be false (and likewise, we can drop the test covering this case, which was a rather artificial situation using one of the test-* programs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-05 23:10:27 +01:00
static struct startup_info the_startup_info;
struct startup_info *startup_info = &the_startup_info;
const char *tmp_original_cwd;
setup: make startup_info available everywhere Commit a60645f (setup: remember whether repository was found, 2010-08-05) introduced the startup_info structure, which records some parts of the setup_git_directory() process (notably, whether we actually found a repository or not). One of the uses of this data is for functions to behave appropriately based on whether we are in a repo. But the startup_info struct is just a pointer to storage provided by the main program, and the only program that sets it up is the git.c wrapper. Thus builtins have access to startup_info, but externally linked programs do not. Worse, library code which is accessible from both has to be careful about accessing startup_info. This can be used to trigger a die("BUG") via get_sha1(): $ git fast-import <<-\EOF tag foo from HEAD:./whatever EOF fatal: BUG: startup_info struct is not initialized. Obviously that's fairly nonsensical input to feed to fast-import, but we should never hit a die("BUG"). And there may be other ways to trigger it if other non-builtins resolve sha1s. So let's point the storage for startup_info to a static variable in setup.c, making it available to all users of the library code. We _could_ turn startup_info into a regular extern struct, but doing so would mean tweaking all of the existing use sites. So let's leave the pointer indirection in place. We can, however, drop any checks for NULL, as they will always be false (and likewise, we can drop the test covering this case, which was a rather artificial situation using one of the test-* programs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-05 23:10:27 +01:00
/*
* The input parameter must contain an absolute path, and it must already be
* normalized.
*
* Find the part of an absolute path that lies inside the work tree by
* dereferencing symlinks outside the work tree, for example:
* /dir1/repo/dir2/file (work tree is /dir1/repo) -> dir2/file
* /dir/file (work tree is /) -> dir/file
* /dir/symlink1/symlink2 (symlink1 points to work tree) -> symlink2
* /dir/repolink/file (repolink points to /dir/repo) -> file
* /dir/repo (exactly equal to work tree) -> (empty string)
*/
static int abspath_part_inside_repo(char *path)
{
size_t len;
size_t wtlen;
char *path0;
int off;
const char *work_tree = get_git_work_tree();
struct strbuf realpath = STRBUF_INIT;
if (!work_tree)
return -1;
wtlen = strlen(work_tree);
len = strlen(path);
off = offset_1st_component(path);
/* check if work tree is already the prefix */
if (wtlen <= len && !fspathncmp(path, work_tree, wtlen)) {
if (path[wtlen] == '/') {
memmove(path, path + wtlen + 1, len - wtlen);
return 0;
} else if (path[wtlen - 1] == '/' || path[wtlen] == '\0') {
/* work tree is the root, or the whole path */
memmove(path, path + wtlen, len - wtlen + 1);
return 0;
}
/* work tree might match beginning of a symlink to work tree */
off = wtlen;
}
path0 = path;
path += off;
/* check each '/'-terminated level */
while (*path) {
path++;
if (*path == '/') {
*path = '\0';
strbuf_realpath(&realpath, path0, 1);
if (fspathcmp(realpath.buf, work_tree) == 0) {
memmove(path0, path + 1, len - (path - path0));
strbuf_release(&realpath);
return 0;
}
*path = '/';
}
}
/* check whole path */
strbuf_realpath(&realpath, path0, 1);
if (fspathcmp(realpath.buf, work_tree) == 0) {
*path0 = '\0';
strbuf_release(&realpath);
return 0;
}
strbuf_release(&realpath);
return -1;
}
/*
* Normalize "path", prepending the "prefix" for relative paths. If
* remaining_prefix is not NULL, return the actual prefix still
* remains in the path. For example, prefix = sub1/sub2/ and path is
*
* foo -> sub1/sub2/foo (full prefix)
* ../foo -> sub1/foo (remaining prefix is sub1/)
* ../../bar -> bar (no remaining prefix)
* ../../sub1/sub2/foo -> sub1/sub2/foo (but no remaining prefix)
* `pwd`/../bar -> sub1/bar (no remaining prefix)
*/
char *prefix_path_gently(const char *prefix, int len,
int *remaining_prefix, const char *path)
setup: sanitize absolute and funny paths in get_pathspec() The prefix_path() function called from get_pathspec() is responsible for translating list of user-supplied pathspecs to list of pathspecs that is relative to the root of the work tree. When working inside a subdirectory, the user-supplied pathspecs are taken to be relative to the current subdirectory. Among special path components in pathspecs, we used to accept and interpret only "." ("the directory", meaning a no-op) and ".." ("up one level") at the beginning. Everything else was passed through as-is. For example, if you are in Documentation/ directory of the project, you can name Documentation/howto/maintain-git.txt as: howto/maintain-git.txt ../Documentation/howto/maitain-git.txt ../././Documentation/howto/maitain-git.txt but not as: howto/./maintain-git.txt $(pwd)/howto/maintain-git.txt This patch updates prefix_path() in several ways: - If the pathspec is not absolute, prefix (i.e. the current subdirectory relative to the root of the work tree, with terminating slash, if not empty) and the pathspec is concatenated first and used in the next step. Otherwise, that absolute pathspec is used in the next step. - Then special path components "." (no-op) and ".." (up one level) are interpreted to simplify the path. It is an error to have too many ".." to cause the intermediate result to step outside of the input to this step. - If the original pathspec was not absolute, the result from the previous step is the resulting "sanitized" pathspec. Otherwise, the result from the previous step is still absolute, and it is an error if it does not begin with the directory that corresponds to the root of the work tree. The directory is stripped away from the result and is returned. - In any case, the resulting pathspec in the array get_pathspec() returns omit the ones that caused errors. With this patch, the last two examples also behave as expected. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-29 07:44:27 +01:00
{
const char *orig = path;
char *sanitized;
if (is_absolute_path(orig)) {
sanitized = xmallocz(strlen(path));
if (remaining_prefix)
*remaining_prefix = 0;
if (normalize_path_copy_len(sanitized, path, remaining_prefix)) {
free(sanitized);
return NULL;
}
if (abspath_part_inside_repo(sanitized)) {
free(sanitized);
return NULL;
}
} else {
sanitized = xstrfmt("%.*s%s", len, len ? prefix : "", path);
if (remaining_prefix)
*remaining_prefix = len;
if (normalize_path_copy_len(sanitized, sanitized, remaining_prefix)) {
free(sanitized);
return NULL;
setup: sanitize absolute and funny paths in get_pathspec() The prefix_path() function called from get_pathspec() is responsible for translating list of user-supplied pathspecs to list of pathspecs that is relative to the root of the work tree. When working inside a subdirectory, the user-supplied pathspecs are taken to be relative to the current subdirectory. Among special path components in pathspecs, we used to accept and interpret only "." ("the directory", meaning a no-op) and ".." ("up one level") at the beginning. Everything else was passed through as-is. For example, if you are in Documentation/ directory of the project, you can name Documentation/howto/maintain-git.txt as: howto/maintain-git.txt ../Documentation/howto/maitain-git.txt ../././Documentation/howto/maitain-git.txt but not as: howto/./maintain-git.txt $(pwd)/howto/maintain-git.txt This patch updates prefix_path() in several ways: - If the pathspec is not absolute, prefix (i.e. the current subdirectory relative to the root of the work tree, with terminating slash, if not empty) and the pathspec is concatenated first and used in the next step. Otherwise, that absolute pathspec is used in the next step. - Then special path components "." (no-op) and ".." (up one level) are interpreted to simplify the path. It is an error to have too many ".." to cause the intermediate result to step outside of the input to this step. - If the original pathspec was not absolute, the result from the previous step is the resulting "sanitized" pathspec. Otherwise, the result from the previous step is still absolute, and it is an error if it does not begin with the directory that corresponds to the root of the work tree. The directory is stripped away from the result and is returned. - In any case, the resulting pathspec in the array get_pathspec() returns omit the ones that caused errors. With this patch, the last two examples also behave as expected. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-01-29 07:44:27 +01:00
}
}
return sanitized;
}
char *prefix_path(const char *prefix, int len, const char *path)
{
char *r = prefix_path_gently(prefix, len, NULL, path);
if (!r) {
const char *hint_path = get_git_work_tree();
if (!hint_path)
hint_path = get_git_dir();
die(_("'%s' is outside repository at '%s'"), path,
absolute_path(hint_path));
}
return r;
}
int path_inside_repo(const char *prefix, const char *path)
{
int len = prefix ? strlen(prefix) : 0;
char *r = prefix_path_gently(prefix, len, NULL, path);
if (r) {
free(r);
return 1;
}
return 0;
}
int check_filename(const char *prefix, const char *arg)
{
char *to_free = NULL;
struct stat st;
if (skip_prefix(arg, ":/", &arg)) {
if (!*arg) /* ":/" is root dir, always exists */
return 1;
prefix = NULL;
} else if (skip_prefix(arg, ":!", &arg) ||
skip_prefix(arg, ":^", &arg)) {
if (!*arg) /* excluding everything is silly, but allowed */
return 1;
}
if (prefix)
arg = to_free = prefix_filename(prefix, arg);
if (!lstat(arg, &st)) {
free(to_free);
return 1; /* file exists */
}
if (is_missing_file_error(errno)) {
free(to_free);
return 0; /* file does not exist */
}
die_errno(_("failed to stat '%s'"), arg);
}
static void NORETURN die_verify_filename(struct repository *r,
const char *prefix,
const char *arg,
int diagnose_misspelt_rev)
{
if (!diagnose_misspelt_rev)
die(_("%s: no such path in the working tree.\n"
"Use 'git <command> -- <path>...' to specify paths that do not exist locally."),
arg);
/*
* Saying "'(icase)foo' does not exist in the index" when the
* user gave us ":(icase)foo" is just stupid. A magic pathspec
* begins with a colon and is followed by a non-alnum; do not
* let maybe_die_on_misspelt_object_name() even trigger.
*/
if (!(arg[0] == ':' && !isalnum(arg[1])))
maybe_die_on_misspelt_object_name(r, arg, prefix);
/* ... or fall back the most general message. */
die(_("ambiguous argument '%s': unknown revision or path not in the working tree.\n"
"Use '--' to separate paths from revisions, like this:\n"
"'git <command> [<revision>...] -- [<file>...]'"), arg);
}
verify_filename(): treat ":(magic)" as a pathspec For commands that take revisions and pathspecs, magic pathspecs like ":(exclude)foo" require the user to specify a disambiguating "--", since they do not match a file in the filesystem, like: git grep foo -- :(exclude)bar This makes them more annoying to use than they need to be. We loosened the rules for wildcards in 28fcc0b71 (pathspec: avoid the need of "--" when wildcard is used, 2015-05-02). Let's do the same for pathspecs with long-form magic. We already handle the short-forms ":/" and ":^" specially in check_filename(), so we don't need to handle them here. And in fact, we could do the same with long-form magic, parsing out the actual filename and making sure it exists. But there are a few reasons not to do it that way: - the parsing gets much more complicated, and we'd want to hand it off to the pathspec code. But that code isn't ready to do this kind of speculative parsing (it's happy to die() when it sees a syntactically invalid pathspec). - not all pathspec magic maps to a filesystem path. E.g., :(attr) should be treated as a pathspec regardless of what is in the filesystem - we can be a bit looser with ":(" than with the short-form ":/", because it is much less likely to have a false positive. Whereas ":/" also means "search for a commit with this regex". Note that because the change is in verify_filename() and not in its helper check_filename(), this doesn't affect the verify_non_filename() case. I.e., if an item that matches our new rule doesn't resolve as an object, we may fallback to treating it as a pathspec (rather than complaining it doesn't exist). But if it does resolve (e.g., as a file in the index that starts with an open-paren), we won't then complain that it's also a valid pathspec. This matches the wildcard-exception behavior. And of course in either case, one can always insert the "--" to get more precise results. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 21:10:31 +02:00
/*
* Check for arguments that don't resolve as actual files,
* but which look sufficiently like pathspecs that we'll consider
* them such for the purposes of rev/pathspec DWIM parsing.
*/
static int looks_like_pathspec(const char *arg)
{
const char *p;
int escaped = 0;
/*
* Wildcard characters imply the user is looking to match pathspecs
* that aren't in the filesystem. Note that this doesn't include
* backslash even though it's a glob special; by itself it doesn't
* cause any increase in the match. Likewise ignore backslash-escaped
* wildcard characters.
*/
for (p = arg; *p; p++) {
if (escaped) {
escaped = 0;
} else if (is_glob_special(*p)) {
if (*p == '\\')
escaped = 1;
else
return 1;
}
}
verify_filename(): treat ":(magic)" as a pathspec For commands that take revisions and pathspecs, magic pathspecs like ":(exclude)foo" require the user to specify a disambiguating "--", since they do not match a file in the filesystem, like: git grep foo -- :(exclude)bar This makes them more annoying to use than they need to be. We loosened the rules for wildcards in 28fcc0b71 (pathspec: avoid the need of "--" when wildcard is used, 2015-05-02). Let's do the same for pathspecs with long-form magic. We already handle the short-forms ":/" and ":^" specially in check_filename(), so we don't need to handle them here. And in fact, we could do the same with long-form magic, parsing out the actual filename and making sure it exists. But there are a few reasons not to do it that way: - the parsing gets much more complicated, and we'd want to hand it off to the pathspec code. But that code isn't ready to do this kind of speculative parsing (it's happy to die() when it sees a syntactically invalid pathspec). - not all pathspec magic maps to a filesystem path. E.g., :(attr) should be treated as a pathspec regardless of what is in the filesystem - we can be a bit looser with ":(" than with the short-form ":/", because it is much less likely to have a false positive. Whereas ":/" also means "search for a commit with this regex". Note that because the change is in verify_filename() and not in its helper check_filename(), this doesn't affect the verify_non_filename() case. I.e., if an item that matches our new rule doesn't resolve as an object, we may fallback to treating it as a pathspec (rather than complaining it doesn't exist). But if it does resolve (e.g., as a file in the index that starts with an open-paren), we won't then complain that it's also a valid pathspec. This matches the wildcard-exception behavior. And of course in either case, one can always insert the "--" to get more precise results. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 21:10:31 +02:00
/* long-form pathspec magic */
if (starts_with(arg, ":("))
return 1;
return 0;
}
/*
* Verify a filename that we got as an argument for a pathspec
* entry. Note that a filename that begins with "-" never verifies
* as true, because even if such a filename were to exist, we want
* it to be preceded by the "--" marker (or we want the user to
* use a format like "./-filename")
*
* The "diagnose_misspelt_rev" is used to provide a user-friendly
* diagnosis when dying upon finding that "name" is not a pathname.
* If set to 1, the diagnosis will try to diagnose "name" as an
* invalid object name (e.g. HEAD:foo). If set to 0, the diagnosis
* will only complain about an inexisting file.
*
* This function is typically called to check that a "file or rev"
* argument is unambiguous. In this case, the caller will want
* diagnose_misspelt_rev == 1 when verifying the first non-rev
* argument (which could have been a revision), and
* diagnose_misspelt_rev == 0 for the next ones (because we already
* saw a filename, there's not ambiguity anymore).
*/
void verify_filename(const char *prefix,
const char *arg,
int diagnose_misspelt_rev)
{
if (*arg == '-')
die(_("option '%s' must come before non-option arguments"), arg);
if (looks_like_pathspec(arg) || check_filename(prefix, arg))
return;
die_verify_filename(the_repository, prefix, arg, diagnose_misspelt_rev);
}
/*
* Opposite of the above: the command line did not have -- marker
* and we parsed the arg as a refname. It should not be interpretable
* as a filename.
*/
void verify_non_filename(const char *prefix, const char *arg)
{
if (!is_inside_work_tree() || is_inside_git_dir())
return;
if (*arg == '-')
return; /* flag */
if (!check_filename(prefix, arg))
return;
die(_("ambiguous argument '%s': both revision and filename\n"
"Use '--' to separate paths from revisions, like this:\n"
"'git <command> [<revision>...] -- [<file>...]'"), arg);
}
int get_common_dir(struct strbuf *sb, const char *gitdir)
{
const char *git_env_common_dir = getenv(GIT_COMMON_DIR_ENVIRONMENT);
if (git_env_common_dir) {
strbuf_addstr(sb, git_env_common_dir);
return 1;
} else {
return get_common_dir_noenv(sb, gitdir);
}
}
int get_common_dir_noenv(struct strbuf *sb, const char *gitdir)
{
struct strbuf data = STRBUF_INIT;
struct strbuf path = STRBUF_INIT;
int ret = 0;
strbuf_addf(&path, "%s/commondir", gitdir);
if (file_exists(path.buf)) {
if (strbuf_read_file(&data, path.buf, 0) <= 0)
die_errno(_("failed to read %s"), path.buf);
while (data.len && (data.buf[data.len - 1] == '\n' ||
data.buf[data.len - 1] == '\r'))
data.len--;
data.buf[data.len] = '\0';
strbuf_reset(&path);
if (!is_absolute_path(data.buf))
strbuf_addf(&path, "%s/", gitdir);
strbuf_addbuf(&path, &data);
strbuf_add_real_path(sb, path.buf);
ret = 1;
} else {
strbuf_addstr(sb, gitdir);
}
strbuf_release(&data);
strbuf_release(&path);
return ret;
}
/*
* Test if it looks like we're at a git directory.
* We want to see:
*
* - either an objects/ directory _or_ the proper
* GIT_OBJECT_DIRECTORY environment variable
* - a refs/ directory
* - either a HEAD symlink or a HEAD file that is formatted as
* a proper "ref:", or a regular file HEAD that has a properly
* formatted sha1 object name.
*/
standardize and improve lookup rules for external local repos When you specify a local repository on the command line of clone, ls-remote, upload-pack, receive-pack, or upload-archive, or in a request to git-daemon, we perform a little bit of lookup magic, doing things like looking in working trees for .git directories and appending ".git" for bare repos. For clone, this magic happens in get_repo_path. For everything else, it happens in enter_repo. In both cases, there are some ambiguous or confusing cases that aren't handled well, and there is one case that is not handled the same by both methods. This patch tries to provide (and test!) standard, sensible lookup rules for both code paths. The intended changes are: 1. When looking up "foo", we have always preferred a working tree "foo" (containing "foo/.git" over the bare "foo.git". But we did not prefer a bare "foo" over "foo.git". With this patch, we do so. 2. We would select directories that existed but didn't actually look like git repositories. With this patch, we make sure a selected directory looks like a git repo. Not only is this more sensible in general, but it will help anybody who is negatively affected by change (1) negatively (e.g., if they had "foo.git" next to its separate work tree "foo", and expect to keep finding "foo.git" when they reference "foo"). 3. The enter_repo code path would, given "foo", look for "foo.git/.git" (i.e., do the ".git" append magic even for a repo with working tree). The clone code path did not; with this patch, they now behave the same. In the unlikely case of a working tree overlaying a bare repo (i.e., a ".git" directory _inside_ a bare repo), we continue to treat it as a working tree (prefering the "inner" .git over the bare repo). This is mainly because the combination seems nonsensical, and I'd rather stick with existing behavior on the off chance that somebody is relying on it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-02-02 22:59:13 +01:00
int is_git_directory(const char *suspect)
{
struct strbuf path = STRBUF_INIT;
int ret = 0;
size_t len;
/* Check worktree-related signatures */
strbuf_addstr(&path, suspect);
strbuf_complete(&path, '/');
strbuf_addstr(&path, "HEAD");
if (validate_headref(path.buf))
goto done;
strbuf_reset(&path);
get_common_dir(&path, suspect);
len = path.len;
/* Check non-worktree-related signatures */
if (getenv(DB_ENVIRONMENT)) {
if (access(getenv(DB_ENVIRONMENT), X_OK))
goto done;
}
else {
strbuf_setlen(&path, len);
strbuf_addstr(&path, "/objects");
if (access(path.buf, X_OK))
goto done;
}
strbuf_setlen(&path, len);
strbuf_addstr(&path, "/refs");
if (access(path.buf, X_OK))
goto done;
ret = 1;
done:
strbuf_release(&path);
return ret;
}
int is_nonbare_repository_dir(struct strbuf *path)
{
int ret = 0;
int gitfile_error;
size_t orig_path_len = path->len;
assert(orig_path_len != 0);
strbuf_complete(path, '/');
strbuf_addstr(path, ".git");
if (read_gitfile_gently(path->buf, &gitfile_error) || is_git_directory(path->buf))
ret = 1;
if (gitfile_error == READ_GITFILE_ERR_OPEN_FAILED ||
gitfile_error == READ_GITFILE_ERR_READ_FAILED)
ret = 1;
strbuf_setlen(path, orig_path_len);
return ret;
}
int is_inside_git_dir(void)
{
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
if (inside_git_dir < 0)
inside_git_dir = is_inside_dir(get_git_dir());
return inside_git_dir;
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
}
int is_inside_work_tree(void)
{
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
if (inside_work_tree < 0)
inside_work_tree = is_inside_dir(get_git_work_tree());
return inside_work_tree;
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
}
void setup_work_tree(void)
{
set_work_tree: use chdir_notify When we change to the top of the working tree, we manually re-adjust $GIT_DIR and call set_git_dir() again, in order to update any relative git-dir we'd compute earlier. Instead of the work-tree code having to know to call the git-dir code, let's use the new chdir_notify interface. There are two spots that need updating, with a few subtleties in each: 1. the set_git_dir() code needs to chdir_notify_register() so it can be told when to update its path. Technically we could push this down into repo_set_gitdir(), so that even repository structs besides the_repository could benefit from this. But that opens up a lot of complications: - we'd still need to touch set_git_dir(), because it does some other setup (like setting $GIT_DIR in the environment) - submodules using other repository structs get cleaned up, which means we'd need to remove them from the chdir_notify list - it's unlikely to fix any bugs, since we shouldn't generally chdir() in the middle of working on a submodule 2. setup_work_tree now needs to call chdir_notify(), and can lose its manual set_git_dir() call. Note that at first glance it looks like this undoes the absolute-to-relative optimization added by 044bbbcb63 (Make git_dir a path relative to work_tree in setup_work_tree(), 2008-06-19). But for the most part that optimization was just _undoing_ the relative-to-absolute conversion which the function was doing earlier (and which is now gone). It is true that if you already have an absolute git_dir that the setup_work_tree() function will no longer make it relative as a side effect. But: - we generally do have relative git-dir's due to the way the discovery code works - if we really care about making git-dir's relative when possible, then we should be relativizing them earlier (e.g., when we see an absolute $GIT_DIR we could turn it relative, whether we are going to chdir into a worktree or not). That would cover all cases, including ones that 044bbbcb63 did not. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-30 20:35:08 +02:00
const char *work_tree;
static int initialized = 0;
if (initialized)
return;
setup_git_directory: delay core.bare/core.worktree errors If both core.bare and core.worktree are set, we complain about the bogus config and die. Dying is good, because it avoids commands running and doing damage in a potentially incorrect setup. But dying _there_ is bad, because it means that commands which do not even care about the work tree cannot run. This can make repairing the situation harder: [setup] $ git config core.bare true $ git config core.worktree /some/path [OK, expected.] $ git status fatal: core.bare and core.worktree do not make sense [Hrm...] $ git config --unset core.worktree fatal: core.bare and core.worktree do not make sense [Nope...] $ git config --edit fatal: core.bare and core.worktree do not make sense [Gaaah.] $ git help config fatal: core.bare and core.worktree do not make sense Instead, let's issue a warning about the bogus config when we notice it (i.e., for all commands), but only die when the command tries to use the work tree (by calling setup_work_tree). So we now get: $ git status warning: core.bare and core.worktree do not make sense fatal: unable to set up work tree using invalid config $ git config --unset core.worktree warning: core.bare and core.worktree do not make sense We have to update t1510 to accomodate this; it uses symbolic-ref to check whether the configuration works or not, but of course that command does not use the working tree. Instead, we switch it to use `git status`, as it requires a work-tree, does not need any special setup, and is read-only (so a failure will not adversely affect further tests). In addition, we add a new test that checks the desired behavior (i.e., that running "git config" with the bogus config does in fact work). Reported-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-29 08:49:10 +02:00
if (work_tree_config_is_bogus)
die(_("unable to set up work tree using invalid config"));
setup_git_directory: delay core.bare/core.worktree errors If both core.bare and core.worktree are set, we complain about the bogus config and die. Dying is good, because it avoids commands running and doing damage in a potentially incorrect setup. But dying _there_ is bad, because it means that commands which do not even care about the work tree cannot run. This can make repairing the situation harder: [setup] $ git config core.bare true $ git config core.worktree /some/path [OK, expected.] $ git status fatal: core.bare and core.worktree do not make sense [Hrm...] $ git config --unset core.worktree fatal: core.bare and core.worktree do not make sense [Nope...] $ git config --edit fatal: core.bare and core.worktree do not make sense [Gaaah.] $ git help config fatal: core.bare and core.worktree do not make sense Instead, let's issue a warning about the bogus config when we notice it (i.e., for all commands), but only die when the command tries to use the work tree (by calling setup_work_tree). So we now get: $ git status warning: core.bare and core.worktree do not make sense fatal: unable to set up work tree using invalid config $ git config --unset core.worktree warning: core.bare and core.worktree do not make sense We have to update t1510 to accomodate this; it uses symbolic-ref to check whether the configuration works or not, but of course that command does not use the working tree. Instead, we switch it to use `git status`, as it requires a work-tree, does not need any special setup, and is read-only (so a failure will not adversely affect further tests). In addition, we add a new test that checks the desired behavior (i.e., that running "git config" with the bogus config does in fact work). Reported-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-29 08:49:10 +02:00
work_tree = get_git_work_tree();
set_work_tree: use chdir_notify When we change to the top of the working tree, we manually re-adjust $GIT_DIR and call set_git_dir() again, in order to update any relative git-dir we'd compute earlier. Instead of the work-tree code having to know to call the git-dir code, let's use the new chdir_notify interface. There are two spots that need updating, with a few subtleties in each: 1. the set_git_dir() code needs to chdir_notify_register() so it can be told when to update its path. Technically we could push this down into repo_set_gitdir(), so that even repository structs besides the_repository could benefit from this. But that opens up a lot of complications: - we'd still need to touch set_git_dir(), because it does some other setup (like setting $GIT_DIR in the environment) - submodules using other repository structs get cleaned up, which means we'd need to remove them from the chdir_notify list - it's unlikely to fix any bugs, since we shouldn't generally chdir() in the middle of working on a submodule 2. setup_work_tree now needs to call chdir_notify(), and can lose its manual set_git_dir() call. Note that at first glance it looks like this undoes the absolute-to-relative optimization added by 044bbbcb63 (Make git_dir a path relative to work_tree in setup_work_tree(), 2008-06-19). But for the most part that optimization was just _undoing_ the relative-to-absolute conversion which the function was doing earlier (and which is now gone). It is true that if you already have an absolute git_dir that the setup_work_tree() function will no longer make it relative as a side effect. But: - we generally do have relative git-dir's due to the way the discovery code works - if we really care about making git-dir's relative when possible, then we should be relativizing them earlier (e.g., when we see an absolute $GIT_DIR we could turn it relative, whether we are going to chdir into a worktree or not). That would cover all cases, including ones that 044bbbcb63 did not. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-30 20:35:08 +02:00
if (!work_tree || chdir_notify(work_tree))
die(_("this operation must be run in a work tree"));
/*
* Make sure subsequent git processes find correct worktree
* if $GIT_WORK_TREE is set relative
*/
if (getenv(GIT_WORK_TREE_ENVIRONMENT))
setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
initialized = 1;
}
static void setup_original_cwd(void)
{
struct strbuf tmp = STRBUF_INIT;
const char *worktree = NULL;
int offset = -1;
if (!tmp_original_cwd)
return;
/*
* startup_info->original_cwd points to the current working
* directory we inherited from our parent process, which is a
* directory we want to avoid removing.
*
* For convience, we would like to have the path relative to the
* worktree instead of an absolute path.
*
* Yes, startup_info->original_cwd is usually the same as 'prefix',
* but differs in two ways:
* - prefix has a trailing '/'
* - if the user passes '-C' to git, that modifies the prefix but
* not startup_info->original_cwd.
*/
/* Normalize the directory */
if (!strbuf_realpath(&tmp, tmp_original_cwd, 0)) {
trace2_data_string("setup", the_repository,
"realpath-path", tmp_original_cwd);
trace2_data_string("setup", the_repository,
"realpath-failure", strerror(errno));
free((char*)tmp_original_cwd);
tmp_original_cwd = NULL;
return;
}
free((char*)tmp_original_cwd);
tmp_original_cwd = NULL;
startup_info->original_cwd = strbuf_detach(&tmp, NULL);
/*
* Get our worktree; we only protect the current working directory
* if it's in the worktree.
*/
worktree = get_git_work_tree();
if (!worktree)
goto no_prevention_needed;
offset = dir_inside_of(startup_info->original_cwd, worktree);
if (offset >= 0) {
/*
* If startup_info->original_cwd == worktree, that is already
* protected and we don't need original_cwd as a secondary
* protection measure.
*/
if (!*(startup_info->original_cwd + offset))
goto no_prevention_needed;
/*
* original_cwd was inside worktree; precompose it just as
* we do prefix so that built up paths will match
*/
startup_info->original_cwd = \
precompose_string_if_needed(startup_info->original_cwd
+ offset);
return;
}
no_prevention_needed:
free((char*)startup_info->original_cwd);
startup_info->original_cwd = NULL;
}
config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 21:26:22 +02:00
static int read_worktree_config(const char *var, const char *value,
const struct config_context *ctx UNUSED,
void *vdata)
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
{
struct repository_format *data = vdata;
if (strcmp(var, "core.bare") == 0) {
data->is_bare = git_config_bool(var, value);
} else if (strcmp(var, "core.worktree") == 0) {
if (!value)
return config_error_nonbool(var);
free(data->work_tree);
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
data->work_tree = xstrdup(value);
}
return 0;
}
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
enum extension_result {
EXTENSION_ERROR = -1, /* compatible with error(), etc */
EXTENSION_UNKNOWN = 0,
EXTENSION_OK = 1
};
/*
* Do not add new extensions to this function. It handles extensions which are
* respected even in v0-format repositories for historical compatibility.
*/
static enum extension_result handle_extension_v0(const char *var,
const char *value,
const char *ext,
struct repository_format *data)
{
if (!strcmp(ext, "noop")) {
return EXTENSION_OK;
} else if (!strcmp(ext, "preciousobjects")) {
data->precious_objects = git_config_bool(var, value);
return EXTENSION_OK;
} else if (!strcmp(ext, "partialclone")) {
if (!value)
return config_error_nonbool(var);
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
data->partial_clone = xstrdup(value);
return EXTENSION_OK;
} else if (!strcmp(ext, "worktreeconfig")) {
data->worktree_config = git_config_bool(var, value);
return EXTENSION_OK;
}
return EXTENSION_UNKNOWN;
}
/*
* Record any new extensions in this function.
*/
static enum extension_result handle_extension(const char *var,
const char *value,
const char *ext,
struct repository_format *data)
{
if (!strcmp(ext, "noop-v1")) {
return EXTENSION_OK;
} else if (!strcmp(ext, "objectformat")) {
int format;
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
if (!value)
return config_error_nonbool(var);
format = hash_algo_by_name(value);
if (format == GIT_HASH_UNKNOWN)
return error(_("invalid value for '%s': '%s'"),
"extensions.objectformat", value);
data->hash_algo = format;
return EXTENSION_OK;
} else if (!strcmp(ext, "compatobjectformat")) {
struct string_list_item *item;
int format;
if (!value)
return config_error_nonbool(var);
format = hash_algo_by_name(value);
if (format == GIT_HASH_UNKNOWN)
return error(_("invalid value for '%s': '%s'"),
"extensions.compatobjectformat", value);
/* For now only support compatObjectFormat being specified once. */
for_each_string_list_item(item, &data->v1_only_extensions) {
if (!strcmp(item->string, "compatobjectformat"))
return error(_("'%s' already specified as '%s'"),
"extensions.compatobjectformat",
hash_algos[data->compat_hash_algo].name);
}
data->compat_hash_algo = format;
return EXTENSION_OK;
setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:47 +01:00
} else if (!strcmp(ext, "refstorage")) {
unsigned int format;
if (!value)
return config_error_nonbool(var);
format = ref_storage_format_by_name(value);
if (format == REF_STORAGE_FORMAT_UNKNOWN)
return error(_("invalid value for '%s': '%s'"),
"extensions.refstorage", value);
data->ref_storage_format = format;
return EXTENSION_OK;
}
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
return EXTENSION_UNKNOWN;
}
config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 21:26:22 +02:00
static int check_repo_format(const char *var, const char *value,
const struct config_context *ctx, void *vdata)
{
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
struct repository_format *data = vdata;
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
const char *ext;
if (strcmp(var, "core.repositoryformatversion") == 0)
data->version = git_config_int(var, value, ctx->kvi);
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
else if (skip_prefix(var, "extensions.", &ext)) {
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
switch (handle_extension_v0(var, value, ext, data)) {
case EXTENSION_ERROR:
return -1;
case EXTENSION_OK:
return 0;
case EXTENSION_UNKNOWN:
break;
}
switch (handle_extension(var, value, ext, data)) {
case EXTENSION_ERROR:
return -1;
case EXTENSION_OK:
string_list_append(&data->v1_only_extensions, ext);
return 0;
case EXTENSION_UNKNOWN:
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
string_list_append(&data->unknown_extensions, ext);
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
return 0;
}
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
}
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 21:26:22 +02:00
return read_worktree_config(var, value, ctx, vdata);
}
static int check_repository_format_gently(const char *gitdir, struct repository_format *candidate, int *nongit_ok)
{
struct strbuf sb = STRBUF_INIT;
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
struct strbuf err = STRBUF_INIT;
int has_common;
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
has_common = get_common_dir(&sb, gitdir);
strbuf_addstr(&sb, "/config");
read_repository_format(candidate, sb.buf);
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
strbuf_release(&sb);
/*
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
* For historical use of check_repository_format() in git-init,
* we treat a missing config as a silent "ok", even when nongit_ok
* is unset.
*/
if (candidate->version < 0)
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
return 0;
if (verify_repository_format(candidate, &err) < 0) {
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
if (nongit_ok) {
warning("%s", err.buf);
strbuf_release(&err);
*nongit_ok = -1;
return -1;
}
die("%s", err.buf);
}
Revert "check_repository_format_gently(): refuse extensions for old repositories" This reverts commit 14c7fa269e42df4133edd9ae7763b678ed6594cd. The core.repositoryFormatVersion field was introduced in ab9cb76f661 (Repository format version check., 2005-11-25), providing a welcome bit of forward compatibility, thanks to some welcome analysis by Martin Atukunda. The semantics are simple: a repository with core.repositoryFormatVersion set to 0 should be comprehensible by all Git implementations in active use; and Git implementations should error out early instead of trying to act on Git repositories with higher core.repositoryFormatVersion values representing new formats that they do not understand. A new repository format did not need to be defined until 00a09d57eb8 (introduce "extensions" form of core.repositoryformatversion, 2015-06-23). This provided a finer-grained extension mechanism for Git repositories. In a repository with core.repositoryFormatVersion set to 1, Git implementations can act on "extensions.*" settings that modify how a repository is interpreted. In repository format version 1, unrecognized extensions settings cause Git to error out. What happens if a user sets an extension setting but forgets to increase the repository format version to 1? The extension settings were still recognized in that case; worse, unrecognized extensions settings do *not* cause Git to error out. So combining repository format version 0 with extensions settings produces in some sense the worst of both worlds. To improve that situation, since 14c7fa269e4 (check_repository_format_gently(): refuse extensions for old repositories, 2020-06-05) Git instead ignores extensions in v0 mode. This way, v0 repositories get the historical (pre-2015) behavior and maintain compatibility with Git implementations that do not know about the v1 format. Unfortunately, users had been using this sort of configuration and this behavior change came to many as a surprise: - users of "git config --worktree" that had followed its advice to enable extensions.worktreeConfig (without also increasing the repository format version) would find their worktree configuration no longer taking effect - tools such as copybara[*] that had set extensions.partialClone in existing repositories (without also increasing the repository format version) would find that setting no longer taking effect The behavior introduced in 14c7fa269e4 might be a good behavior if we were traveling back in time to 2015, but we're far too late. For some reason I thought that it was what had been originally implemented and that it had regressed. Apologies for not doing my research when 14c7fa269e4 was under development. Let's return to the behavior we've had since 2015: always act on extensions.* settings, regardless of repository format version. While we're here, include some tests to describe the effect on the "upgrade repository version" code path. [*] https://github.com/google/copybara/commit/ca76c0b1e13c4e36448d12c2aba4a5d9d98fb6e7 Reported-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 08:24:29 +02:00
repository_format_precious_objects = candidate->precious_objects;
string_list_clear(&candidate->unknown_extensions, 0);
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
string_list_clear(&candidate->v1_only_extensions, 0);
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
if (candidate->worktree_config) {
worktree: add per-worktree config files A new repo extension is added, worktreeConfig. When it is present: - Repository config reading by default includes $GIT_DIR/config _and_ $GIT_DIR/config.worktree. "config" file remains shared in multiple worktree setup. - The special treatment for core.bare and core.worktree, to stay effective only in main worktree, is gone. These config settings are supposed to be in config.worktree. This extension is most useful in multiple worktree setup because you now have an option to store per-worktree config (which is either .git/config.worktree for main worktree, or .git/worktrees/xx/config.worktree for linked ones). This extension can be used in single worktree mode, even though it's pretty much useless (but this can happen after you remove all linked worktrees and move back to single worktree). "git config" reads from both "config" and "config.worktree" by default (i.e. without either --user, --file...) when this extension is present. Default writes still go to "config", not "config.worktree". A new option --worktree is added for that (*). Since a new repo extension is introduced, existing git binaries should refuse to access to the repo (both from main and linked worktrees). So they will not misread the config file (i.e. skip the config.worktree part). They may still accidentally write to the config file anyway if they use with "git config --file <path>". This design places a bet on the assumption that the majority of config variables are shared so it is the default mode. A safer move would be default writes go to per-worktree file, so that accidental changes are isolated. (*) "git config --worktree" points back to "config" file when this extension is not present and there is only one worktree so that it works in any both single and multiple worktree setups. Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-21 16:02:28 +02:00
/*
* pick up core.bare and core.worktree from per-worktree
* config if present
*/
strbuf_addf(&sb, "%s/config.worktree", gitdir);
git_config_from_file(read_worktree_config, sb.buf, candidate);
strbuf_release(&sb);
has_common = 0;
}
if (!has_common) {
if (candidate->is_bare != -1) {
is_bare_repository_cfg = candidate->is_bare;
if (is_bare_repository_cfg == 1)
inside_work_tree = -1;
}
if (candidate->work_tree) {
free(git_work_tree_cfg);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
git_work_tree_cfg = xstrdup(candidate->work_tree);
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
inside_work_tree = -1;
}
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
}
return 0;
}
int upgrade_repository_format(int target_version)
{
struct strbuf sb = STRBUF_INIT;
struct strbuf err = STRBUF_INIT;
struct strbuf repo_version = STRBUF_INIT;
struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
int ret;
strbuf_git_common_path(&sb, the_repository, "config");
read_repository_format(&repo_fmt, sb.buf);
strbuf_release(&sb);
if (repo_fmt.version >= target_version) {
ret = 0;
goto out;
}
if (verify_repository_format(&repo_fmt, &err) < 0) {
ret = error("cannot upgrade repository format from %d to %d: %s",
repo_fmt.version, target_version, err.buf);
goto out;
}
if (!repo_fmt.version && repo_fmt.unknown_extensions.nr) {
ret = error("cannot upgrade repository format: "
"unknown extension %s",
repo_fmt.unknown_extensions.items[0].string);
goto out;
}
strbuf_addf(&repo_version, "%d", target_version);
git_config_set("core.repositoryformatversion", repo_version.buf);
ret = 1;
out:
clear_repository_format(&repo_fmt);
strbuf_release(&repo_version);
strbuf_release(&err);
return ret;
}
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
static void init_repository_format(struct repository_format *format)
{
const struct repository_format fresh = REPOSITORY_FORMAT_INIT;
memcpy(format, &fresh, sizeof(fresh));
}
int read_repository_format(struct repository_format *format, const char *path)
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
{
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
clear_repository_format(format);
git_config_from_file(check_repo_format, path, format);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
if (format->version == -1)
clear_repository_format(format);
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
return format->version;
}
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
void clear_repository_format(struct repository_format *format)
{
string_list_clear(&format->unknown_extensions, 0);
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
string_list_clear(&format->v1_only_extensions, 0);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
free(format->work_tree);
free(format->partial_clone);
init_repository_format(format);
}
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
int verify_repository_format(const struct repository_format *format,
struct strbuf *err)
{
if (GIT_REPO_VERSION_READ < format->version) {
strbuf_addf(err, _("Expected git repo version <= %d, found %d"),
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
GIT_REPO_VERSION_READ, format->version);
return -1;
}
if (format->version >= 1 && format->unknown_extensions.nr) {
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
int i;
strbuf_addstr(err, Q_("unknown repository extension found:",
"unknown repository extensions found:",
format->unknown_extensions.nr));
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
for (i = 0; i < format->unknown_extensions.nr; i++)
strbuf_addf(err, "\n\t%s",
format->unknown_extensions.items[i].string);
return -1;
introduce "extensions" form of core.repositoryformatversion Normally we try to avoid bumps of the whole-repository core.repositoryformatversion field. However, it is unavoidable if we want to safely change certain aspects of git in a backwards-incompatible way (e.g., modifying the set of ref tips that we must traverse to generate a list of unreachable, safe-to-prune objects). If we were to bump the repository version for every such change, then any implementation understanding version `X` would also have to understand `X-1`, `X-2`, and so forth, even though the incompatibilities may be in orthogonal parts of the system, and there is otherwise no reason we cannot implement one without the other (or more importantly, that the user cannot choose to use one feature without the other, weighing the tradeoff in compatibility only for that particular feature). This patch documents the existing repositoryformatversion strategy and introduces a new format, "1", which lets a repository specify that it must run with an arbitrary set of extensions. This can be used, for example: - to inform git that the objects should not be pruned based only on the reachability of the ref tips (e.g, because it has "clone --shared" children) - that the refs are stored in a format besides the usual "refs" and "packed-refs" directories Because we bump to format "1", and because format "1" requires that a running git knows about any extensions mentioned, we know that older versions of the code will not do something dangerous when confronted with these new formats. For example, if the user chooses to use database storage for refs, they may set the "extensions.refbackend" config to "db". Older versions of git will not understand format "1" and bail. Versions of git which understand "1" but do not know about "refbackend", or which know about "refbackend" but not about the "db" backend, will refuse to run. This is annoying, of course, but much better than the alternative of claiming that there are no refs in the repository, or writing to a location that other implementations will not read. Note that we are only defining the rules for format 1 here. We do not ever write format 1 ourselves; it is a tool that is meant to be used by users and future extensions to provide safety with older implementations. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 12:53:58 +02:00
}
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
if (format->version == 0 && format->v1_only_extensions.nr) {
int i;
strbuf_addstr(err,
Q_("repo version is 0, but v1-only extension found:",
"repo version is 0, but v1-only extensions found:",
format->v1_only_extensions.nr));
verify_repository_format(): complain about new extensions in v0 repo We made the mistake in the past of respecting extensions.* even when the repository format version was set to 0. This is bad because forgetting to bump the repository version means that older versions of Git (which do not know about our extensions) won't complain. I.e., it's not a problem in itself, but it means your repository is in a state which does not give you the protection you think you're getting from older versions. For compatibility reasons, we are stuck with that decision for existing extensions. However, we'd prefer not to extend the damage further. We can do that by catching any newly-added extensions and complaining about the repository format. Note that this is a pretty heavy hammer: we'll refuse to work with the repository at all. A lesser option would be to ignore (possibly with a warning) any new extensions. But because of the way the extensions are handled, that puts the burden on each new extension that is added to remember to "undo" itself (because they are handled before we know for sure whether we are in a v1 repo or not, since we don't insist on a particular ordering of config entries). So one option would be to rewrite that handling to record any new extensions (and their values) during the config parse, and then only after proceed to handle new ones only if we're in a v1 repository. But I'm not sure if it's worth the trouble: - ignoring extensions is likely to end up with broken results anyway (e.g., ignoring a proposed objectformat extension means parsing any object data is likely to encounter errors) - this is a sign that whatever tool wrote the extension field is broken. We may be better off notifying immediately and forcefully so that such tools don't even appear to work accidentally. The only downside is that fixing the situation is a little tricky, because programs like "git config" won't want to work with the repository. But: git config --file=.git/config core.repositoryformatversion 1 should still suffice. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-16 14:25:13 +02:00
for (i = 0; i < format->v1_only_extensions.nr; i++)
strbuf_addf(err, "\n\t%s",
format->v1_only_extensions.items[i].string);
return -1;
}
setup: refactor repo format reading and verification When we want to know if we're in a git repository of reasonable vintage, we can call check_repository_format_gently(), which does three things: 1. Reads the config from the .git/config file. 2. Verifies that the version info we read is sane. 3. Writes some global variables based on this. There are a few things we could improve here. One is that steps 1 and 3 happen together. So if the verification in step 2 fails, we still clobber the global variables. This is especially bad if we go on to try another repository directory; we may end up with a state of mixed config variables. The second is there's no way to ask about the repository version for anything besides the main repository we're in. git-init wants to do this, and it's possible that we would want to start doing so for submodules (e.g., to find out which ref backend they're using). We can improve both by splitting the first two steps into separate functions. Now check_repository_format_gently() calls out to steps 1 and 2, and does 3 only if step 2 succeeds. Note that the public interface for read_repository_format() and what check_repository_format_gently() needs from it are not quite the same, leading us to have an extra read_repository_format_1() helper. The extra needs from check_repository_format_gently() will go away in a future patch, and we can simplify this then to just the public interface. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-11 23:37:07 +01:00
return 0;
}
void read_gitfile_error_die(int error_code, const char *path, const char *dir)
{
switch (error_code) {
case READ_GITFILE_ERR_STAT_FAILED:
case READ_GITFILE_ERR_NOT_A_FILE:
/* non-fatal; follow return path */
break;
case READ_GITFILE_ERR_OPEN_FAILED:
die_errno(_("error opening '%s'"), path);
case READ_GITFILE_ERR_TOO_LARGE:
die(_("too large to be a .git file: '%s'"), path);
case READ_GITFILE_ERR_READ_FAILED:
die(_("error reading %s"), path);
case READ_GITFILE_ERR_INVALID_FORMAT:
die(_("invalid gitfile format: %s"), path);
case READ_GITFILE_ERR_NO_PATH:
die(_("no path in gitfile: %s"), path);
case READ_GITFILE_ERR_NOT_A_REPO:
die(_("not a git repository: %s"), dir);
default:
BUG("unknown error code");
}
}
/*
* Try to read the location of the git directory from the .git file,
* return path to git directory if found. The return value comes from
* a shared buffer.
*
* On failure, if return_error_code is not NULL, return_error_code
* will be set to an error code and NULL will be returned. If
* return_error_code is NULL the function will die instead (for most
* cases).
*/
const char *read_gitfile_gently(const char *path, int *return_error_code)
{
const int max_file_size = 1 << 20; /* 1MB */
int error_code = 0;
char *buf = NULL;
char *dir = NULL;
const char *slash;
struct stat st;
int fd;
ssize_t len;
static struct strbuf realpath = STRBUF_INIT;
if (stat(path, &st)) {
/* NEEDSWORK: discern between ENOENT vs other errors */
error_code = READ_GITFILE_ERR_STAT_FAILED;
goto cleanup_return;
}
if (!S_ISREG(st.st_mode)) {
error_code = READ_GITFILE_ERR_NOT_A_FILE;
goto cleanup_return;
}
if (st.st_size > max_file_size) {
error_code = READ_GITFILE_ERR_TOO_LARGE;
goto cleanup_return;
}
fd = open(path, O_RDONLY);
if (fd < 0) {
error_code = READ_GITFILE_ERR_OPEN_FAILED;
goto cleanup_return;
}
buf = xmallocz(st.st_size);
len = read_in_full(fd, buf, st.st_size);
close(fd);
if (len != st.st_size) {
error_code = READ_GITFILE_ERR_READ_FAILED;
goto cleanup_return;
}
if (!starts_with(buf, "gitdir: ")) {
error_code = READ_GITFILE_ERR_INVALID_FORMAT;
goto cleanup_return;
}
while (buf[len - 1] == '\n' || buf[len - 1] == '\r')
len--;
if (len < 9) {
error_code = READ_GITFILE_ERR_NO_PATH;
goto cleanup_return;
}
buf[len] = '\0';
dir = buf + 8;
if (!is_absolute_path(dir) && (slash = strrchr(path, '/'))) {
size_t pathlen = slash+1 - path;
dir = xstrfmt("%.*s%.*s", (int)pathlen, path,
(int)(len - 8), buf + 8);
free(buf);
buf = dir;
}
if (!is_git_directory(dir)) {
error_code = READ_GITFILE_ERR_NOT_A_REPO;
goto cleanup_return;
}
strbuf_realpath(&realpath, dir, 1);
path = realpath.buf;
cleanup_return:
if (return_error_code)
*return_error_code = error_code;
else if (error_code)
read_gitfile_error_die(error_code, path, dir);
free(buf);
return error_code ? NULL : path;
}
static const char *setup_explicit_git_dir(const char *gitdirenv,
struct strbuf *cwd,
struct repository_format *repo_fmt,
int *nongit_ok)
{
const char *work_tree_env = getenv(GIT_WORK_TREE_ENVIRONMENT);
const char *worktree;
char *gitfile;
int offset;
if (PATH_MAX - 40 < strlen(gitdirenv))
die(_("'$%s' too big"), GIT_DIR_ENVIRONMENT);
gitfile = (char*)read_gitfile(gitdirenv);
if (gitfile) {
gitfile = xstrdup(gitfile);
gitdirenv = gitfile;
}
if (!is_git_directory(gitdirenv)) {
if (nongit_ok) {
*nongit_ok = 1;
free(gitfile);
return NULL;
}
die(_("not a git repository: '%s'"), gitdirenv);
}
if (check_repository_format_gently(gitdirenv, repo_fmt, nongit_ok)) {
free(gitfile);
return NULL;
}
/* #3, #7, #11, #15, #19, #23, #27, #31 (see t1510) */
if (work_tree_env)
set_git_work_tree(work_tree_env);
else if (is_bare_repository_cfg > 0) {
setup_git_directory: delay core.bare/core.worktree errors If both core.bare and core.worktree are set, we complain about the bogus config and die. Dying is good, because it avoids commands running and doing damage in a potentially incorrect setup. But dying _there_ is bad, because it means that commands which do not even care about the work tree cannot run. This can make repairing the situation harder: [setup] $ git config core.bare true $ git config core.worktree /some/path [OK, expected.] $ git status fatal: core.bare and core.worktree do not make sense [Hrm...] $ git config --unset core.worktree fatal: core.bare and core.worktree do not make sense [Nope...] $ git config --edit fatal: core.bare and core.worktree do not make sense [Gaaah.] $ git help config fatal: core.bare and core.worktree do not make sense Instead, let's issue a warning about the bogus config when we notice it (i.e., for all commands), but only die when the command tries to use the work tree (by calling setup_work_tree). So we now get: $ git status warning: core.bare and core.worktree do not make sense fatal: unable to set up work tree using invalid config $ git config --unset core.worktree warning: core.bare and core.worktree do not make sense We have to update t1510 to accomodate this; it uses symbolic-ref to check whether the configuration works or not, but of course that command does not use the working tree. Instead, we switch it to use `git status`, as it requires a work-tree, does not need any special setup, and is read-only (so a failure will not adversely affect further tests). In addition, we add a new test that checks the desired behavior (i.e., that running "git config" with the bogus config does in fact work). Reported-by: SZEDER Gábor <szeder@ira.uka.de> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-05-29 08:49:10 +02:00
if (git_work_tree_cfg) {
/* #22.2, #30 */
warning("core.bare and core.worktree do not make sense");
work_tree_config_is_bogus = 1;
}
/* #18, #26 */
set_git_dir(gitdirenv, 0);
free(gitfile);
return NULL;
}
else if (git_work_tree_cfg) { /* #6, #14 */
if (is_absolute_path(git_work_tree_cfg))
set_git_work_tree(git_work_tree_cfg);
else {
char *core_worktree;
if (chdir(gitdirenv))
die_errno(_("cannot chdir to '%s'"), gitdirenv);
if (chdir(git_work_tree_cfg))
die_errno(_("cannot chdir to '%s'"), git_work_tree_cfg);
core_worktree = xgetcwd();
if (chdir(cwd->buf))
die_errno(_("cannot come back to cwd"));
set_git_work_tree(core_worktree);
free(core_worktree);
}
}
setup: suppress implicit "." work-tree for bare repos If an explicit GIT_DIR is given without a working tree, we implicitly assume that the current working directory should be used as the working tree. E.g.,: GIT_DIR=/some/repo.git git status would compare against the cwd. Unfortunately, we fool this rule for sub-invocations of git by setting GIT_DIR internally ourselves. For example: git init foo cd foo/.git git status ;# fails, as we expect git config alias.st status git status ;# does not fail, but should What happens is that we run setup_git_directory when doing alias lookup (since we need to see the config), set GIT_DIR as a result, and then leave GIT_WORK_TREE blank (because we do not have one). Then when we actually run the status command, we do setup_git_directory again, which sees our explicit GIT_DIR and uses the cwd as an implicit worktree. It's tempting to argue that we should be suppressing that second invocation of setup_git_directory, as it could use the values we already found in memory. However, the problem still exists for sub-processes (e.g., if "git status" were an external command). You can see another example with the "--bare" option, which sets GIT_DIR explicitly. For example: git init foo cd foo/.git git status ;# fails git --bare status ;# does NOT fail We need some way of telling sub-processes "even though GIT_DIR is set, do not use cwd as an implicit working tree". We could do it by putting a special token into GIT_WORK_TREE, but the obvious choice (an empty string) has some portability problems. Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE, which suppresses the use of cwd as a working tree when GIT_DIR is set. We trigger the new variable when we know we are in a bare setting. The variable is left intentionally undocumented, as this is an internal detail (for now, anyway). If somebody comes up with a good alternate use for it, and once we are confident we have shaken any bugs out of it, we can consider promoting it further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 10:32:22 +01:00
else if (!git_env_bool(GIT_IMPLICIT_WORK_TREE_ENVIRONMENT, 1)) {
/* #16d */
set_git_dir(gitdirenv, 0);
setup: suppress implicit "." work-tree for bare repos If an explicit GIT_DIR is given without a working tree, we implicitly assume that the current working directory should be used as the working tree. E.g.,: GIT_DIR=/some/repo.git git status would compare against the cwd. Unfortunately, we fool this rule for sub-invocations of git by setting GIT_DIR internally ourselves. For example: git init foo cd foo/.git git status ;# fails, as we expect git config alias.st status git status ;# does not fail, but should What happens is that we run setup_git_directory when doing alias lookup (since we need to see the config), set GIT_DIR as a result, and then leave GIT_WORK_TREE blank (because we do not have one). Then when we actually run the status command, we do setup_git_directory again, which sees our explicit GIT_DIR and uses the cwd as an implicit worktree. It's tempting to argue that we should be suppressing that second invocation of setup_git_directory, as it could use the values we already found in memory. However, the problem still exists for sub-processes (e.g., if "git status" were an external command). You can see another example with the "--bare" option, which sets GIT_DIR explicitly. For example: git init foo cd foo/.git git status ;# fails git --bare status ;# does NOT fail We need some way of telling sub-processes "even though GIT_DIR is set, do not use cwd as an implicit working tree". We could do it by putting a special token into GIT_WORK_TREE, but the obvious choice (an empty string) has some portability problems. Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE, which suppresses the use of cwd as a working tree when GIT_DIR is set. We trigger the new variable when we know we are in a bare setting. The variable is left intentionally undocumented, as this is an internal detail (for now, anyway). If somebody comes up with a good alternate use for it, and once we are confident we have shaken any bugs out of it, we can consider promoting it further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 10:32:22 +01:00
free(gitfile);
return NULL;
}
else /* #2, #10 */
set_git_work_tree(".");
/* set_git_work_tree() must have been called by now */
worktree = get_git_work_tree();
/* both get_git_work_tree() and cwd are already normalized */
if (!strcmp(cwd->buf, worktree)) { /* cwd == worktree */
set_git_dir(gitdirenv, 0);
free(gitfile);
return NULL;
}
offset = dir_inside_of(cwd->buf, worktree);
if (offset >= 0) { /* cwd inside worktree? */
set_git_dir(gitdirenv, 1);
if (chdir(worktree))
die_errno(_("cannot chdir to '%s'"), worktree);
strbuf_addch(cwd, '/');
free(gitfile);
return cwd->buf + offset;
}
/* cwd outside worktree */
set_git_dir(gitdirenv, 0);
free(gitfile);
return NULL;
}
static const char *setup_discovered_git_dir(const char *gitdir,
struct strbuf *cwd, int offset,
struct repository_format *repo_fmt,
int *nongit_ok)
{
if (check_repository_format_gently(gitdir, repo_fmt, nongit_ok))
return NULL;
/* --work-tree is set without --git-dir; use discovered one */
if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
char *to_free = NULL;
const char *ret;
if (offset != cwd->len && !is_absolute_path(gitdir))
gitdir = to_free = real_pathdup(gitdir, 1);
if (chdir(cwd->buf))
die_errno(_("cannot come back to cwd"));
ret = setup_explicit_git_dir(gitdir, cwd, repo_fmt, nongit_ok);
free(to_free);
return ret;
}
/* #16.2, #17.2, #20.2, #21.2, #24, #25, #28, #29 (see t1510) */
if (is_bare_repository_cfg > 0) {
set_git_dir(gitdir, (offset != cwd->len));
if (chdir(cwd->buf))
die_errno(_("cannot come back to cwd"));
return NULL;
}
/* #0, #1, #5, #8, #9, #12, #13 */
set_git_work_tree(".");
if (strcmp(gitdir, DEFAULT_GIT_DIR_ENVIRONMENT))
set_git_dir(gitdir, 0);
inside_git_dir = 0;
inside_work_tree = 1;
if (offset >= cwd->len)
return NULL;
/* Make "offset" point past the '/' (already the case for root dirs) */
if (offset != offset_1st_component(cwd->buf))
offset++;
/* Add a '/' at the end */
strbuf_addch(cwd, '/');
return cwd->buf + offset;
}
/* #16.1, #17.1, #20.1, #21.1, #22.1 (see t1510) */
static const char *setup_bare_git_dir(struct strbuf *cwd, int offset,
struct repository_format *repo_fmt,
int *nongit_ok)
{
int root_len;
if (check_repository_format_gently(".", repo_fmt, nongit_ok))
return NULL;
setup: suppress implicit "." work-tree for bare repos If an explicit GIT_DIR is given without a working tree, we implicitly assume that the current working directory should be used as the working tree. E.g.,: GIT_DIR=/some/repo.git git status would compare against the cwd. Unfortunately, we fool this rule for sub-invocations of git by setting GIT_DIR internally ourselves. For example: git init foo cd foo/.git git status ;# fails, as we expect git config alias.st status git status ;# does not fail, but should What happens is that we run setup_git_directory when doing alias lookup (since we need to see the config), set GIT_DIR as a result, and then leave GIT_WORK_TREE blank (because we do not have one). Then when we actually run the status command, we do setup_git_directory again, which sees our explicit GIT_DIR and uses the cwd as an implicit worktree. It's tempting to argue that we should be suppressing that second invocation of setup_git_directory, as it could use the values we already found in memory. However, the problem still exists for sub-processes (e.g., if "git status" were an external command). You can see another example with the "--bare" option, which sets GIT_DIR explicitly. For example: git init foo cd foo/.git git status ;# fails git --bare status ;# does NOT fail We need some way of telling sub-processes "even though GIT_DIR is set, do not use cwd as an implicit working tree". We could do it by putting a special token into GIT_WORK_TREE, but the obvious choice (an empty string) has some portability problems. Instead, we add a new boolean variable, GIT_IMPLICIT_WORK_TREE, which suppresses the use of cwd as a working tree when GIT_DIR is set. We trigger the new variable when we know we are in a bare setting. The variable is left intentionally undocumented, as this is an internal detail (for now, anyway). If somebody comes up with a good alternate use for it, and once we are confident we have shaken any bugs out of it, we can consider promoting it further. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-08 10:32:22 +01:00
setenv(GIT_IMPLICIT_WORK_TREE_ENVIRONMENT, "0", 1);
/* --work-tree is set without --git-dir; use discovered one */
if (getenv(GIT_WORK_TREE_ENVIRONMENT) || git_work_tree_cfg) {
static const char *gitdir;
gitdir = offset == cwd->len ? "." : xmemdupz(cwd->buf, offset);
if (chdir(cwd->buf))
die_errno(_("cannot come back to cwd"));
return setup_explicit_git_dir(gitdir, cwd, repo_fmt, nongit_ok);
}
inside_git_dir = 1;
inside_work_tree = 0;
if (offset != cwd->len) {
if (chdir(cwd->buf))
die_errno(_("cannot come back to cwd"));
root_len = offset_1st_component(cwd->buf);
strbuf_setlen(cwd, offset > root_len ? offset : root_len);
set_git_dir(cwd->buf, 0);
}
else
set_git_dir(".", 0);
return NULL;
}
static dev_t get_device_or_die(const char *path, const char *prefix, int prefix_len)
{
struct stat buf;
if (stat(path, &buf)) {
die_errno(_("failed to stat '%*s%s%s'"),
prefix_len,
prefix ? prefix : "",
prefix ? "/" : "", path);
}
return buf.st_dev;
}
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
/*
* A "string_list_each_func_t" function that canonicalizes an entry
* from GIT_CEILING_DIRECTORIES using real_pathdup(), or
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
* discards it if unusable. The presence of an empty entry in
* GIT_CEILING_DIRECTORIES turns off canonicalization for all
* subsequent entries.
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
*/
static int canonicalize_ceiling_entry(struct string_list_item *item,
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
void *cb_data)
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
{
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
int *empty_entry_found = cb_data;
char *ceil = item->string;
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
if (!*ceil) {
*empty_entry_found = 1;
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
return 0;
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
} else if (!is_absolute_path(ceil)) {
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
return 0;
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
} else if (*empty_entry_found) {
/* Keep entry but do not canonicalize it */
return 1;
} else {
char *real_path = real_pathdup(ceil, 0);
if (!real_path) {
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
return 0;
}
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
free(item->string);
item->string = real_path;
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
return 1;
}
longest_ancestor_length(): require prefix list entries to be normalized Move the responsibility for normalizing prefixes from longest_ancestor_length() to its callers. Use slightly different normalizations at the two callers: In setup_git_directory_gently_1(), use the old normalization, which ignores paths that are not usable. In the next commit we will change this caller to also resolve symlinks in the paths from GIT_CEILING_DIRECTORIES as part of the normalization. In "test-path-utils longest_ancestor_length", use the old normalization, but die() if any paths are unusable. Also change t0060 to only pass normalized paths to the test program (no empty entries or non-absolute paths, strip trailing slashes from the paths, and remove tests that thereby become redundant). The point of this change is to reduce the scope of the ancestor_length tests in t0060 from testing normalization+longest_prefix to testing only mostly longest_prefix. This is necessary because when setup_git_directory_gently_1() starts resolving symlinks as part of its normalization, it will not be reasonable to do the same in the test suite, because that would make the test results depend on the contents of the root directory of the filesystem on which the test is run. HOWEVER: under Windows, bash mangles arguments that look like absolute POSIX paths into DOS paths. So we have to retain the level of normalization done by normalize_path_copy() to convert the bash-mangled DOS paths (which contain backslashes) into paths that use forward slashes. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Jeff King <peff@peff.net>
2012-10-28 17:16:25 +01:00
}
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
struct safe_directory_data {
const char *path;
int is_safe;
};
config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 21:26:22 +02:00
static int safe_directory_cb(const char *key, const char *value,
const struct config_context *ctx UNUSED, void *d)
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
{
struct safe_directory_data *data = d;
if (strcmp(key, "safe.directory"))
return 0;
if (!value || !*value) {
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
data->is_safe = 0;
} else if (!strcmp(value, "*")) {
data->is_safe = 1;
} else {
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
const char *interpolated = NULL;
if (!git_config_pathname(&interpolated, key, value) &&
!fspathcmp(data->path, interpolated ? interpolated : value))
data->is_safe = 1;
free((char *)interpolated);
}
return 0;
}
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
/*
* Check if a repository is safe, by verifying the ownership of the
* worktree (if any), the git directory, and the gitfile (if any).
*
* Exemptions for known-safe repositories can be added via `safe.directory`
* config settings; for non-bare repositories, their worktree needs to be
* added, for bare ones their git directory.
*/
static int ensure_valid_ownership(const char *gitfile,
const char *worktree, const char *gitdir,
struct strbuf *report)
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
{
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
struct safe_directory_data data = {
.path = worktree ? worktree : gitdir
};
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
if (!git_env_bool("GIT_TEST_ASSUME_DIFFERENT_OWNER", 0) &&
(!gitfile || is_path_owned_by_current_user(gitfile, report)) &&
(!worktree || is_path_owned_by_current_user(worktree, report)) &&
(!gitdir || is_path_owned_by_current_user(gitdir, report)))
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
return 1;
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
/*
* data.path is the "path" that identifies the repository and it is
* constant regardless of what failed above. data.is_safe should be
* initialized to false, and might be changed by the callback.
*/
git_protected_config(safe_directory_cb, &data);
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
return data.is_safe;
}
config: add ctx arg to config_fn_t Add a new "const struct config_context *ctx" arg to config_fn_t to hold additional information about the config iteration operation. config_context has a "struct key_value_info kvi" member that holds metadata about the config source being read (e.g. what kind of config source it is, the filename, etc). In this series, we're only interested in .kvi, so we could have just used "struct key_value_info" as an arg, but config_context makes it possible to add/adjust members in the future without changing the config_fn_t signature. We could also consider other ways of organizing the args (e.g. moving the config name and value into config_context or key_value_info), but in my experiments, the incremental benefit doesn't justify the added complexity (e.g. a config_fn_t will sometimes invoke another config_fn_t but with a different config value). In subsequent commits, the .kvi member will replace the global "struct config_reader" in config.c, making config iteration a global-free operation. It requires much more work for the machinery to provide meaningful values of .kvi, so for now, merely change the signature and call sites, pass NULL as a placeholder value, and don't rely on the arg in any meaningful way. Most of the changes are performed by contrib/coccinelle/config_fn_ctx.pending.cocci, which, for every config_fn_t: - Modifies the signature to accept "const struct config_context *ctx" - Passes "ctx" to any inner config_fn_t, if needed - Adds UNUSED attributes to "ctx", if needed Most config_fn_t instances are easily identified by seeing if they are called by the various config functions. Most of the remaining ones are manually named in the .cocci patch. Manual cleanups are still needed, but the majority of it is trivial; it's either adjusting config_fn_t that the .cocci patch didn't catch, or adding forward declarations of "struct config_context ctx" to make the signatures make sense. The non-trivial changes are in cases where we are invoking a config_fn_t outside of config machinery, and we now need to decide what value of "ctx" to pass. These cases are: - trace2/tr2_cfg.c:tr2_cfg_set_fl() This is indirectly called by git_config_set() so that the trace2 machinery can notice the new config values and update its settings using the tr2 config parsing function, i.e. tr2_cfg_cb(). - builtin/checkout.c:checkout_main() This calls git_xmerge_config() as a shorthand for parsing a CLI arg. This might be worth refactoring away in the future, since git_xmerge_config() can call git_default_config(), which can do much more than just parsing. Handle them by creating a KVI_INIT macro that initializes "struct key_value_info" to a reasonable default, and use that to construct the "ctx" arg. Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-28 21:26:22 +02:00
static int allowed_bare_repo_cb(const char *key, const char *value,
const struct config_context *ctx UNUSED,
void *d)
setup.c: create `safe.bareRepository` There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 23:28:01 +02:00
{
enum allowed_bare_repo *allowed_bare_repo = d;
if (strcasecmp(key, "safe.bareRepository"))
return 0;
if (!strcmp(value, "explicit")) {
*allowed_bare_repo = ALLOWED_BARE_REPO_EXPLICIT;
return 0;
}
if (!strcmp(value, "all")) {
*allowed_bare_repo = ALLOWED_BARE_REPO_ALL;
return 0;
}
return -1;
}
static enum allowed_bare_repo get_allowed_bare_repo(void)
{
enum allowed_bare_repo result = ALLOWED_BARE_REPO_ALL;
git_protected_config(allowed_bare_repo_cb, &result);
return result;
}
static const char *allowed_bare_repo_to_string(
enum allowed_bare_repo allowed_bare_repo)
{
switch (allowed_bare_repo) {
case ALLOWED_BARE_REPO_EXPLICIT:
return "explicit";
case ALLOWED_BARE_REPO_ALL:
return "all";
default:
BUG("invalid allowed_bare_repo %d",
allowed_bare_repo);
}
return NULL;
}
setup: notice more types of implicit bare repositories Setting the safe.bareRepository configuration variable to explicit stops git from using a bare repository, unless the repository is explicitly specified, either by the "--git-dir=<path>" command line option, or by exporting $GIT_DIR environment variable. This may be a reasonable measure to safeguard users from accidentally straying into a bare repository in unexpected places, but often gets in the way of users who need valid accesses to the repository. Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit, 2024-01-20) loosened the rule such that being inside the ".git" directory of a non-bare repository does not really count as accessing a "bare" repository. The reason why such a loosening is needed is because often hooks and third-party tools run from within $GIT_DIR while working with a non-bare repository. More importantly, the reason why this is safe is because a directory whose contents look like that of a "bare" repository cannot be a bare repository that came embedded within a checkout of a malicious project, as long as its directory name is ".git", because ".git" is not a name allowed for a directory in payload. There are at least two other cases where tools have to work in a bare-repository looking directory that is not an embedded bare repository, and accesses to them are still not allowed by the recent change. - A secondary worktree (whose name is $name) has its $GIT_DIR inside "worktrees/$name/" subdirectory of the $GIT_DIR of the primary worktree of the same repository. - A submodule worktree (whose name is $name) has its $GIT_DIR inside "modules/$name/" subdirectory of the $GIT_DIR of its superproject. As long as the primary worktree or the superproject in these cases are not bare, the pathname of these "looks like bare but not really" directories will have "/.git/worktrees/" and "/.git/modules/" as a substring in its leading part, and we can take advantage of the same security guarantee allow git to work from these places. Extend the earlier "in a directory called '.git' we are OK" logic used for the primary worktree to also cover the secondary worktree's and non-embedded submodule's $GIT_DIR, by moving the logic to a helper function "is_implicit_bare_repo()". We deliberately exclude secondary worktrees and submodules of a bare repository, as these are exactly what safe.bareRepository=explicit setting is designed to forbid accesses to without an explicit GIT_DIR/--git-dir=<path> Helped-by: Kyle Lippincott <spectral@google.com> Helped-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-10 00:27:09 +01:00
static int is_implicit_bare_repo(const char *path)
{
/*
* what we found is a ".git" directory at the root of
* the working tree.
*/
if (ends_with_path_components(path, ".git"))
return 1;
/*
* we are inside $GIT_DIR of a secondary worktree of a
* non-bare repository.
*/
if (strstr(path, "/.git/worktrees/"))
return 1;
/*
* we are inside $GIT_DIR of a worktree of a non-embedded
* submodule, whose superproject is not a bare repository.
*/
if (strstr(path, "/.git/modules/"))
return 1;
return 0;
}
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
/*
* We cannot decide in this function whether we are in the work tree or
* not, since the config can only be read _after_ this function was called.
*
* Also, we avoid changing any global state (such as the current working
* directory) to allow early callers.
*
* The directory where the search should start needs to be passed in via the
* `dir` parameter; upon return, the `dir` buffer will contain the path of
* the directory where the search ended, and `gitdir` will contain the path of
* the discovered .git/ directory, if any. If `gitdir` is not absolute, it
* is relative to `dir` (i.e. *not* necessarily the cwd).
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
*/
static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
struct strbuf *gitdir,
struct strbuf *report,
int die_on_error)
{
const char *env_ceiling_dirs = getenv(CEILING_DIRECTORIES_ENVIRONMENT);
struct string_list ceiling_dirs = STRING_LIST_INIT_DUP;
const char *gitdirenv;
int ceil_offset = -1, min_offset = offset_1st_component(dir->buf);
dev_t current_device = 0;
int one_filesystem = 1;
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
/*
* If GIT_DIR is set explicitly, we're not going
* to do any discovery, but we still do repository
* validation.
*/
gitdirenv = getenv(GIT_DIR_ENVIRONMENT);
if (gitdirenv) {
strbuf_addstr(gitdir, gitdirenv);
return GIT_DIR_EXPLICIT;
}
if (env_ceiling_dirs) {
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
int empty_entry_found = 0;
string_list_split(&ceiling_dirs, env_ceiling_dirs, PATH_SEP, -1);
filter_string_list(&ceiling_dirs, 0,
Provide a mechanism to turn off symlink resolution in ceiling paths Commit 1b77d83cab 'setup_git_directory_gently_1(): resolve symlinks in ceiling paths' changed the setup code to resolve symlinks in the entries in GIT_CEILING_DIRECTORIES. Because those entries are compared textually to the symlink-resolved current directory, an entry in GIT_CEILING_DIRECTORIES that contained a symlink would have no effect. It was known that this could cause performance problems if the symlink resolution *itself* touched slow filesystems, but it was thought that such use cases would be unlikely. The intention of the earlier change was to deal with a case when the user has this: GIT_CEILING_DIRECTORIES=/home/gitster but in reality, /home/gitster is a symbolic link to somewhere else, e.g. /net/machine/home4/gitster. A textual comparison between the specified value /home/gitster and the location getcwd(3) returns would not help us, but readlink("/home/gitster") would still be fast. After this change was released, Anders Kaseorg <andersk@mit.edu> reported: > [...] my computer has been acting so slow when I’m not connected to > the network. I put various network filesystem paths in > $GIT_CEILING_DIRECTORIES, such as > /afs/athena.mit.edu/user/a/n/andersk (to avoid hitting its parents > /afs/athena.mit.edu, /afs/athena.mit.edu/user/a, and > /afs/athena.mit.edu/user/a/n which all live in different AFS > volumes). Now when I’m not connected to the network, every > invocation of Git, including the __git_ps1 in my shell prompt, waits > for AFS to timeout. To allow users to work around this problem, give them a mechanism to turn off symlink resolution in GIT_CEILING_DIRECTORIES entries. All the entries that follow an empty entry will not be checked for symbolic links and used literally in comparison. E.g. with these: GIT_CEILING_DIRECTORIES=:/foo/bar:/xyzzy or GIT_CEILING_DIRECTORIES=/foo/bar::/xyzzy we will not readlink("/xyzzy") because it comes after an empty entry. With the former (but not with the latter), "/foo/bar" comes after an empty entry, and we will not readlink it, either. Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-02-20 10:09:24 +01:00
canonicalize_ceiling_entry, &empty_entry_found);
ceil_offset = longest_ancestor_length(dir->buf, &ceiling_dirs);
string_list_clear(&ceiling_dirs, 0);
}
if (ceil_offset < 0)
ceil_offset = min_offset - 2;
if (min_offset && min_offset == dir->len &&
!is_dir_sep(dir->buf[min_offset - 1])) {
strbuf_addch(dir, '/');
min_offset++;
}
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
/*
* Test in the following order (relative to the dir):
* - .git (file containing "gitdir: <path>")
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
* - .git/
* - ./ (bare)
* - ../.git
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
* - ../.git/
* - ../ (bare)
* - ../../.git
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
* etc.
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
*/
one_filesystem = !git_env_bool("GIT_DISCOVERY_ACROSS_FILESYSTEM", 0);
if (one_filesystem)
current_device = get_device_or_die(dir->buf, NULL, 0);
Clean up work-tree handling The old version of work-tree support was an unholy mess, barely readable, and not to the point. For example, why do you have to provide a worktree, when it is not used? As in "git status". Now it works. Another riddle was: if you can have work trees inside the git dir, why are some programs complaining that they need a work tree? IOW it is allowed to call $ git --git-dir=../ --work-tree=. bla when you really want to. In this case, you are both in the git directory and in the working tree. So, programs have to actually test for the right thing, namely if they are inside a working tree, and not if they are inside a git directory. Also, GIT_DIR=../.git should behave the same as if no GIT_DIR was specified, unless there is a repository in the current working directory. It does now. The logic to determine if a repository is bare, or has a work tree (tertium non datur), is this: --work-tree=bla overrides GIT_WORK_TREE, which overrides core.bare = true, which overrides core.worktree, which overrides GIT_DIR/.. when GIT_DIR ends in /.git, which overrides the directory in which .git/ was found. In related news, a long standing bug was fixed: when in .git/bla/x.git/, which is a bare repository, git formerly assumed ../.. to be the appropriate git dir. This problem was reported by Shawn Pearce to have caused much pain, where a colleague mistakenly ran "git init" in "/" a long time ago, and bare repositories just would not work. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-08-01 02:30:14 +02:00
for (;;) {
int offset = dir->len, error_code = 0;
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
char *gitdir_path = NULL;
char *gitfile = NULL;
if (offset > min_offset)
strbuf_addch(dir, '/');
strbuf_addstr(dir, DEFAULT_GIT_DIR_ENVIRONMENT);
gitdirenv = read_gitfile_gently(dir->buf, die_on_error ?
NULL : &error_code);
if (!gitdirenv) {
if (die_on_error ||
error_code == READ_GITFILE_ERR_NOT_A_FILE) {
/* NEEDSWORK: fail if .git is not file nor dir */
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
if (is_git_directory(dir->buf)) {
gitdirenv = DEFAULT_GIT_DIR_ENVIRONMENT;
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
gitdir_path = xstrdup(dir->buf);
}
} else if (error_code != READ_GITFILE_ERR_STAT_FAILED)
return GIT_DIR_INVALID_GITFILE;
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
} else
gitfile = xstrdup(dir->buf);
/*
* Earlier, we tentatively added DEFAULT_GIT_DIR_ENVIRONMENT
* to check that directory for a repository.
* Now trim that tentative addition away, because we want to
* focus on the real directory we are in.
*/
strbuf_setlen(dir, offset);
if (gitdirenv) {
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
enum discovery_result ret;
const char *gitdir_candidate =
gitdir_path ? gitdir_path : gitdirenv;
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
if (ensure_valid_ownership(gitfile, dir->buf,
gitdir_candidate, report)) {
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
strbuf_addstr(gitdir, gitdirenv);
ret = GIT_DIR_DISCOVERED;
} else
ret = GIT_DIR_INVALID_OWNERSHIP;
/*
* Earlier, during discovery, we might have allocated
* string copies for gitdir_path or gitfile so make
* sure we don't leak by freeing them now, before
* leaving the loop and function.
*
* Note: gitdirenv will be non-NULL whenever these are
* allocated, therefore we need not take care of releasing
* them outside of this conditional block.
*/
free(gitdir_path);
free(gitfile);
return ret;
}
if (is_git_directory(dir->buf)) {
trace2_data_string("setup", NULL, "implicit-bare-repository", dir->buf);
if (get_allowed_bare_repo() == ALLOWED_BARE_REPO_EXPLICIT &&
setup: notice more types of implicit bare repositories Setting the safe.bareRepository configuration variable to explicit stops git from using a bare repository, unless the repository is explicitly specified, either by the "--git-dir=<path>" command line option, or by exporting $GIT_DIR environment variable. This may be a reasonable measure to safeguard users from accidentally straying into a bare repository in unexpected places, but often gets in the way of users who need valid accesses to the repository. Earlier, 45bb9162 (setup: allow cwd=.git w/ bareRepository=explicit, 2024-01-20) loosened the rule such that being inside the ".git" directory of a non-bare repository does not really count as accessing a "bare" repository. The reason why such a loosening is needed is because often hooks and third-party tools run from within $GIT_DIR while working with a non-bare repository. More importantly, the reason why this is safe is because a directory whose contents look like that of a "bare" repository cannot be a bare repository that came embedded within a checkout of a malicious project, as long as its directory name is ".git", because ".git" is not a name allowed for a directory in payload. There are at least two other cases where tools have to work in a bare-repository looking directory that is not an embedded bare repository, and accesses to them are still not allowed by the recent change. - A secondary worktree (whose name is $name) has its $GIT_DIR inside "worktrees/$name/" subdirectory of the $GIT_DIR of the primary worktree of the same repository. - A submodule worktree (whose name is $name) has its $GIT_DIR inside "modules/$name/" subdirectory of the $GIT_DIR of its superproject. As long as the primary worktree or the superproject in these cases are not bare, the pathname of these "looks like bare but not really" directories will have "/.git/worktrees/" and "/.git/modules/" as a substring in its leading part, and we can take advantage of the same security guarantee allow git to work from these places. Extend the earlier "in a directory called '.git' we are OK" logic used for the primary worktree to also cover the secondary worktree's and non-embedded submodule's $GIT_DIR, by moving the logic to a helper function "is_implicit_bare_repo()". We deliberately exclude secondary worktrees and submodules of a bare repository, as these are exactly what safe.bareRepository=explicit setting is designed to forbid accesses to without an explicit GIT_DIR/--git-dir=<path> Helped-by: Kyle Lippincott <spectral@google.com> Helped-by: Kyle Meyer <kyle@kyleam.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-10 00:27:09 +01:00
!is_implicit_bare_repo(dir->buf))
setup.c: create `safe.bareRepository` There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 23:28:01 +02:00
return GIT_DIR_DISALLOWED_BARE;
if (!ensure_valid_ownership(NULL, NULL, dir->buf, report))
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
return GIT_DIR_INVALID_OWNERSHIP;
strbuf_addstr(gitdir, ".");
return GIT_DIR_BARE;
}
if (offset <= min_offset)
return GIT_DIR_HIT_CEILING;
while (--offset > ceil_offset && !is_dir_sep(dir->buf[offset]))
; /* continue */
if (offset <= ceil_offset)
return GIT_DIR_HIT_CEILING;
strbuf_setlen(dir, offset > min_offset ? offset : min_offset);
if (one_filesystem &&
current_device != get_device_or_die(dir->buf, NULL, offset))
return GIT_DIR_HIT_MOUNT_POINT;
introduce GIT_WORK_TREE to specify the work tree setup_gdg is used as abbreviation for setup_git_directory_gently. The work tree can be specified using the environment variable GIT_WORK_TREE and the config option core.worktree (the environment variable has precendence over the config option). Additionally there is a command line option --work-tree which sets the environment variable. setup_gdg does the following now: GIT_DIR unspecified repository in .git directory parent directory of the .git directory is used as work tree, GIT_WORK_TREE is ignored GIT_DIR unspecified repository in cwd GIT_DIR is set to cwd see the cases with GIT_DIR specified what happens next and also see the note below GIT_DIR specified GIT_WORK_TREE/core.worktree unspecified cwd is used as work tree GIT_DIR specified GIT_WORK_TREE/core.worktree specified the specified work tree is used Note on the case where GIT_DIR is unspecified and repository is in cwd: GIT_WORK_TREE is used but is_inside_git_dir is always true. I did it this way because setup_gdg might be called multiple times (e.g. when doing alias expansion) and in successive calls setup_gdg should do the same thing every time. Meaning of is_bare/is_inside_work_tree/is_inside_git_dir: (1) is_bare_repository A repository is bare if core.bare is true or core.bare is unspecified and the name suggests it is bare (directory not named .git). The bare option disables a few protective checks which are useful with a working tree. Currently this changes if a repository is bare: updates of HEAD are allowed git gc packs the refs the reflog is disabled by default (2) is_inside_work_tree True if the cwd is inside the associated working tree (if there is one), false otherwise. (3) is_inside_git_dir True if the cwd is inside the git directory, false otherwise. Before this patch is_inside_git_dir was always true for bare repositories. When setup_gdg finds a repository git_config(git_default_config) is always called. This ensure that is_bare_repository makes use of core.bare and does not guess even though core.bare is specified. inside_work_tree and inside_git_dir are set if setup_gdg finds a repository. The is_inside_work_tree and is_inside_git_dir functions will die if they are called before a successful call to setup_gdg. Signed-off-by: Matthias Lederhofer <matled@gmx.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-06-06 09:10:42 +02:00
}
}
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
enum discovery_result discover_git_directory_reason(struct strbuf *commondir,
struct strbuf *gitdir)
{
struct strbuf dir = STRBUF_INIT, err = STRBUF_INIT;
size_t gitdir_offset = gitdir->len, cwd_len;
size_t commondir_offset = commondir->len;
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
struct repository_format candidate = REPOSITORY_FORMAT_INIT;
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
enum discovery_result result;
if (strbuf_getcwd(&dir))
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
return GIT_DIR_CWD_FAILURE;
cwd_len = dir.len;
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
result = setup_git_directory_gently_1(&dir, gitdir, NULL, 0);
if (result <= 0) {
strbuf_release(&dir);
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
return result;
}
/*
* The returned gitdir is relative to dir, and if dir does not reflect
* the current working directory, we simply make the gitdir absolute.
*/
if (dir.len < cwd_len && !is_absolute_path(gitdir->buf + gitdir_offset)) {
/* Avoid a trailing "/." */
if (!strcmp(".", gitdir->buf + gitdir_offset))
strbuf_setlen(gitdir, gitdir_offset);
else
strbuf_addch(&dir, '/');
strbuf_insert(gitdir, gitdir_offset, dir.buf, dir.len);
}
get_common_dir(commondir, gitdir->buf + gitdir_offset);
strbuf_reset(&dir);
strbuf_addf(&dir, "%s/config", commondir->buf + commondir_offset);
read_repository_format(&candidate, dir.buf);
strbuf_release(&dir);
if (verify_repository_format(&candidate, &err) < 0) {
warning("ignoring git dir '%s': %s",
gitdir->buf + gitdir_offset, err.buf);
strbuf_release(&err);
strbuf_setlen(commondir, commondir_offset);
strbuf_setlen(gitdir, gitdir_offset);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
clear_repository_format(&candidate);
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
return GIT_DIR_INVALID_FORMAT;
}
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
clear_repository_format(&candidate);
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
return result;
}
const char *setup_git_directory_gently(int *nongit_ok)
{
static struct strbuf cwd = STRBUF_INIT;
struct strbuf dir = STRBUF_INIT, gitdir = STRBUF_INIT, report = STRBUF_INIT;
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
const char *prefix = NULL;
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
/*
* We may have read an incomplete configuration before
* setting-up the git directory. If so, clear the cache so
* that the next queries to the configuration reload complete
* configuration (including the per-repo config file that we
* ignored previously).
*/
git_config_clear();
/*
* Let's assume that we are in a git repository.
* If it turns out later that we are somewhere else, the value will be
* updated accordingly.
*/
if (nongit_ok)
*nongit_ok = 0;
if (strbuf_getcwd(&cwd))
die_errno(_("Unable to read current working directory"));
strbuf_addbuf(&dir, &cwd);
switch (setup_git_directory_gently_1(&dir, &gitdir, &report, 1)) {
case GIT_DIR_EXPLICIT:
prefix = setup_explicit_git_dir(gitdir.buf, &cwd, &repo_fmt, nongit_ok);
break;
case GIT_DIR_DISCOVERED:
if (dir.len < cwd.len && chdir(dir.buf))
die(_("cannot change to '%s'"), dir.buf);
prefix = setup_discovered_git_dir(gitdir.buf, &cwd, dir.len,
&repo_fmt, nongit_ok);
break;
case GIT_DIR_BARE:
if (dir.len < cwd.len && chdir(dir.buf))
die(_("cannot change to '%s'"), dir.buf);
prefix = setup_bare_git_dir(&cwd, dir.len, &repo_fmt, nongit_ok);
break;
case GIT_DIR_HIT_CEILING:
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
if (!nongit_ok)
die(_("not a git repository (or any of the parent directories): %s"),
DEFAULT_GIT_DIR_ENVIRONMENT);
*nongit_ok = 1;
break;
case GIT_DIR_HIT_MOUNT_POINT:
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
if (!nongit_ok)
die(_("not a git repository (or any parent up to mount point %s)\n"
"Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set)."),
dir.buf);
*nongit_ok = 1;
break;
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
case GIT_DIR_INVALID_OWNERSHIP:
if (!nongit_ok) {
struct strbuf quoted = STRBUF_INIT;
strbuf_complete(&report, '\n');
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
sq_quote_buf_pretty(&quoted, dir.buf);
setup: tighten ownership checks post CVE-2022-24765 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02), adds a function to check for ownership of repositories using a directory that is representative of it, and ways to add exempt a specific repository from said check if needed, but that check didn't account for owership of the gitdir, or (when used) the gitfile that points to that gitdir. An attacker could create a git repository in a directory that they can write into but that is owned by the victim to work around the fix that was introduced with CVE-2022-24765 to potentially run code as the victim. An example that could result in privilege escalation to root in *NIX would be to set a repository in a shared tmp directory by doing (for example): $ git -C /tmp init To avoid that, extend the ensure_valid_ownership function to be able to check for all three paths. This will have the side effect of tripling the number of stat() calls when a repository is detected, but the effect is expected to be likely minimal, as it is done only once during the directory walk in which Git looks for a repository. Additionally make sure to resolve the gitfile (if one was used) to find the relevant gitdir for checking. While at it change the message printed on failure so it is clear we are referring to the repository by its worktree (or gitdir if it is bare) and not to a specific directory. Helped-by: Junio C Hamano <junio@pobox.com> Helped-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com>
2022-05-10 21:35:29 +02:00
die(_("detected dubious ownership in repository at '%s'\n"
"%s"
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
"To add an exception for this directory, call:\n"
"\n"
"\tgit config --global --add safe.directory %s"),
dir.buf, report.buf, quoted.buf);
setup_git_directory(): add an owner check for the top-level directory It poses a security risk to search for a git directory outside of the directories owned by the current user. For example, it is common e.g. in computer pools of educational institutes to have a "scratch" space: a mounted disk with plenty of space that is regularly swiped where any authenticated user can create a directory to do their work. Merely navigating to such a space with a Git-enabled `PS1` when there is a maliciously-crafted `/scratch/.git/` can lead to a compromised account. The same holds true in multi-user setups running Windows, as `C:\` is writable to every authenticated user by default. To plug this vulnerability, we stop Git from accepting top-level directories owned by someone other than the current user. We avoid looking at the ownership of each and every directories between the current and the top-level one (if there are any between) to avoid introducing a performance bottleneck. This new default behavior is obviously incompatible with the concept of shared repositories, where we expect the top-level directory to be owned by only one of its legitimate users. To re-enable that use case, we add support for adding exceptions from the new default behavior via the config setting `safe.directory`. The `safe.directory` config setting is only respected in the system and global configs, not from repository configs or via the command-line, and can have multiple values to allow for multiple shared repositories. We are particularly careful to provide a helpful message to any user trying to use a shared repository. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2022-03-02 12:23:04 +01:00
}
*nongit_ok = 1;
break;
setup.c: create `safe.bareRepository` There is a known social engineering attack that takes advantage of the fact that a working tree can include an entire bare repository, including a config file. A user could run a Git command inside the bare repository thinking that the config file of the 'outer' repository would be used, but in reality, the bare repository's config file (which is attacker-controlled) is used, which may result in arbitrary code execution. See [1] for a fuller description and deeper discussion. A simple mitigation is to forbid bare repositories unless specified via `--git-dir` or `GIT_DIR`. In environments that don't use bare repositories, this would be minimally disruptive. Create a config variable, `safe.bareRepository`, that tells Git whether or not to die() when working with a bare repository. This config is an enum of: - "all": allow all bare repositories (this is the default) - "explicit": only allow bare repositories specified via --git-dir or GIT_DIR. If we want to protect users from such attacks by default, neither value will suffice - "all" provides no protection, but "explicit" is impractical for bare repository users. A more usable default would be to allow only non-embedded bare repositories ([2] contains one such proposal), but detecting if a repository is embedded is potentially non-trivial, so this work is not implemented in this series. [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com [2]: https://lore.kernel.org/git/5b969c5e-e802-c447-ad25-6acc0b784582@github.com Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-14 23:28:01 +02:00
case GIT_DIR_DISALLOWED_BARE:
if (!nongit_ok) {
die(_("cannot use bare repository '%s' (safe.bareRepository is '%s')"),
dir.buf,
allowed_bare_repo_to_string(get_allowed_bare_repo()));
}
*nongit_ok = 1;
break;
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
case GIT_DIR_CWD_FAILURE:
case GIT_DIR_INVALID_FORMAT:
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
/*
* As a safeguard against setup_git_directory_gently_1 returning
setup: add discover_git_directory_reason() There are many reasons why discovering a Git directory may fail. In particular, 8959555cee7 (setup_git_directory(): add an owner check for the top-level directory, 2022-03-02) added ownership checks as a security precaution. Callers attempting to set up a Git directory may want to inform the user about the reason for the failure. For that, expose the enum discovery_result from within setup.c and move it into cache.h where discover_git_directory() is defined. I initially wanted to change the return type of discover_git_directory() to be this enum, but several callers rely upon the "zero means success". The two problems with this are: 1. The zero value of the enum is actually GIT_DIR_NONE, so nonpositive results are errors. 2. There are multiple successful states; positive results are successful. It is worth noting that GIT_DIR_NONE is not returned, so we remove this option from the enum. We must be careful to keep the successful reasons as positive values, so they are given explicit positive values. Instead of updating all callers immediately, add a new method, discover_git_directory_reason(), and convert discover_git_directory() to be a thin shim on top of it. One thing that is important to note is that discover_git_directory() previously returned -1 on error, so let's continue that into the future. There is only one caller (in scalar.c) that depends on that signedness instead of a non-zero check, so clean that up, too. Because there are extra checks that discover_git_directory_reason() does after setup_git_directory_gently_1(), there are other modes that can be returned for failure states. Add these modes to the enum, but be sure to explicitly add them as BUG() states in the switch of setup_git_directory_gently(). Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-08-28 15:52:25 +02:00
* these values, fallthrough to BUG. Otherwise it is possible to
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
* set startup_info->have_repository to 1 when we did nothing to
* find a repository.
*/
default:
BUG("unhandled setup_git_directory_gently_1() result");
}
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
/*
* At this point, nongit_ok is stable. If it is non-NULL and points
* to a non-zero value, then this means that we haven't found a
* repository and that the caller expects startup_info to reflect
* this.
*
* Regardless of the state of nongit_ok, startup_info->prefix and
* the GIT_PREFIX environment variable must always match. For details
* see Documentation/config/alias.txt.
*/
if (nongit_ok && *nongit_ok)
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
startup_info->have_repository = 0;
else
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
startup_info->have_repository = 1;
setup: make startup_info available everywhere Commit a60645f (setup: remember whether repository was found, 2010-08-05) introduced the startup_info structure, which records some parts of the setup_git_directory() process (notably, whether we actually found a repository or not). One of the uses of this data is for functions to behave appropriately based on whether we are in a repo. But the startup_info struct is just a pointer to storage provided by the main program, and the only program that sets it up is the git.c wrapper. Thus builtins have access to startup_info, but externally linked programs do not. Worse, library code which is accessible from both has to be careful about accessing startup_info. This can be used to trigger a die("BUG") via get_sha1(): $ git fast-import <<-\EOF tag foo from HEAD:./whatever EOF fatal: BUG: startup_info struct is not initialized. Obviously that's fairly nonsensical input to feed to fast-import, but we should never hit a die("BUG"). And there may be other ways to trigger it if other non-builtins resolve sha1s. So let's point the storage for startup_info to a static variable in setup.c, making it available to all users of the library code. We _could_ turn startup_info into a regular extern struct, but doing so would mean tweaking all of the existing use sites. So let's leave the pointer indirection in place. We can, however, drop any checks for NULL, as they will always be false (and likewise, we can drop the test covering this case, which was a rather artificial situation using one of the test-* programs). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-03-05 23:10:27 +01:00
/*
* Not all paths through the setup code will call 'set_git_dir()' (which
* directly sets up the environment) so in order to guarantee that the
* environment is in a consistent state after setup, explicitly setup
* the environment if we have a repository.
*
* NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
* code paths so we also need to explicitly setup the environment if
* the user has set GIT_DIR. It may be beneficial to disallow bogus
* GIT_DIR values at some point in the future.
*/
Simplify handling of setup_git_directory_gently() failure cases. setup_git_directory_gently() expects two types of failures to discover a git directory (e.g. .git/): - GIT_DIR_HIT_CEILING: could not find a git directory in any parent directories of the cwd. - GIT_DIR_HIT_MOUNT_POINT: could not find a git directory in any parent directories up to the mount point of the cwd. Both cases are handled in a similar way, but there are misleading and unimportant differences. In both cases, setup_git_directory_gently() should: - Die if we are not in a git repository. Otherwise: - Set nongit_ok = 1, indicating that we are not in a git repository but this is ok. - Call strbuf_release() on any non-static struct strbufs that we allocated. Before this change are two misleading additional behaviors: - GIT_DIR_HIT_CEILING: setup_nongit() changes to the cwd for no apparent reason. We never had the chance to change directories up to this point so chdir(current cwd) is pointless. - GIT_DIR_HIT_MOUNT_POINT: strbuf_release() frees the buffer of a static struct strbuf (cwd). This is unnecessary because the struct is static so its buffer is always reachable. This is also misleading because nowhere else in the function is this buffer released. This change eliminates these two misleading additional behaviors and deletes setup_nogit() because the code is clearer without it. The result is that we can see clearly that GIT_DIR_HIT_CEILING and GIT_DIR_HIT_MOUNT_POINT lead to the same behavior (ignoring the different help messages). During review, this change was amended to additionally include: - Neither GIT_DIR_HIT_CEILING nor GIT_DIR_HIT_MOUNT_POINT may return early from setup_git_directory_gently() before the GIT_PREFIX environment variable is reset. Change both cases to break instead of return. See GIT_PREFIX below for more details. - GIT_DIR_NONE: setup_git_directory_gently_1() never returns this value, but if it ever did, setup_git_directory_gently() would incorrectly record that it had found a repository. Explicitly BUG on this case because it is underspecified. - GIT_PREFIX: this environment variable must always match the value of startup_info->prefix and the prefix returned from setup_git_directory_gently(). Make how we handle this slightly more repetitive but also more clear. - setup_git_env() and repo_set_hash_algo(): Add comments showing that only GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, and GIT_DIR_BARE will cause setup_git_directory_gently() to call these setup functions. This was obvious (but partly incorrect) before this change when GIT_DIR_HIT_MOUNT_POINT returned early from setup_git_directory_gently(). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-12-28 00:36:29 +01:00
if (/* GIT_DIR_EXPLICIT, GIT_DIR_DISCOVERED, GIT_DIR_BARE */
startup_info->have_repository ||
/* GIT_DIR_EXPLICIT */
getenv(GIT_DIR_ENVIRONMENT)) {
if (!the_repository->gitdir) {
const char *gitdir = getenv(GIT_DIR_ENVIRONMENT);
if (!gitdir)
gitdir = DEFAULT_GIT_DIR_ENVIRONMENT;
setup_git_env(gitdir);
}
if (startup_info->have_repository) {
repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
repo_set_compat_hash_algo(the_repository,
repo_fmt.compat_hash_algo);
repo_set_ref_storage_format(the_repository,
repo_fmt.ref_storage_format);
the_repository->repository_format_worktree_config =
repo_fmt.worktree_config;
/* take ownership of repo_fmt.partial_clone */
the_repository->repository_format_partial_clone =
repo_fmt.partial_clone;
repo_fmt.partial_clone = NULL;
}
}
/*
* Since precompose_string_if_needed() needs to look at
* the core.precomposeunicode configuration, this
* has to happen after the above block that finds
* out where the repository is, i.e. a preparation
* for calling git_config_get_bool().
*/
if (prefix) {
prefix = precompose_string_if_needed(prefix);
startup_info->prefix = prefix;
setenv(GIT_PREFIX_ENVIRONMENT, prefix, 1);
} else {
startup_info->prefix = NULL;
setenv(GIT_PREFIX_ENVIRONMENT, "", 1);
}
setup_original_cwd();
strbuf_release(&dir);
strbuf_release(&gitdir);
strbuf_release(&report);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
clear_repository_format(&repo_fmt);
return prefix;
}
int git_config_perm(const char *var, const char *value)
{
int i;
char *endptr;
if (!value)
return PERM_GROUP;
if (!strcmp(value, "umask"))
return PERM_UMASK;
if (!strcmp(value, "group"))
return PERM_GROUP;
if (!strcmp(value, "all") ||
!strcmp(value, "world") ||
!strcmp(value, "everybody"))
return PERM_EVERYBODY;
/* Parse octal numbers */
i = strtol(value, &endptr, 8);
/* If not an octal number, maybe true/false? */
if (*endptr != 0)
return git_config_bool(var, value) ? PERM_GROUP : PERM_UMASK;
/*
* Treat values 0, 1 and 2 as compatibility cases, otherwise it is
* a chmod value to restrict to.
*/
switch (i) {
case PERM_UMASK: /* 0 */
return PERM_UMASK;
case OLD_PERM_GROUP: /* 1 */
return PERM_GROUP;
case OLD_PERM_EVERYBODY: /* 2 */
return PERM_EVERYBODY;
}
/* A filemode value was given: 0xxx */
if ((i & 0600) != 0600)
die(_("problem with core.sharedRepository filemode value "
"(0%.3o).\nThe owner of files must always have "
"read and write permissions."), i);
/*
* Mask filemode value. Others can not get write permission.
* x flags for directories are handled separately.
*/
return -(i & 0666);
}
void check_repository_format(struct repository_format *fmt)
{
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
if (!fmt)
fmt = &repo_fmt;
check_repository_format_gently(get_git_dir(), fmt, NULL);
startup_info->have_repository = 1;
repo_set_hash_algo(the_repository, fmt->hash_algo);
repo_set_compat_hash_algo(the_repository, fmt->compat_hash_algo);
repo_set_ref_storage_format(the_repository,
fmt->ref_storage_format);
the_repository->repository_format_worktree_config =
fmt->worktree_config;
the_repository->repository_format_partial_clone =
xstrdup_or_null(fmt->partial_clone);
setup: fix memory leaks with `struct repository_format` After we set up a `struct repository_format`, it owns various pieces of allocated memory. We then either use those members, because we decide we want to use the "candidate" repository format, or we discard the candidate / scratch space. In the first case, we transfer ownership of the memory to a few global variables. In the latter case, we just silently drop the struct and end up leaking memory. Introduce an initialization macro `REPOSITORY_FORMAT_INIT` and a function `clear_repository_format()`, to be used on each side of `read_repository_format()`. To have a clear and simple memory ownership, let all users of `struct repository_format` duplicate the strings that they take from it, rather than stealing the pointers. Call `clear_...()` at the start of `read_...()` instead of just zeroing the struct, since we sometimes enter the function multiple times. Thus, it is important to initialize the struct before calling `read_...()`, so document that. It's also important because we might not even call `read_...()` before we call `clear_...()`, see, e.g., builtin/init-db.c. Teach `read_...()` to clear the struct on error, so that it is reset to a safe state, and document this. (In `setup_git_directory_gently()`, we look at `repo_fmt.hash_algo` even if `repo_fmt.version` is -1, which we weren't actually supposed to do per the API. After this commit, that's ok.) We inherit the existing code's combining "error" and "no version found". Both are signalled through `version == -1` and now both cause us to clear any partial configuration we have picked up. For "extensions.*", that's fine, since they require a positive version number. For "core.bare" and "core.worktree", we're already verifying that we have a non-negative version number before using them. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-02-28 21:36:28 +01:00
clear_repository_format(&repo_fmt);
}
/*
* Returns the "prefix", a path to the current working directory
* relative to the work tree root, or NULL, if the current working
* directory is not a strict subdirectory of the work tree root. The
* prefix always ends with a '/' character.
*/
const char *setup_git_directory(void)
{
return setup_git_directory_gently(NULL);
}
const char *resolve_gitdir_gently(const char *suspect, int *return_error_code)
{
if (is_git_directory(suspect))
return suspect;
return read_gitfile_gently(suspect, return_error_code);
}
/* if any standard file descriptor is missing open it to /dev/null */
void sanitize_stdfds(void)
{
int fd = xopen("/dev/null", O_RDWR);
while (fd < 2)
fd = xdup(fd);
if (fd > 2)
close(fd);
}
int daemonize(void)
{
#ifdef NO_POSIX_GOODIES
errno = ENOSYS;
return -1;
#else
switch (fork()) {
case 0:
break;
case -1:
die_errno(_("fork failed"));
default:
exit(0);
}
if (setsid() == -1)
die_errno(_("setsid failed"));
close(0);
close(1);
close(2);
sanitize_stdfds();
return 0;
#endif
}
#ifdef NO_TRUSTABLE_FILEMODE
#define TEST_FILEMODE 0
#else
#define TEST_FILEMODE 1
#endif
#define GIT_DEFAULT_HASH_ENVIRONMENT "GIT_DEFAULT_HASH"
static void copy_templates_1(struct strbuf *path, struct strbuf *template_path,
DIR *dir)
{
size_t path_baselen = path->len;
size_t template_baselen = template_path->len;
struct dirent *de;
/* Note: if ".git/hooks" file exists in the repository being
* re-initialized, /etc/core-git/templates/hooks/update would
* cause "git init" to fail here. I think this is sane but
* it means that the set of templates we ship by default, along
* with the way the namespace under .git/ is organized, should
* be really carefully chosen.
*/
safe_create_dir(path->buf, 1);
while ((de = readdir(dir)) != NULL) {
struct stat st_git, st_template;
int exists = 0;
strbuf_setlen(path, path_baselen);
strbuf_setlen(template_path, template_baselen);
if (de->d_name[0] == '.')
continue;
strbuf_addstr(path, de->d_name);
strbuf_addstr(template_path, de->d_name);
if (lstat(path->buf, &st_git)) {
if (errno != ENOENT)
die_errno(_("cannot stat '%s'"), path->buf);
}
else
exists = 1;
if (lstat(template_path->buf, &st_template))
die_errno(_("cannot stat template '%s'"), template_path->buf);
if (S_ISDIR(st_template.st_mode)) {
DIR *subdir = opendir(template_path->buf);
if (!subdir)
die_errno(_("cannot opendir '%s'"), template_path->buf);
strbuf_addch(path, '/');
strbuf_addch(template_path, '/');
copy_templates_1(path, template_path, subdir);
closedir(subdir);
}
else if (exists)
continue;
else if (S_ISLNK(st_template.st_mode)) {
struct strbuf lnk = STRBUF_INIT;
if (strbuf_readlink(&lnk, template_path->buf,
st_template.st_size) < 0)
die_errno(_("cannot readlink '%s'"), template_path->buf);
if (symlink(lnk.buf, path->buf))
die_errno(_("cannot symlink '%s' '%s'"),
lnk.buf, path->buf);
strbuf_release(&lnk);
}
else if (S_ISREG(st_template.st_mode)) {
if (copy_file(path->buf, template_path->buf, st_template.st_mode))
die_errno(_("cannot copy '%s' to '%s'"),
template_path->buf, path->buf);
}
else
error(_("ignoring template %s"), template_path->buf);
}
}
static void copy_templates(const char *template_dir, const char *init_template_dir)
{
struct strbuf path = STRBUF_INIT;
struct strbuf template_path = STRBUF_INIT;
size_t template_len;
struct repository_format template_format = REPOSITORY_FORMAT_INIT;
struct strbuf err = STRBUF_INIT;
DIR *dir;
char *to_free = NULL;
if (!template_dir)
template_dir = getenv(TEMPLATE_DIR_ENVIRONMENT);
if (!template_dir)
template_dir = init_template_dir;
if (!template_dir)
template_dir = to_free = system_path(DEFAULT_GIT_TEMPLATE_DIR);
if (!template_dir[0]) {
free(to_free);
return;
}
strbuf_addstr(&template_path, template_dir);
strbuf_complete(&template_path, '/');
template_len = template_path.len;
dir = opendir(template_path.buf);
if (!dir) {
warning(_("templates not found in %s"), template_dir);
goto free_return;
}
/* Make sure that template is from the correct vintage */
strbuf_addstr(&template_path, "config");
read_repository_format(&template_format, template_path.buf);
strbuf_setlen(&template_path, template_len);
/*
* No mention of version at all is OK, but anything else should be
* verified.
*/
if (template_format.version >= 0 &&
verify_repository_format(&template_format, &err) < 0) {
warning(_("not copying templates from '%s': %s"),
template_dir, err.buf);
strbuf_release(&err);
goto close_free_return;
}
strbuf_addstr(&path, get_git_common_dir());
strbuf_complete(&path, '/');
copy_templates_1(&path, &template_path, dir);
close_free_return:
closedir(dir);
free_return:
free(to_free);
strbuf_release(&path);
strbuf_release(&template_path);
clear_repository_format(&template_format);
}
/*
* If the git_dir is not directly inside the working tree, then git will not
* find it by default, and we need to set the worktree explicitly.
*/
static int needs_work_tree_config(const char *git_dir, const char *work_tree)
{
if (!strcmp(work_tree, "/") && !strcmp(git_dir, "/.git"))
return 0;
if (skip_prefix(git_dir, work_tree, &git_dir) &&
!strcmp(git_dir, "/.git"))
return 0;
return 1;
}
setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:47 +01:00
void initialize_repository_version(int hash_algo,
unsigned int ref_storage_format,
int reinit)
{
char repo_version_string[10];
int repo_version = GIT_REPO_VERSION;
builtin/clone: allow remote helpers to detect repo In 18c9cb7524 (builtin/clone: create the refdb with the correct object format, 2023-12-12), we have changed git-clone(1) so that it delays creation of the refdb until after it has learned about the remote's object format. This change was required for the reftable backend, which encodes the object format into the tables. So if we pre-initialized the refdb with the default object format, but the remote uses a different object format than that, then the resulting tables would have encoded the wrong object format. This change unfortunately breaks remote helpers which try to access the repository that is about to be created. Because the refdb has not yet been initialized at the point where we spawn the remote helper, we also don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by the remote helper which try to access the repository would fail because it cannot be discovered. This is essentially a chicken-and-egg problem: we cannot initialize the refdb because we don't know about the object format. But we cannot learn about the object format because the remote helper may be unable to access the partially-initialized repository. Ideally, we would address this issue via capabilities. But the remote helper protocol is not structured in a way that guarantees that the capability announcement happens before the remote helper tries to access the repository. Instead, fix this issue by partially initializing the refdb up to the point where it becomes discoverable by Git commands. Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27 15:27:44 +01:00
/*
* Note that we initialize the repository version to 1 when the ref
* storage format is unknown. This is on purpose so that we can add the
* correct object format to the config during git-clone(1). The format
* version will get adjusted by git-clone(1) once it has learned about
* the remote repository's format.
*/
setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:47 +01:00
if (hash_algo != GIT_HASH_SHA1 ||
ref_storage_format != REF_STORAGE_FORMAT_FILES)
repo_version = GIT_REPO_VERSION_READ;
/* This forces creation of new config file */
xsnprintf(repo_version_string, sizeof(repo_version_string),
"%d", repo_version);
git_config_set("core.repositoryformatversion", repo_version_string);
builtin/clone: allow remote helpers to detect repo In 18c9cb7524 (builtin/clone: create the refdb with the correct object format, 2023-12-12), we have changed git-clone(1) so that it delays creation of the refdb until after it has learned about the remote's object format. This change was required for the reftable backend, which encodes the object format into the tables. So if we pre-initialized the refdb with the default object format, but the remote uses a different object format than that, then the resulting tables would have encoded the wrong object format. This change unfortunately breaks remote helpers which try to access the repository that is about to be created. Because the refdb has not yet been initialized at the point where we spawn the remote helper, we also don't yet have "HEAD" or "refs/". Consequently, any Git commands ran by the remote helper which try to access the repository would fail because it cannot be discovered. This is essentially a chicken-and-egg problem: we cannot initialize the refdb because we don't know about the object format. But we cannot learn about the object format because the remote helper may be unable to access the partially-initialized repository. Ideally, we would address this issue via capabilities. But the remote helper protocol is not structured in a way that guarantees that the capability announcement happens before the remote helper tries to access the repository. Instead, fix this issue by partially initializing the refdb up to the point where it becomes discoverable by Git commands. Reported-by: Mike Hommey <mh@glandium.org> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-27 15:27:44 +01:00
if (hash_algo != GIT_HASH_SHA1 && hash_algo != GIT_HASH_UNKNOWN)
git_config_set("extensions.objectformat",
hash_algos[hash_algo].name);
else if (reinit)
git_config_set_gently("extensions.objectformat", NULL);
setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:47 +01:00
if (ref_storage_format != REF_STORAGE_FORMAT_FILES)
git_config_set("extensions.refstorage",
ref_storage_format_to_name(ref_storage_format));
}
static int is_reinit(void)
{
struct strbuf buf = STRBUF_INIT;
char junk[2];
int ret;
git_path_buf(&buf, "HEAD");
ret = !access(buf.buf, R_OK) || readlink(buf.buf, junk, sizeof(junk) - 1) != -1;
strbuf_release(&buf);
return ret;
}
void create_reference_database(unsigned int ref_storage_format,
const char *initial_branch, int quiet)
{
struct strbuf err = STRBUF_INIT;
int reinit = is_reinit();
repo_set_ref_storage_format(the_repository, ref_storage_format);
if (refs_init_db(get_main_ref_store(the_repository), 0, &err))
die("failed to set up refs db: %s", err.buf);
/*
* Point the HEAD symref to the initial branch with if HEAD does
* not yet exist.
*/
if (!reinit) {
char *ref;
if (!initial_branch)
initial_branch = git_default_branch_name(quiet);
ref = xstrfmt("refs/heads/%s", initial_branch);
if (check_refname_format(ref, 0) < 0)
die(_("invalid initial branch name: '%s'"),
initial_branch);
if (create_symref("HEAD", ref, NULL) < 0)
exit(1);
free(ref);
}
if (reinit && initial_branch)
warning(_("re-init: ignored --initial-branch=%s"),
initial_branch);
strbuf_release(&err);
}
static int create_default_files(const char *template_path,
const char *original_git_dir,
const struct repository_format *fmt,
int init_shared_repository)
{
struct stat st1;
struct strbuf buf = STRBUF_INIT;
char *path;
int reinit;
int filemode;
const char *init_template_dir = NULL;
const char *work_tree = get_git_work_tree();
/*
* First copy the templates -- we might have the default
* config file there, in which case we would want to read
* from it after installing.
*
* Before reading that config, we also need to clear out any cached
* values (since we've just potentially changed what's available on
* disk).
*/
git_config_get_pathname("init.templatedir", &init_template_dir);
copy_templates(template_path, init_template_dir);
free((char *)init_template_dir);
git_config_clear();
reset_shared_repository();
git_config(git_default_config, NULL);
reinit = is_reinit();
/*
* We must make sure command-line options continue to override any
* values we might have just re-read from the config.
*/
if (init_shared_repository != -1)
set_shared_repository(init_shared_repository);
setup: remove unnecessary variable The TODO comment suggested to heed core.bare from template config file if no command line override given. And the prev_bare_repository variable seems to have been placed for this sole purpose as it is not used anywhere else. However, it was clarified by Junio [1] that such values (including core.bare) are ignored intentionally and does not make sense to propagate them from template config to repository config. Also, the directories for the worktree and repository are already created, and therefore the bare/non-bare decision has already been made, by the point we reach the codepath where the TODO comment is placed. Therefore, prev_bare_repository does not have a usecase with/without supporting core.bare from template. And the removal of prev_bare_repository is safe as proved by the later part of the comment: "Unfortunately, the line above is equivalent to is_bare_repository_cfg = !work_tree; which ignores the config entirely even if no `--[no-]bare` command line option was present. To see why, note that before this function, there was this call: prev_bare_repository = is_bare_repository() expanding the right hand side: = is_bare_repository_cfg && !get_git_work_tree() = is_bare_repository_cfg && !work_tree note that the last simplification above is valid because nothing calls repo_init() or set_git_work_tree() between any of the relevant calls in the code, and thus the !get_git_work_tree() calls will return the same result each time. So, what we are interested in computing is the right hand side of the line of code just above this comment: prev_bare_repository || !work_tree = is_bare_repository_cfg && !work_tree || !work_tree = !work_tree because "A && !B || !B == !B" for all boolean values of A & B." Therefore, remove the TODO comment and remove prev_bare_repository variable. Also, update relevant testcases and remove one redundant testcase. [1]: https://lore.kernel.org/git/xmqqjzonpy9l.fsf@gitster.g/ Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 16:18:11 +01:00
is_bare_repository_cfg = !work_tree;
/*
* We would have created the above under user's umask -- under
* shared-repository settings, we would need to fix them up.
*/
if (get_shared_repository()) {
adjust_shared_perm(get_git_dir());
}
setup: introduce "extensions.refStorage" extension Introduce a new "extensions.refStorage" extension that allows us to specify the ref storage format used by a repository. For now, the only supported format is the "files" format, but this list will likely soon be extended to also support the upcoming "reftable" format. There have been discussions on the Git mailing list in the past around how exactly this extension should look like. One alternative [1] that was discussed was whether it would make sense to model the extension in such a way that backends are arbitrarily stackable. This would allow for a combined value of e.g. "loose,packed-refs" or "loose,reftable", which indicates that new refs would be written via "loose" files backend and compressed into "packed-refs" or "reftable" backends, respectively. It is arguable though whether this flexibility and the complexity that it brings with it is really required for now. It is not foreseeable that there will be a proliferation of backends in the near-term future, and the current set of existing formats and formats which are on the horizon can easily be configured with the much simpler proposal where we have a single value, only. Furthermore, if we ever see that we indeed want to gain the ability to arbitrarily stack the ref formats, then we can adapt the current extension rather easily. Given that Git clients will refuse any unknown value for the "extensions.refStorage" extension they would also know to ignore a stacked "loose,packed-refs" in the future. So let's stick with the easy proposal for the time being and wire up the extension. [1]: <pull.1408.git.1667846164.gitgitgadget@gmail.com> Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:47 +01:00
initialize_repository_version(fmt->hash_algo, fmt->ref_storage_format, 0);
/* Check filemode trustability */
path = git_path_buf(&buf, "config");
filemode = TEST_FILEMODE;
if (TEST_FILEMODE && !lstat(path, &st1)) {
struct stat st2;
filemode = (!chmod(path, st1.st_mode ^ S_IXUSR) &&
!lstat(path, &st2) &&
st1.st_mode != st2.st_mode &&
!chmod(path, st1.st_mode));
if (filemode && !reinit && (st1.st_mode & S_IXUSR))
filemode = 0;
}
git_config_set("core.filemode", filemode ? "true" : "false");
if (is_bare_repository())
git_config_set("core.bare", "true");
else {
git_config_set("core.bare", "false");
/* allow template config file to override the default */
if (log_all_ref_updates == LOG_REFS_UNSET)
git_config_set("core.logallrefupdates", "true");
if (needs_work_tree_config(original_git_dir, work_tree))
git_config_set("core.worktree", work_tree);
}
if (!reinit) {
/* Check if symlink is supported in the work tree */
path = git_path_buf(&buf, "tXXXXXX");
if (!close(xmkstemp(path)) &&
!unlink(path) &&
!symlink("testing", path) &&
!lstat(path, &st1) &&
S_ISLNK(st1.st_mode))
unlink(path); /* good */
else
git_config_set("core.symlinks", "false");
/* Check if the filesystem is case-insensitive */
path = git_path_buf(&buf, "CoNfIg");
if (!access(path, F_OK))
git_config_set("core.ignorecase", "true");
probe_utf8_pathname_composition();
}
strbuf_release(&buf);
return reinit;
}
static void create_object_directory(void)
{
struct strbuf path = STRBUF_INIT;
size_t baselen;
strbuf_addstr(&path, get_object_directory());
baselen = path.len;
safe_create_dir(path.buf, 1);
strbuf_setlen(&path, baselen);
strbuf_addstr(&path, "/pack");
safe_create_dir(path.buf, 1);
strbuf_setlen(&path, baselen);
strbuf_addstr(&path, "/info");
safe_create_dir(path.buf, 1);
strbuf_release(&path);
}
static void separate_git_dir(const char *git_dir, const char *git_link)
{
struct stat st;
if (!stat(git_link, &st)) {
const char *src;
if (S_ISREG(st.st_mode))
src = read_gitfile(git_link);
else if (S_ISDIR(st.st_mode))
src = git_link;
else
die(_("unable to handle file type %d"), (int)st.st_mode);
if (rename(src, git_dir))
die_errno(_("unable to move %s to %s"), src, git_dir);
repair_worktrees(NULL, NULL);
}
write_file(git_link, "gitdir: %s", git_dir);
}
static void validate_hash_algorithm(struct repository_format *repo_fmt, int hash)
{
const char *env = getenv(GIT_DEFAULT_HASH_ENVIRONMENT);
/*
* If we already have an initialized repo, don't allow the user to
* specify a different algorithm, as that could cause corruption.
* Otherwise, if the user has specified one on the command line, use it.
*/
if (repo_fmt->version >= 0 && hash != GIT_HASH_UNKNOWN && hash != repo_fmt->hash_algo)
die(_("attempt to reinitialize repository with different hash"));
else if (hash != GIT_HASH_UNKNOWN)
repo_fmt->hash_algo = hash;
else if (env) {
int env_algo = hash_algo_by_name(env);
if (env_algo == GIT_HASH_UNKNOWN)
die(_("unknown hash algorithm '%s'"), env);
repo_fmt->hash_algo = env_algo;
}
}
static void validate_ref_storage_format(struct repository_format *repo_fmt,
unsigned int format)
{
const char *name = getenv("GIT_DEFAULT_REF_FORMAT");
if (repo_fmt->version >= 0 &&
format != REF_STORAGE_FORMAT_UNKNOWN &&
format != repo_fmt->ref_storage_format) {
die(_("attempt to reinitialize repository with different reference storage format"));
} else if (format != REF_STORAGE_FORMAT_UNKNOWN) {
repo_fmt->ref_storage_format = format;
} else if (name) {
format = ref_storage_format_by_name(name);
if (format == REF_STORAGE_FORMAT_UNKNOWN)
die(_("unknown ref storage format '%s'"), name);
repo_fmt->ref_storage_format = format;
}
}
int init_db(const char *git_dir, const char *real_git_dir,
const char *template_dir, int hash,
unsigned int ref_storage_format,
const char *initial_branch,
int init_shared_repository, unsigned int flags)
{
int reinit;
int exist_ok = flags & INIT_DB_EXIST_OK;
char *original_git_dir = real_pathdup(git_dir, 1);
struct repository_format repo_fmt = REPOSITORY_FORMAT_INIT;
if (real_git_dir) {
struct stat st;
if (!exist_ok && !stat(git_dir, &st))
die(_("%s already exists"), git_dir);
if (!exist_ok && !stat(real_git_dir, &st))
die(_("%s already exists"), real_git_dir);
set_git_dir(real_git_dir, 1);
git_dir = get_git_dir();
separate_git_dir(git_dir, original_git_dir);
}
else {
set_git_dir(git_dir, 1);
git_dir = get_git_dir();
}
startup_info->have_repository = 1;
/* Ensure `core.hidedotfiles` is processed */
git_config(platform_core_config, NULL);
safe_create_dir(git_dir, 0);
/* Check to see if the repository version is right.
* Note that a newly created repository does not have
* config file, so this will not fail. What we are catching
* is an attempt to reinitialize new repository with an old tool.
*/
check_repository_format(&repo_fmt);
validate_hash_algorithm(&repo_fmt, hash);
validate_ref_storage_format(&repo_fmt, ref_storage_format);
reinit = create_default_files(template_dir, original_git_dir,
setup: remove unnecessary variable The TODO comment suggested to heed core.bare from template config file if no command line override given. And the prev_bare_repository variable seems to have been placed for this sole purpose as it is not used anywhere else. However, it was clarified by Junio [1] that such values (including core.bare) are ignored intentionally and does not make sense to propagate them from template config to repository config. Also, the directories for the worktree and repository are already created, and therefore the bare/non-bare decision has already been made, by the point we reach the codepath where the TODO comment is placed. Therefore, prev_bare_repository does not have a usecase with/without supporting core.bare from template. And the removal of prev_bare_repository is safe as proved by the later part of the comment: "Unfortunately, the line above is equivalent to is_bare_repository_cfg = !work_tree; which ignores the config entirely even if no `--[no-]bare` command line option was present. To see why, note that before this function, there was this call: prev_bare_repository = is_bare_repository() expanding the right hand side: = is_bare_repository_cfg && !get_git_work_tree() = is_bare_repository_cfg && !work_tree note that the last simplification above is valid because nothing calls repo_init() or set_git_work_tree() between any of the relevant calls in the code, and thus the !get_git_work_tree() calls will return the same result each time. So, what we are interested in computing is the right hand side of the line of code just above this comment: prev_bare_repository || !work_tree = is_bare_repository_cfg && !work_tree || !work_tree = !work_tree because "A && !B || !B == !B" for all boolean values of A & B." Therefore, remove the TODO comment and remove prev_bare_repository variable. Also, update relevant testcases and remove one redundant testcase. [1]: https://lore.kernel.org/git/xmqqjzonpy9l.fsf@gitster.g/ Helped-by: Elijah Newren <newren@gmail.com> Helped-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Ghanshyam Thakkar <shyamthakkar001@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-03-04 16:18:11 +01:00
&repo_fmt, init_shared_repository);
setup: set repository's formats on init The proper hash algorithm and ref storage format that will be used for a newly initialized repository will be figured out in `init_db()` via `validate_hash_algorithm()` and `validate_ref_storage_format()`. Until now though, we never set up the hash algorithm or ref storage format of `the_repository` accordingly. There are only two callsites of `init_db()`, one in git-init(1) and one in git-clone(1). The former function doesn't care for the formats to be set up properly because it never access the repository after calling the function in the first place. For git-clone(1) it's a different story though, as we call `init_db()` before listing remote refs. While we do indeed have the wrong hash function in `the_repository` when `init_db()` sets up a non-default object format for the repository, it never mattered because we adjust the hash after learning about the remote's hash function via the listed refs. So the current state is correct for the hash algo, but it's not for the ref storage format because git-clone(1) wouldn't know to set it up properly. But instead of adjusting only the `ref_storage_format`, set both the hash algo and the ref storage format so that `the_repository` is in the correct state when `init_db()` exits. This is fine as we will adjust the hash later on anyway and makes it easier to reason about the end state of `the_repository`. Signed-off-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-12-29 08:26:43 +01:00
/*
* Now that we have set up both the hash algorithm and the ref storage
* format we can update the repository's settings accordingly.
*/
repo_set_hash_algo(the_repository, repo_fmt.hash_algo);
repo_set_ref_storage_format(the_repository, repo_fmt.ref_storage_format);
if (!(flags & INIT_DB_SKIP_REFDB))
create_reference_database(repo_fmt.ref_storage_format,
initial_branch, flags & INIT_DB_QUIET);
create_object_directory();
if (get_shared_repository()) {
char buf[10];
/* We do not spell "group" and such, so that
* the configuration can be read by older version
* of git. Note, we use octal numbers for new share modes,
* and compatibility values for PERM_GROUP and
* PERM_EVERYBODY.
*/
if (get_shared_repository() < 0)
/* force to the mode value */
xsnprintf(buf, sizeof(buf), "0%o", -get_shared_repository());
else if (get_shared_repository() == PERM_GROUP)
xsnprintf(buf, sizeof(buf), "%d", OLD_PERM_GROUP);
else if (get_shared_repository() == PERM_EVERYBODY)
xsnprintf(buf, sizeof(buf), "%d", OLD_PERM_EVERYBODY);
else
BUG("invalid value for shared_repository");
git_config_set("core.sharedrepository", buf);
git_config_set("receive.denyNonFastforwards", "true");
}
if (!(flags & INIT_DB_QUIET)) {
int len = strlen(git_dir);
if (reinit)
printf(get_shared_repository()
? _("Reinitialized existing shared Git repository in %s%s\n")
: _("Reinitialized existing Git repository in %s%s\n"),
git_dir, len && git_dir[len-1] != '/' ? "/" : "");
else
printf(get_shared_repository()
? _("Initialized empty shared Git repository in %s%s\n")
: _("Initialized empty Git repository in %s%s\n"),
git_dir, len && git_dir[len-1] != '/' ? "/" : "");
}
clear_repository_format(&repo_fmt);
free(original_git_dir);
return 0;
}