1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-23 11:26:09 +02:00
git/path.h
Patrick Steinhardt 57db2a094d refs: introduce reftable backend
Due to scalability issues, Shawn Pearce has originally proposed a new
"reftable" format more than six years ago [1]. Initially, this new
format was implemented in JGit with promising results. Around two years
ago, we have then added the "reftable" library to the Git codebase via
a4bbd13be3 (Merge branch 'hn/reftable', 2021-12-15). With this we have
landed all the low-level code to read and write reftables. Notably
missing though was the integration of this low-level code into the Git
code base in the form of a new ref backend that ties all of this
together.

This gap is now finally closed by introducing a new "reftable" backend
into the Git codebase. This new backend promises to bring some notable
improvements to Git repositories:

  - It becomes possible to do truly atomic writes where either all refs
    are committed to disk or none are. This was not possible with the
    "files" backend because ref updates were split across multiple loose
    files.

  - The disk space required to store many refs is reduced, both compared
    to loose refs and packed-refs. This is enabled both by the reftable
    format being a binary format, which is more compact, and by prefix
    compression.

  - We can ignore filesystem-specific behaviour as ref names are not
    encoded via paths anymore. This means there is no need to handle
    case sensitivity on Windows systems or Unicode precomposition on
    macOS.

  - There is no need to rewrite the complete refdb anymore every time a
    ref is being deleted like it was the case for packed-refs. This
    means that ref deletions are now constant time instead of scaling
    linearly with the number of refs.

  - We can ignore file/directory conflicts so that it becomes possible
    to store both "refs/heads/foo" and "refs/heads/foo/bar".

  - Due to this property we can retain reflogs for deleted refs. We have
    previously been deleting reflogs together with their refs to avoid
    file/directory conflicts, which is not necessary anymore.

  - We can properly enumerate all refs. With the "files" backend it is
    not easily possible to distinguish between refs and non-refs because
    they may live side by side in the gitdir.

Not all of these improvements are realized with the current "reftable"
backend implementation. At this point, the new backend is supposed to be
a drop-in replacement for the "files" backend that is used by basically
all Git repositories nowadays. It strives for 1:1 compatibility, which
means that a user can expect the same behaviour regardless of whether
they use the "reftable" backend or the "files" backend for most of the
part.

Most notably, this means we artificially limit the capabilities of the
"reftable" backend to match the limits of the "files" backend. It is not
possible to create refs that would end up with file/directory conflicts,
we do not retain reflogs, we perform stricter-than-necessary checks.
This is done intentionally due to two main reasons:

  - It makes it significantly easier to land the "reftable" backend as
    tests behave the same. It would be tough to argue for each and every
    single test that doesn't pass with the "reftable" backend.

  - It ensures compatibility between repositories that use the "files"
    backend and repositories that use the "reftable" backend. Like this,
    hosters can migrate their repositories to use the "reftable" backend
    without causing issues for clients that use the "files" backend in
    their clones.

It is expected that these artificial limitations may eventually go away
in the long term.

Performance-wise things very much depend on the actual workload. The
following benchmarks compare the "files" and "reftable" backends in the
current version:

  - Creating N refs in separate transactions shows that the "files"
    backend is ~50% faster. This is not surprising given that creating a
    ref only requires us to create a single loose ref. The "reftable"
    backend will also perform auto compaction on updates. In real-world
    workloads we would likely also want to perform pack loose refs,
    which would likely change the picture.

        Benchmark 1: update-ref: create refs sequentially (refformat = files, refcount = 1)
          Time (mean ± σ):       2.1 ms ±   0.3 ms    [User: 0.6 ms, System: 1.7 ms]
          Range (min … max):     1.8 ms …   4.3 ms    133 runs

        Benchmark 2: update-ref: create refs sequentially (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.7 ms ±   0.1 ms    [User: 0.6 ms, System: 2.2 ms]
          Range (min … max):     2.4 ms …   2.9 ms    132 runs

        Benchmark 3: update-ref: create refs sequentially (refformat = files, refcount = 1000)
          Time (mean ± σ):      1.975 s ±  0.006 s    [User: 0.437 s, System: 1.535 s]
          Range (min … max):    1.969 s …  1.980 s    3 runs

        Benchmark 4: update-ref: create refs sequentially (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      2.611 s ±  0.013 s    [User: 0.782 s, System: 1.825 s]
          Range (min … max):    2.597 s …  2.622 s    3 runs

        Benchmark 5: update-ref: create refs sequentially (refformat = files, refcount = 100000)
          Time (mean ± σ):     198.442 s ±  0.241 s    [User: 43.051 s, System: 155.250 s]
          Range (min … max):   198.189 s … 198.670 s    3 runs

        Benchmark 6: update-ref: create refs sequentially (refformat = reftable, refcount = 100000)
          Time (mean ± σ):     294.509 s ±  4.269 s    [User: 104.046 s, System: 190.326 s]
          Range (min … max):   290.223 s … 298.761 s    3 runs

  - Creating N refs in a single transaction shows that the "files"
    backend is significantly slower once we start to write many refs.
    The "reftable" backend only needs to update two files, whereas the
    "files" backend needs to write one file per ref.

        Benchmark 1: update-ref: create many refs (refformat = files, refcount = 1)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.4 ms, System: 1.4 ms]
          Range (min … max):     1.8 ms …   2.6 ms    151 runs

        Benchmark 2: update-ref: create many refs (refformat = reftable, refcount = 1)
          Time (mean ± σ):       2.5 ms ±   0.1 ms    [User: 0.7 ms, System: 1.7 ms]
          Range (min … max):     2.4 ms …   3.4 ms    148 runs

        Benchmark 3: update-ref: create many refs (refformat = files, refcount = 1000)
          Time (mean ± σ):     152.5 ms ±   5.2 ms    [User: 19.1 ms, System: 133.1 ms]
          Range (min … max):   148.5 ms … 167.8 ms    15 runs

        Benchmark 4: update-ref: create many refs (refformat = reftable, refcount = 1000)
          Time (mean ± σ):      58.0 ms ±   2.5 ms    [User: 28.4 ms, System: 29.4 ms]
          Range (min … max):    56.3 ms …  72.9 ms    40 runs

        Benchmark 5: update-ref: create many refs (refformat = files, refcount = 1000000)
          Time (mean ± σ):     152.752 s ±  0.710 s    [User: 20.315 s, System: 131.310 s]
          Range (min … max):   152.165 s … 153.542 s    3 runs

        Benchmark 6: update-ref: create many refs (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):     51.912 s ±  0.127 s    [User: 26.483 s, System: 25.424 s]
          Range (min … max):   51.769 s … 52.012 s    3 runs

  - Deleting a ref in a fully-packed repository shows that the "files"
    backend scales with the number of refs. The "reftable" backend has
    constant-time deletions.

        Benchmark 1: update-ref: delete ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.7 ms ±   0.1 ms    [User: 0.4 ms, System: 1.2 ms]
          Range (min … max):     1.6 ms …   2.1 ms    316 runs

        Benchmark 2: update-ref: delete ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.8 ms ±   0.1 ms    [User: 0.4 ms, System: 1.3 ms]
          Range (min … max):     1.7 ms …   2.1 ms    294 runs

        Benchmark 3: update-ref: delete ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       2.0 ms ±   0.1 ms    [User: 0.5 ms, System: 1.4 ms]
          Range (min … max):     1.9 ms …   2.5 ms    287 runs

        Benchmark 4: update-ref: delete ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.9 ms ±   0.1 ms    [User: 0.5 ms, System: 1.3 ms]
          Range (min … max):     1.8 ms …   2.1 ms    217 runs

        Benchmark 5: update-ref: delete ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):     229.8 ms ±   7.9 ms    [User: 182.6 ms, System: 46.8 ms]
          Range (min … max):   224.6 ms … 245.2 ms    6 runs

        Benchmark 6: update-ref: delete ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       2.0 ms ±   0.0 ms    [User: 0.6 ms, System: 1.3 ms]
          Range (min … max):     2.0 ms …   2.1 ms    3 runs

  - Listing all refs shows no significant advantage for either of the
    backends. The "files" backend is a bit faster, but not by a
    significant margin. When repositories are not packed the "reftable"
    backend outperforms the "files" backend because the "reftable"
    backend performs auto-compaction.

        Benchmark 1: show-ref: print all refs (refformat = files, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1729 runs

        Benchmark 2: show-ref: print all refs (refformat = reftable, refcount = 1, packed = true)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.8 ms    1816 runs

        Benchmark 3: show-ref: print all refs (refformat = files, refcount = 1000, packed = true)
          Time (mean ± σ):       4.3 ms ±   0.1 ms    [User: 0.9 ms, System: 3.3 ms]
          Range (min … max):     4.1 ms …   4.6 ms    645 runs

        Benchmark 4: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = true)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.0 ms, System: 3.3 ms]
          Range (min … max):     4.2 ms …   5.9 ms    643 runs

        Benchmark 5: show-ref: print all refs (refformat = files, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.537 s ±  0.034 s    [User: 0.488 s, System: 2.048 s]
          Range (min … max):    2.511 s …  2.627 s    10 runs

        Benchmark 6: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = true)
          Time (mean ± σ):      2.712 s ±  0.017 s    [User: 0.653 s, System: 2.059 s]
          Range (min … max):    2.692 s …  2.752 s    10 runs

        Benchmark 7: show-ref: print all refs (refformat = files, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   1.9 ms    1834 runs

        Benchmark 8: show-ref: print all refs (refformat = reftable, refcount = 1, packed = false)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.0 ms    1840 runs

        Benchmark 9: show-ref: print all refs (refformat = files, refcount = 1000, packed = false)
          Time (mean ± σ):      13.8 ms ±   0.2 ms    [User: 2.8 ms, System: 10.8 ms]
          Range (min … max):    13.3 ms …  14.5 ms    208 runs

        Benchmark 10: show-ref: print all refs (refformat = reftable, refcount = 1000, packed = false)
          Time (mean ± σ):       4.5 ms ±   0.2 ms    [User: 1.2 ms, System: 3.3 ms]
          Range (min … max):     4.3 ms …   6.2 ms    624 runs

        Benchmark 11: show-ref: print all refs (refformat = files, refcount = 1000000, packed = false)
          Time (mean ± σ):     12.127 s ±  0.129 s    [User: 2.675 s, System: 9.451 s]
          Range (min … max):   11.965 s … 12.370 s    10 runs

        Benchmark 12: show-ref: print all refs (refformat = reftable, refcount = 1000000, packed = false)
          Time (mean ± σ):      2.799 s ±  0.022 s    [User: 0.735 s, System: 2.063 s]
          Range (min … max):    2.769 s …  2.836 s    10 runs

  - Printing a single ref shows no real difference between the "files"
    and "reftable" backends.

        Benchmark 1: show-ref: print single ref (refformat = files, refcount = 1)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.4 ms, System: 1.0 ms]
          Range (min … max):     1.4 ms …   1.8 ms    1779 runs

        Benchmark 2: show-ref: print single ref (refformat = reftable, refcount = 1)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   2.5 ms    1753 runs

        Benchmark 3: show-ref: print single ref (refformat = files, refcount = 1000)
          Time (mean ± σ):       1.5 ms ±   0.1 ms    [User: 0.3 ms, System: 1.1 ms]
          Range (min … max):     1.4 ms …   1.9 ms    1840 runs

        Benchmark 4: show-ref: print single ref (refformat = reftable, refcount = 1000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.0 ms    1831 runs

        Benchmark 5: show-ref: print single ref (refformat = files, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1848 runs

        Benchmark 6: show-ref: print single ref (refformat = reftable, refcount = 1000000)
          Time (mean ± σ):       1.6 ms ±   0.1 ms    [User: 0.4 ms, System: 1.1 ms]
          Range (min … max):     1.5 ms …   2.1 ms    1762 runs

So overall, performance depends on the usecases. Except for many
sequential writes the "reftable" backend is roughly on par or
significantly faster than the "files" backend though. Given that the
"files" backend has received 18 years of optimizations by now this can
be seen as a win. Furthermore, we can expect that the "reftable" backend
will grow faster over time when attention turns more towards
optimizations.

The complete test suite passes, except for those tests explicitly marked
to require the REFFILES prerequisite. Some tests in t0610 are marked as
failing because they depend on still-in-flight bug fixes. Tests can be
run with the new backend by setting the GIT_TEST_DEFAULT_REF_FORMAT
environment variable to "reftable".

There is a single known conceptual incompatibility with the dumb HTTP
transport. As "info/refs" SHOULD NOT contain the HEAD reference, and
because the "HEAD" file is not valid anymore, it is impossible for the
remote client to figure out the default branch without changing the
protocol. This shortcoming needs to be handled in a subsequent patch
series.

As the reftable library has already been introduced a while ago, this
commit message will not go into the details of how exactly the on-disk
format works. Please refer to our preexisting technical documentation at
Documentation/technical/reftable for this.

[1]: https://public-inbox.org/git/CAJo=hJtyof=HRy=2sLP0ng0uZ4=S-DpZ5dR1aF+VHVETKG20OQ@mail.gmail.com/

Original-idea-by: Shawn Pearce <spearce@spearce.org>
Based-on-patch-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2024-02-07 08:28:37 -08:00

250 lines
7.9 KiB
C

#ifndef PATH_H
#define PATH_H
struct repository;
struct strbuf;
struct string_list;
/*
* The result to all functions which return statically allocated memory may be
* overwritten by another call to _any_ one of these functions. Consider using
* the safer variants which operate on strbufs or return allocated memory.
*/
/*
* Return a statically allocated path.
*/
const char *mkpath(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
/*
* Return a path.
*/
char *mkpathdup(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
/*
* Construct a path and place the result in the provided buffer `buf`.
*/
char *mksnpath(char *buf, size_t n, const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
/*
* The `git_common_path` family of functions will construct a path into a
* repository's common git directory, which is shared by all worktrees.
*/
/*
* Constructs a path into the common git directory of repository `repo` and
* append it in the provided buffer `sb`.
*/
void strbuf_git_common_path(struct strbuf *sb,
const struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
/*
* Return a statically allocated path into the main repository's
* (the_repository) common git directory.
*/
const char *git_common_path(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
/*
* The `git_path` family of functions will construct a path into a repository's
* git directory.
*
* These functions will perform adjustments to the resultant path to account
* for special paths which are either considered common among worktrees (e.g.
* paths into the object directory) or have been explicitly set via an
* environment variable or config (e.g. path to the index file).
*
* For an exhaustive list of the adjustments made look at `common_list` and
* `adjust_git_path` in path.c.
*/
/*
* Return a path into the git directory of repository `repo`.
*/
char *repo_git_path(const struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
/*
* Construct a path into the git directory of repository `repo` and append it
* to the provided buffer `sb`.
*/
void strbuf_repo_git_path(struct strbuf *sb,
const struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
/*
* Return a statically allocated path into the main repository's
* (the_repository) git directory.
*/
const char *git_path(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
/*
* Return a path into the main repository's (the_repository) git directory.
*/
char *git_pathdup(const char *fmt, ...)
__attribute__((format (printf, 1, 2)));
/*
* Construct a path into the main repository's (the_repository) git directory
* and place it in the provided buffer `buf`, the contents of the buffer will
* be overridden.
*/
char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
/*
* Construct a path into the main repository's (the_repository) git directory
* and append it to the provided buffer `sb`.
*/
void strbuf_git_path(struct strbuf *sb, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
/*
* Return a path into the worktree of repository `repo`.
*
* If the repository doesn't have a worktree NULL is returned.
*/
char *repo_worktree_path(const struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
/*
* Construct a path into the worktree of repository `repo` and append it
* to the provided buffer `sb`.
*
* If the repository doesn't have a worktree nothing will be appended to `sb`.
*/
void strbuf_repo_worktree_path(struct strbuf *sb,
const struct repository *repo,
const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
/*
* Return a path into a submodule's git directory located at `path`. `path`
* must only reference a submodule of the main repository (the_repository).
*/
char *git_pathdup_submodule(const char *path, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
/*
* Construct a path into a submodule's git directory located at `path` and
* append it to the provided buffer `sb`. `path` must only reference a
* submodule of the main repository (the_repository).
*/
int strbuf_git_path_submodule(struct strbuf *sb, const char *path,
const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
void report_linked_checkout_garbage(void);
/*
* You can define a static memoized git path like:
*
* static GIT_PATH_FUNC(git_path_foo, "FOO")
*
* or use one of the global ones below.
*/
#define GIT_PATH_FUNC(func, filename) \
const char *func(void) \
{ \
static char *ret; \
if (!ret) \
ret = git_pathdup(filename); \
return ret; \
}
#define REPO_GIT_PATH_FUNC(var, filename) \
const char *git_path_##var(struct repository *r) \
{ \
if (!r->cached_paths.var) \
r->cached_paths.var = repo_git_path(r, filename); \
return r->cached_paths.var; \
}
const char *git_path_squash_msg(struct repository *r);
const char *git_path_merge_msg(struct repository *r);
const char *git_path_merge_rr(struct repository *r);
const char *git_path_merge_mode(struct repository *r);
const char *git_path_merge_head(struct repository *r);
const char *git_path_fetch_head(struct repository *r);
const char *git_path_shallow(struct repository *r);
int ends_with_path_components(const char *path, const char *components);
int validate_headref(const char *ref);
int calc_shared_perm(int mode);
int adjust_shared_perm(const char *path);
char *interpolate_path(const char *path, int real_home);
const char *enter_repo(const char *path, int strict);
const char *remove_leading_path(const char *in, const char *prefix);
const char *relative_path(const char *in, const char *prefix, struct strbuf *sb);
int normalize_path_copy_len(char *dst, const char *src, int *prefix_len);
int normalize_path_copy(char *dst, const char *src);
/**
* Normalize in-place the path contained in the strbuf. If an error occurs,
* the contents of "sb" are left untouched, and -1 is returned.
*/
int strbuf_normalize_path(struct strbuf *src);
int longest_ancestor_length(const char *path, struct string_list *prefixes);
char *strip_path_suffix(const char *path, const char *suffix);
int daemon_avoid_alias(const char *path);
/*
* These functions match their is_hfs_dotgit() counterparts; see utf8.h for
* details.
*/
int is_ntfs_dotgit(const char *name);
int is_ntfs_dotgitmodules(const char *name);
int is_ntfs_dotgitignore(const char *name);
int is_ntfs_dotgitattributes(const char *name);
int is_ntfs_dotmailmap(const char *name);
/*
* Returns true iff "str" could be confused as a command-line option when
* passed to a sub-program like "ssh". Note that this has nothing to do with
* shell-quoting, which should be handled separately; we're assuming here that
* the string makes it verbatim to the sub-program.
*/
int looks_like_command_line_option(const char *str);
/**
* Return a newly allocated string with the evaluation of
* "$XDG_CONFIG_HOME/$subdir/$filename" if $XDG_CONFIG_HOME is non-empty, otherwise
* "$HOME/.config/$subdir/$filename". Return NULL upon error.
*/
char *xdg_config_home_for(const char *subdir, const char *filename);
/**
* Return a newly allocated string with the evaluation of
* "$XDG_CONFIG_HOME/git/$filename" if $XDG_CONFIG_HOME is non-empty, otherwise
* "$HOME/.config/git/$filename". Return NULL upon error.
*/
char *xdg_config_home(const char *filename);
/**
* Return a newly allocated string with the evaluation of
* "$XDG_CACHE_HOME/git/$filename" if $XDG_CACHE_HOME is non-empty, otherwise
* "$HOME/.cache/git/$filename". Return NULL upon error.
*/
char *xdg_cache_home(const char *filename);
/*
* Create a directory and (if share is nonzero) adjust its permissions
* according to the shared_repository setting. Only use this for
* directories under $GIT_DIR. Don't use it for working tree
* directories.
*/
void safe_create_dir(const char *dir, int share);
#endif /* PATH_H */