From 839a66349e094a28f29577a8b311491b9d6b907b Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:38 +0000 Subject: [PATCH 01/26] sparse-index: API protection strategy Edit and expand the sparse-index design document with the plan for guarding index operations with ensure_full_index(). Notably, the plan has changed to not have an expand_to_path() method in favor of checking for a sparse-directory hit inside of the index_path_pos() API. The changes that follow this one will incrementally add ensure_full_index() guards to iterations over all cache entries. Some iterations over the cache entries are not protected due to a few categories listed in the document. Since these are not being modified, here is a short list of the files and methods that will not receive these guards: Looking for non-zero stage: * builtin/add.c:chmod_pathspec() * builtin/merge.c:count_unmerged_entries() * merge-ort.c:record_conflicted_index_entries() * read-cache.c:unmerged_index() * rerere.c:check_one_conflict(), find_conflict(), rerere_remaining() * revision.c:prepare_show_merge() * sequencer.c:append_conflicts_hint() * wt-status.c:wt_status_collect_changes_initial() Looking for submodules: * builtin/submodule--helper.c:module_list_compute() * submodule.c: several methods * worktree.c:validate_no_submodules() Part of the index API: * name-hash.c: lazy init methods * preload-index.c:preload_thread(), preload_index() * read-cache.c: file format methods Checking for correct order of cache entries: * read-cache.c:check_ce_order() Ignores SKIP_WORKTREE entries or already aware: * unpack-trees.c:mark_new_skip_worktree() * wt-status.c:wt_status_check_sparse_checkout() Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- Documentation/technical/sparse-index.txt | 37 ++++++++++++++++++++++-- 1 file changed, 35 insertions(+), 2 deletions(-) diff --git a/Documentation/technical/sparse-index.txt b/Documentation/technical/sparse-index.txt index 8d3d8080460..3b24c1a219f 100644 --- a/Documentation/technical/sparse-index.txt +++ b/Documentation/technical/sparse-index.txt @@ -85,8 +85,41 @@ index, as well. Next, consumers of the index will be guarded against operating on a sparse-index by inserting calls to `ensure_full_index()` or -`expand_index_to_path()`. After these guards are in place, we can begin -leaving sparse-directory entries in the in-memory index structure. +`expand_index_to_path()`. If a specific path is requested, then those will +be protected from within the `index_file_exists()` and `index_name_pos()` +API calls: they will call `ensure_full_index()` if necessary. The +intention here is to preserve existing behavior when interacting with a +sparse-checkout. We don't want a change to happen by accident, without +tests. Many of these locations may not need any change before removing the +guards, but we should not do so without tests to ensure the expected +behavior happens. + +It may be desirable to _change_ the behavior of some commands in the +presence of a sparse index or more generally in any sparse-checkout +scenario. In such cases, these should be carefully communicated and +tested. No such behavior changes are intended during this phase. + +During a scan of the codebase, not every iteration of the cache entries +needs an `ensure_full_index()` check. The basic reasons include: + +1. The loop is scanning for entries with non-zero stage. These entries + are not collapsed into a sparse-directory entry. + +2. The loop is scanning for submodules. These entries are not collapsed + into a sparse-directory entry. + +3. The loop is part of the index API, especially around reading or + writing the format. + +4. The loop is checking for correct order of cache entries and that is + correct if and only if the sparse-directory entries are in the correct + location. + +5. The loop ignores entries with the `SKIP_WORKTREE` bit set, or is + otherwise already aware of sparse directory entries. + +6. The sparse-index is disabled at this point when using the split-index + feature, so no effort is made to protect the split-index API. Even after inserting these guards, we will keep expanding sparse-indexes for most Git commands using the `command_requires_full_index` repository From 847a9e5d4f876646e128c89f0818b1a8ce792509 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:39 +0000 Subject: [PATCH 02/26] *: remove 'const' qualifier for struct index_state Several methods specify that they take a 'struct index_state' pointer with the 'const' qualifier because they intend to only query the data, not change it. However, we will be introducing a step very low in the method stack that might modify a sparse-index to become a full index in the case that our queries venture inside a sparse-directory entry. This change only removes the 'const' qualifiers that are necessary for the following change which will actually modify the implementation of index_name_stage_pos(). Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- attr.c | 14 +++++++------- attr.h | 4 ++-- builtin/ls-files.c | 10 +++++----- cache.h | 6 +++--- convert.c | 26 +++++++++++++------------- convert.h | 20 ++++++++++---------- dir.c | 12 ++++++------ dir.h | 8 ++++---- merge-recursive.c | 2 +- pathspec.c | 6 +++--- pathspec.h | 6 +++--- read-cache.c | 10 +++++----- submodule.c | 6 +++--- submodule.h | 6 +++--- 14 files changed, 68 insertions(+), 68 deletions(-) diff --git a/attr.c b/attr.c index 4ef85d668b5..8de553293ee 100644 --- a/attr.c +++ b/attr.c @@ -718,7 +718,7 @@ static struct attr_stack *read_attr_from_file(const char *path, int macro_ok) return res; } -static struct attr_stack *read_attr_from_index(const struct index_state *istate, +static struct attr_stack *read_attr_from_index(struct index_state *istate, const char *path, int macro_ok) { @@ -748,7 +748,7 @@ static struct attr_stack *read_attr_from_index(const struct index_state *istate, return res; } -static struct attr_stack *read_attr(const struct index_state *istate, +static struct attr_stack *read_attr(struct index_state *istate, const char *path, int macro_ok) { struct attr_stack *res = NULL; @@ -840,7 +840,7 @@ static void push_stack(struct attr_stack **attr_stack_p, } } -static void bootstrap_attr_stack(const struct index_state *istate, +static void bootstrap_attr_stack(struct index_state *istate, struct attr_stack **stack) { struct attr_stack *e; @@ -878,7 +878,7 @@ static void bootstrap_attr_stack(const struct index_state *istate, push_stack(stack, e, NULL, 0); } -static void prepare_attr_stack(const struct index_state *istate, +static void prepare_attr_stack(struct index_state *istate, const char *path, int dirlen, struct attr_stack **stack) { @@ -1078,7 +1078,7 @@ static void determine_macros(struct all_attrs_item *all_attrs, * If check->check_nr is non-zero, only attributes in check[] are collected. * Otherwise all attributes are collected. */ -static void collect_some_attrs(const struct index_state *istate, +static void collect_some_attrs(struct index_state *istate, const char *path, struct attr_check *check) { @@ -1107,7 +1107,7 @@ static void collect_some_attrs(const struct index_state *istate, fill(path, pathlen, basename_offset, check->stack, check->all_attrs, rem); } -void git_check_attr(const struct index_state *istate, +void git_check_attr(struct index_state *istate, const char *path, struct attr_check *check) { @@ -1124,7 +1124,7 @@ void git_check_attr(const struct index_state *istate, } } -void git_all_attrs(const struct index_state *istate, +void git_all_attrs(struct index_state *istate, const char *path, struct attr_check *check) { int i; diff --git a/attr.h b/attr.h index 404548f028a..3732505edae 100644 --- a/attr.h +++ b/attr.h @@ -190,14 +190,14 @@ void attr_check_free(struct attr_check *check); */ const char *git_attr_name(const struct git_attr *); -void git_check_attr(const struct index_state *istate, +void git_check_attr(struct index_state *istate, const char *path, struct attr_check *check); /* * Retrieve all attributes that apply to the specified path. * check holds the attributes and their values. */ -void git_all_attrs(const struct index_state *istate, +void git_all_attrs(struct index_state *istate, const char *path, struct attr_check *check); enum git_attr_direction { diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 60a2913a01e..4f9ed1fb29b 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -57,7 +57,7 @@ static const char *tag_modified = ""; static const char *tag_skip_worktree = ""; static const char *tag_resolve_undo = ""; -static void write_eolinfo(const struct index_state *istate, +static void write_eolinfo(struct index_state *istate, const struct cache_entry *ce, const char *path) { if (show_eol) { @@ -122,7 +122,7 @@ static void print_debug(const struct cache_entry *ce) } } -static void show_dir_entry(const struct index_state *istate, +static void show_dir_entry(struct index_state *istate, const char *tag, struct dir_entry *ent) { int len = max_prefix_len; @@ -139,7 +139,7 @@ static void show_dir_entry(const struct index_state *istate, write_name(ent->name); } -static void show_other_files(const struct index_state *istate, +static void show_other_files(struct index_state *istate, const struct dir_struct *dir) { int i; @@ -152,7 +152,7 @@ static void show_other_files(const struct index_state *istate, } } -static void show_killed_files(const struct index_state *istate, +static void show_killed_files(struct index_state *istate, const struct dir_struct *dir) { int i; @@ -254,7 +254,7 @@ static void show_ce(struct repository *repo, struct dir_struct *dir, } } -static void show_ru_info(const struct index_state *istate) +static void show_ru_info(struct index_state *istate) { struct string_list_item *item; diff --git a/cache.h b/cache.h index 8aede373aeb..5006278c13c 100644 --- a/cache.h +++ b/cache.h @@ -800,7 +800,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na * index_name_pos(&index, "f", 1) -> -3 * index_name_pos(&index, "g", 1) -> -5 */ -int index_name_pos(const struct index_state *, const char *name, int namelen); +int index_name_pos(struct index_state *, const char *name, int namelen); /* * Some functions return the negative complement of an insert position when a @@ -850,8 +850,8 @@ int add_file_to_index(struct index_state *, const char *path, int flags); int chmod_index_entry(struct index_state *, struct cache_entry *ce, char flip); int ce_same_name(const struct cache_entry *a, const struct cache_entry *b); void set_object_name_for_intent_to_add_entry(struct cache_entry *ce); -int index_name_is_other(const struct index_state *, const char *, int); -void *read_blob_data_from_index(const struct index_state *, const char *, unsigned long *); +int index_name_is_other(struct index_state *, const char *, int); +void *read_blob_data_from_index(struct index_state *, const char *, unsigned long *); /* do stat comparison even if CE_VALID is true */ #define CE_MATCH_IGNORE_VALID 01 diff --git a/convert.c b/convert.c index ee360c2f07c..20b19abffa3 100644 --- a/convert.c +++ b/convert.c @@ -138,7 +138,7 @@ static const char *gather_convert_stats_ascii(const char *data, unsigned long si } } -const char *get_cached_convert_stats_ascii(const struct index_state *istate, +const char *get_cached_convert_stats_ascii(struct index_state *istate, const char *path) { const char *ret; @@ -222,7 +222,7 @@ static void check_global_conv_flags_eol(const char *path, } } -static int has_crlf_in_index(const struct index_state *istate, const char *path) +static int has_crlf_in_index(struct index_state *istate, const char *path) { unsigned long sz; void *data; @@ -496,7 +496,7 @@ static int encode_to_worktree(const char *path, const char *src, size_t src_len, return 1; } -static int crlf_to_git(const struct index_state *istate, +static int crlf_to_git(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *buf, enum crlf_action crlf_action, int conv_flags) @@ -1307,7 +1307,7 @@ struct conv_attrs { static struct attr_check *check; -static void convert_attrs(const struct index_state *istate, +static void convert_attrs(struct index_state *istate, struct conv_attrs *ca, const char *path) { struct attr_check_item *ccheck = NULL; @@ -1369,7 +1369,7 @@ void reset_parsed_attributes(void) user_convert_tail = NULL; } -int would_convert_to_git_filter_fd(const struct index_state *istate, const char *path) +int would_convert_to_git_filter_fd(struct index_state *istate, const char *path) { struct conv_attrs ca; @@ -1388,7 +1388,7 @@ int would_convert_to_git_filter_fd(const struct index_state *istate, const char return apply_filter(path, NULL, 0, -1, NULL, ca.drv, CAP_CLEAN, NULL, NULL); } -const char *get_convert_attr_ascii(const struct index_state *istate, const char *path) +const char *get_convert_attr_ascii(struct index_state *istate, const char *path) { struct conv_attrs ca; @@ -1414,7 +1414,7 @@ const char *get_convert_attr_ascii(const struct index_state *istate, const char return ""; } -int convert_to_git(const struct index_state *istate, +int convert_to_git(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, int conv_flags) { @@ -1448,7 +1448,7 @@ int convert_to_git(const struct index_state *istate, return ret | ident_to_git(src, len, dst, ca.ident); } -void convert_to_git_filter_fd(const struct index_state *istate, +void convert_to_git_filter_fd(struct index_state *istate, const char *path, int fd, struct strbuf *dst, int conv_flags) { @@ -1466,7 +1466,7 @@ void convert_to_git_filter_fd(const struct index_state *istate, ident_to_git(dst->buf, dst->len, dst, ca.ident); } -static int convert_to_working_tree_internal(const struct index_state *istate, +static int convert_to_working_tree_internal(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, int normalizing, @@ -1510,7 +1510,7 @@ static int convert_to_working_tree_internal(const struct index_state *istate, return ret | ret_filter; } -int async_convert_to_working_tree(const struct index_state *istate, +int async_convert_to_working_tree(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, const struct checkout_metadata *meta, @@ -1519,7 +1519,7 @@ int async_convert_to_working_tree(const struct index_state *istate, return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, dco); } -int convert_to_working_tree(const struct index_state *istate, +int convert_to_working_tree(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, const struct checkout_metadata *meta) @@ -1527,7 +1527,7 @@ int convert_to_working_tree(const struct index_state *istate, return convert_to_working_tree_internal(istate, path, src, len, dst, 0, meta, NULL); } -int renormalize_buffer(const struct index_state *istate, const char *path, +int renormalize_buffer(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst) { int ret = convert_to_working_tree_internal(istate, path, src, len, dst, 1, NULL, NULL); @@ -1964,7 +1964,7 @@ static struct stream_filter *ident_filter(const struct object_id *oid) * Note that you would be crazy to set CRLF, smudge/clean or ident to a * large binary blob you would want us not to slurp into the memory! */ -struct stream_filter *get_stream_filter(const struct index_state *istate, +struct stream_filter *get_stream_filter(struct index_state *istate, const char *path, const struct object_id *oid) { diff --git a/convert.h b/convert.h index e29d1026a68..0f7c8a1f044 100644 --- a/convert.h +++ b/convert.h @@ -65,41 +65,41 @@ struct checkout_metadata { extern enum eol core_eol; extern char *check_roundtrip_encoding; -const char *get_cached_convert_stats_ascii(const struct index_state *istate, +const char *get_cached_convert_stats_ascii(struct index_state *istate, const char *path); const char *get_wt_convert_stats_ascii(const char *path); -const char *get_convert_attr_ascii(const struct index_state *istate, +const char *get_convert_attr_ascii(struct index_state *istate, const char *path); /* returns 1 if *dst was used */ -int convert_to_git(const struct index_state *istate, +int convert_to_git(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, int conv_flags); -int convert_to_working_tree(const struct index_state *istate, +int convert_to_working_tree(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, const struct checkout_metadata *meta); -int async_convert_to_working_tree(const struct index_state *istate, +int async_convert_to_working_tree(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst, const struct checkout_metadata *meta, void *dco); int async_query_available_blobs(const char *cmd, struct string_list *available_paths); -int renormalize_buffer(const struct index_state *istate, +int renormalize_buffer(struct index_state *istate, const char *path, const char *src, size_t len, struct strbuf *dst); -static inline int would_convert_to_git(const struct index_state *istate, +static inline int would_convert_to_git(struct index_state *istate, const char *path) { return convert_to_git(istate, path, NULL, 0, NULL, 0); } /* Precondition: would_convert_to_git_filter_fd(path) == true */ -void convert_to_git_filter_fd(const struct index_state *istate, +void convert_to_git_filter_fd(struct index_state *istate, const char *path, int fd, struct strbuf *dst, int conv_flags); -int would_convert_to_git_filter_fd(const struct index_state *istate, +int would_convert_to_git_filter_fd(struct index_state *istate, const char *path); /* @@ -133,7 +133,7 @@ void reset_parsed_attributes(void); struct stream_filter; /* opaque */ -struct stream_filter *get_stream_filter(const struct index_state *istate, +struct stream_filter *get_stream_filter(struct index_state *istate, const char *path, const struct object_id *); void free_stream_filter(struct stream_filter *); diff --git a/dir.c b/dir.c index fd8aa7c40fa..5b00dfb5b14 100644 --- a/dir.c +++ b/dir.c @@ -306,7 +306,7 @@ static int do_read_blob(const struct object_id *oid, struct oid_stat *oid_stat, * [1] Only if DO_MATCH_DIRECTORY is passed; otherwise, this is NOT a match. * [2] Only if DO_MATCH_LEADING_PATHSPEC is passed; otherwise, not a match. */ -static int match_pathspec_item(const struct index_state *istate, +static int match_pathspec_item(struct index_state *istate, const struct pathspec_item *item, int prefix, const char *name, int namelen, unsigned flags) { @@ -429,7 +429,7 @@ static int match_pathspec_item(const struct index_state *istate, * pathspec did not match any names, which could indicate that the * user mistyped the nth pathspec. */ -static int do_match_pathspec(const struct index_state *istate, +static int do_match_pathspec(struct index_state *istate, const struct pathspec *ps, const char *name, int namelen, int prefix, char *seen, @@ -500,7 +500,7 @@ static int do_match_pathspec(const struct index_state *istate, return retval; } -static int match_pathspec_with_flags(const struct index_state *istate, +static int match_pathspec_with_flags(struct index_state *istate, const struct pathspec *ps, const char *name, int namelen, int prefix, char *seen, unsigned flags) @@ -516,7 +516,7 @@ static int match_pathspec_with_flags(const struct index_state *istate, return negative ? 0 : positive; } -int match_pathspec(const struct index_state *istate, +int match_pathspec(struct index_state *istate, const struct pathspec *ps, const char *name, int namelen, int prefix, char *seen, int is_dir) @@ -529,7 +529,7 @@ int match_pathspec(const struct index_state *istate, /** * Check if a submodule is a superset of the pathspec */ -int submodule_path_match(const struct index_state *istate, +int submodule_path_match(struct index_state *istate, const struct pathspec *ps, const char *submodule_name, char *seen) @@ -892,7 +892,7 @@ void add_pattern(const char *string, const char *base, add_pattern_to_hashsets(pl, pattern); } -static int read_skip_worktree_file_from_index(const struct index_state *istate, +static int read_skip_worktree_file_from_index(struct index_state *istate, const char *path, size_t *size_out, char **data_out, struct oid_stat *oid_stat) diff --git a/dir.h b/dir.h index facfae47402..51cb0e21724 100644 --- a/dir.h +++ b/dir.h @@ -354,7 +354,7 @@ int count_slashes(const char *s); int simple_length(const char *match); int no_wildcard(const char *string); char *common_prefix(const struct pathspec *pathspec); -int match_pathspec(const struct index_state *istate, +int match_pathspec(struct index_state *istate, const struct pathspec *pathspec, const char *name, int namelen, int prefix, char *seen, int is_dir); @@ -492,12 +492,12 @@ int git_fnmatch(const struct pathspec_item *item, const char *pattern, const char *string, int prefix); -int submodule_path_match(const struct index_state *istate, +int submodule_path_match(struct index_state *istate, const struct pathspec *ps, const char *submodule_name, char *seen); -static inline int ce_path_match(const struct index_state *istate, +static inline int ce_path_match(struct index_state *istate, const struct cache_entry *ce, const struct pathspec *pathspec, char *seen) @@ -506,7 +506,7 @@ static inline int ce_path_match(const struct index_state *istate, S_ISDIR(ce->ce_mode) || S_ISGITLINK(ce->ce_mode)); } -static inline int dir_path_match(const struct index_state *istate, +static inline int dir_path_match(struct index_state *istate, const struct dir_entry *ent, const struct pathspec *pathspec, int prefix, char *seen) diff --git a/merge-recursive.c b/merge-recursive.c index 3d9207455b3..b8de7a704ea 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -2987,7 +2987,7 @@ static int blob_unchanged(struct merge_options *opt, struct strbuf obuf = STRBUF_INIT; struct strbuf abuf = STRBUF_INIT; int ret = 0; /* assume changed for safety */ - const struct index_state *idx = opt->repo->index; + struct index_state *idx = opt->repo->index; if (a->mode != o->mode) return 0; diff --git a/pathspec.c b/pathspec.c index 7a229d8d22f..b6e333965cb 100644 --- a/pathspec.c +++ b/pathspec.c @@ -20,7 +20,7 @@ * to use find_pathspecs_matching_against_index() instead. */ void add_pathspec_matches_against_index(const struct pathspec *pathspec, - const struct index_state *istate, + struct index_state *istate, char *seen) { int num_unmatched = 0, i; @@ -51,7 +51,7 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec, * given pathspecs achieves against all items in the index. */ char *find_pathspecs_matching_against_index(const struct pathspec *pathspec, - const struct index_state *istate) + struct index_state *istate) { char *seen = xcalloc(pathspec->nr, 1); add_pathspec_matches_against_index(pathspec, istate, seen); @@ -702,7 +702,7 @@ void clear_pathspec(struct pathspec *pathspec) pathspec->nr = 0; } -int match_pathspec_attrs(const struct index_state *istate, +int match_pathspec_attrs(struct index_state *istate, const char *name, int namelen, const struct pathspec_item *item) { diff --git a/pathspec.h b/pathspec.h index 454ce364fac..2ccc8080b6c 100644 --- a/pathspec.h +++ b/pathspec.h @@ -150,11 +150,11 @@ static inline int ps_strcmp(const struct pathspec_item *item, } void add_pathspec_matches_against_index(const struct pathspec *pathspec, - const struct index_state *istate, + struct index_state *istate, char *seen); char *find_pathspecs_matching_against_index(const struct pathspec *pathspec, - const struct index_state *istate); -int match_pathspec_attrs(const struct index_state *istate, + struct index_state *istate); +int match_pathspec_attrs(struct index_state *istate, const char *name, int namelen, const struct pathspec_item *item); diff --git a/read-cache.c b/read-cache.c index 2410e6e0df1..3ad94578095 100644 --- a/read-cache.c +++ b/read-cache.c @@ -549,7 +549,7 @@ int cache_name_stage_compare(const char *name1, int len1, int stage1, const char return 0; } -static int index_name_stage_pos(const struct index_state *istate, const char *name, int namelen, int stage) +static int index_name_stage_pos(struct index_state *istate, const char *name, int namelen, int stage) { int first, last; @@ -570,7 +570,7 @@ static int index_name_stage_pos(const struct index_state *istate, const char *na return -first-1; } -int index_name_pos(const struct index_state *istate, const char *name, int namelen) +int index_name_pos(struct index_state *istate, const char *name, int namelen) { return index_name_stage_pos(istate, name, namelen, 0); } @@ -3389,8 +3389,8 @@ int repo_read_index_unmerged(struct repository *repo) * We helpfully remove a trailing "/" from directories so that * the output of read_directory can be used as-is. */ -int index_name_is_other(const struct index_state *istate, const char *name, - int namelen) +int index_name_is_other(struct index_state *istate, const char *name, + int namelen) { int pos; if (namelen && name[namelen - 1] == '/') @@ -3408,7 +3408,7 @@ int index_name_is_other(const struct index_state *istate, const char *name, return 1; } -void *read_blob_data_from_index(const struct index_state *istate, +void *read_blob_data_from_index(struct index_state *istate, const char *path, unsigned long *size) { int pos, len; diff --git a/submodule.c b/submodule.c index 9767ba9893c..83809a4f7bd 100644 --- a/submodule.c +++ b/submodule.c @@ -33,7 +33,7 @@ static struct oid_array ref_tips_after_fetch; * will be disabled because we can't guess what might be configured in * .gitmodules unless the user resolves the conflict. */ -int is_gitmodules_unmerged(const struct index_state *istate) +int is_gitmodules_unmerged(struct index_state *istate) { int pos = index_name_pos(istate, GITMODULES_FILE, strlen(GITMODULES_FILE)); if (pos < 0) { /* .gitmodules not found or isn't merged */ @@ -301,7 +301,7 @@ int is_submodule_populated_gently(const char *path, int *return_error_code) /* * Dies if the provided 'prefix' corresponds to an unpopulated submodule */ -void die_in_unpopulated_submodule(const struct index_state *istate, +void die_in_unpopulated_submodule(struct index_state *istate, const char *prefix) { int i, prefixlen; @@ -331,7 +331,7 @@ void die_in_unpopulated_submodule(const struct index_state *istate, /* * Dies if any paths in the provided pathspec descends into a submodule */ -void die_path_inside_submodule(const struct index_state *istate, +void die_path_inside_submodule(struct index_state *istate, const struct pathspec *ps) { int i, j; diff --git a/submodule.h b/submodule.h index 4ac6e31cf1f..84640c49c11 100644 --- a/submodule.h +++ b/submodule.h @@ -39,7 +39,7 @@ struct submodule_update_strategy { }; #define SUBMODULE_UPDATE_STRATEGY_INIT {SM_UPDATE_UNSPECIFIED, NULL} -int is_gitmodules_unmerged(const struct index_state *istate); +int is_gitmodules_unmerged(struct index_state *istate); int is_writing_gitmodules_ok(void); int is_staging_gitmodules_ok(struct index_state *istate); int update_path_in_gitmodules(const char *oldpath, const char *newpath); @@ -60,9 +60,9 @@ int is_submodule_active(struct repository *repo, const char *path); * Otherwise the return error code is the same as of resolve_gitdir_gently. */ int is_submodule_populated_gently(const char *path, int *return_error_code); -void die_in_unpopulated_submodule(const struct index_state *istate, +void die_in_unpopulated_submodule(struct index_state *istate, const char *prefix); -void die_path_inside_submodule(const struct index_state *istate, +void die_path_inside_submodule(struct index_state *istate, const struct pathspec *ps); enum submodule_update_type parse_submodule_update_type(const char *value); int parse_submodule_update_strategy(const char *value, From 95e0321c4dbb81eca5dc1c6f96b176b00a0368d7 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:40 +0000 Subject: [PATCH 03/26] read-cache: expand on query into sparse-directory entry Callers to index_name_pos() or index_name_stage_pos() have a specific path in mind. If that happens to be a path with an ancestor being a sparse-directory entry, it can lead to unexpected results. In the case that we did not find the requested path, check to see if the position _before_ the inserted position is a sparse directory entry that matches the initial segment of the input path (including the directory separator at the end of the directory name). If so, then expand the index to be a full index and search again. This expansion will only happen once per index read. Future enhancements could be more careful to expand only the necessary sparse directory entry, but then we would have a special "not fully sparse, but also not fully expanded" mode that could affect writing the index to file. Since this only occurs if a specific file is requested outside of the sparse checkout definition, this is unlikely to be a common situation. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- read-cache.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) diff --git a/read-cache.c b/read-cache.c index 3ad94578095..3698bc7bf77 100644 --- a/read-cache.c +++ b/read-cache.c @@ -567,6 +567,27 @@ static int index_name_stage_pos(struct index_state *istate, const char *name, in } first = next+1; } + + if (istate->sparse_index && + first > 0) { + /* Note: first <= istate->cache_nr */ + struct cache_entry *ce = istate->cache[first - 1]; + + /* + * If we are in a sparse-index _and_ the entry before the + * insertion position is a sparse-directory entry that is + * an ancestor of 'name', then we need to expand the index + * and search again. This will only trigger once, because + * thereafter the index is fully expanded. + */ + if (S_ISSPARSEDIR(ce->ce_mode) && + ce_namelen(ce) < namelen && + !strncmp(name, ce->name, ce_namelen(ce))) { + ensure_full_index(istate); + return index_name_stage_pos(istate, name, namelen, stage); + } + } + return -first-1; } From 118a2e8bde0982d219607ff9f260b9cfeb25585c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:41 +0000 Subject: [PATCH 04/26] cache: move ensure_full_index() to cache.h Soon we will insert ensure_full_index() calls across the codebase. Instead of also adding include statements for sparse-index.h, let's just use the fact that anything that cares about the index already has cache.h in its includes. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- cache.h | 1 + sparse-index.h | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/cache.h b/cache.h index 5006278c13c..b7e20e9778d 100644 --- a/cache.h +++ b/cache.h @@ -350,6 +350,7 @@ void add_name_hash(struct index_state *istate, struct cache_entry *ce); void remove_name_hash(struct index_state *istate, struct cache_entry *ce); void free_name_hash(struct index_state *istate); +void ensure_full_index(struct index_state *istate); /* Cache entry creation and cleanup */ diff --git a/sparse-index.h b/sparse-index.h index 39dcc859735..0268f38753c 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -2,7 +2,6 @@ #define SPARSE_INDEX_H__ struct index_state; -void ensure_full_index(struct index_state *istate); int convert_to_sparse(struct index_state *istate); struct repository; From 54beed24d22867a98fc247e1031e3486573f1553 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:42 +0000 Subject: [PATCH 05/26] add: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/add.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/add.c b/builtin/add.c index ea762a41e3a..afccf2fd554 100644 --- a/builtin/add.c +++ b/builtin/add.c @@ -141,6 +141,8 @@ static int renormalize_tracked_files(const struct pathspec *pathspec, int flags) { int i, retval = 0; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { struct cache_entry *ce = active_cache[i]; From 1b850d37f42d58d1c4ad1454d80ecf33797bc467 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:43 +0000 Subject: [PATCH 06/26] checkout-index: ensure full index Before we iterate over all cache entries, ensure that the index is not sparse. This loop in checkout_all() might be safe to iterate over a sparse index, but let's put this protection here until it can be carefully tested. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/checkout-index.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/checkout-index.c b/builtin/checkout-index.c index 023e49e271c..2c2936a9dae 100644 --- a/builtin/checkout-index.c +++ b/builtin/checkout-index.c @@ -119,6 +119,8 @@ static void checkout_all(const char *prefix, int prefix_length) int i, errs = 0; struct cache_entry *last_ce = NULL; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr ; i++) { struct cache_entry *ce = active_cache[i]; if (ce_stage(ce) != checkout_stage From 0f6d3ba6bd795b09bf216c267bbf6e3ec2409d1e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:44 +0000 Subject: [PATCH 07/26] checkout: ensure full index Before iterating over all cache entries in the checkout builtin, ensure that we have a full index to avoid any unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/checkout.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/checkout.c b/builtin/checkout.c index 0e663905200..d0dbe63ea11 100644 --- a/builtin/checkout.c +++ b/builtin/checkout.c @@ -368,6 +368,9 @@ static int checkout_worktree(const struct checkout_opts *opts, NULL); enable_delayed_checkout(&state); + + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (pos = 0; pos < active_nr; pos++) { struct cache_entry *ce = active_cache[pos]; if (ce->ce_flags & CE_MATCHED) { @@ -512,6 +515,8 @@ static int checkout_paths(const struct checkout_opts *opts, * Make sure all pathspecs participated in locating the paths * to be checked out. */ + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (pos = 0; pos < active_nr; pos++) if (opts->overlay_mode) mark_ce_for_checkout_overlay(active_cache[pos], From cb8388df5b3252802a0149ca558f99307e42074c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:45 +0000 Subject: [PATCH 08/26] commit: ensure full index These two loops iterate over all cache entries, so ensure that a sparse index is expanded to a full index before we do so. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/commit.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/builtin/commit.c b/builtin/commit.c index 739110c5a7f..cf0c36d1dcb 100644 --- a/builtin/commit.c +++ b/builtin/commit.c @@ -251,6 +251,8 @@ static int list_paths(struct string_list *list, const char *with_tree, free(max_prefix); } + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { const struct cache_entry *ce = active_cache[i]; struct string_list_item *item; @@ -931,6 +933,8 @@ static int prepare_to_commit(const char *index_file, const char *prefix, if (get_oid(parent, &oid)) { int i, ita_nr = 0; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) if (ce_intent_to_add(active_cache[i])) ita_nr++; From 48b3c7da6c58752b0fbc73e891fd5b24c95281b1 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:46 +0000 Subject: [PATCH 09/26] difftool: ensure full index Before iterating over all cache entries, ensure that a sparse index has been expanded to a full one to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/difftool.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/builtin/difftool.c b/builtin/difftool.c index 6e18e623fdd..32c914dde6a 100644 --- a/builtin/difftool.c +++ b/builtin/difftool.c @@ -584,6 +584,9 @@ static int run_dir_diff(const char *extcmd, int symlinks, const char *prefix, setenv("GIT_DIFFTOOL_DIRDIFF", "true", 1); rc = run_command_v_opt(helper_argv, flags); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&wtindex); + /* * If the diff includes working copy files and those * files were modified during the diff, then the changes From 2227ea175f9d2660c66f1bf15e2cd1ad75c9d4ca Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:47 +0000 Subject: [PATCH 10/26] fsck: ensure full index When verifying all blobs reachable from the index, ensure that a sparse index has been expanded to a full one to avoid missing some blobs. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/fsck.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/fsck.c b/builtin/fsck.c index 821e7798c70..4d7f5c63ce0 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -883,6 +883,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) verify_index_checksum = 1; verify_ce_order = 1; read_cache(); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { unsigned int mode; struct blob *blob; From 46eb6e31ef01684ec2dc64f690a63446022940e5 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:48 +0000 Subject: [PATCH 11/26] grep: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full one so we do not miss blobs to scan. Later, this can integrate more carefully with sparse indexes with proper testing. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/grep.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/grep.c b/builtin/grep.c index 4e91a253ac3..c2d40414e97 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -504,6 +504,8 @@ static int grep_cache(struct grep_opt *opt, if (repo_read_index(repo) < 0) die(_("index file corrupt")); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(repo->index); for (nr = 0; nr < repo->index->cache_nr; nr++) { const struct cache_entry *ce = repo->index->cache[nr]; From 42f44e84eb3be06f463ebc33baa2fc91423f4470 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:49 +0000 Subject: [PATCH 12/26] ls-files: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full one to avoid missing files. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/ls-files.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 4f9ed1fb29b..a0b4e54d114 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -317,6 +317,8 @@ static void show_files(struct repository *repo, struct dir_struct *dir) if (!(show_cached || show_stage || show_deleted || show_modified)) return; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(repo->index); for (i = 0; i < repo->index->cache_nr; i++) { const struct cache_entry *ce = repo->index->cache[i]; struct stat st; @@ -494,6 +496,8 @@ void overlay_tree_on_index(struct index_state *istate, die("bad tree-ish %s", tree_name); /* Hoist the unmerged entries up to stage #3 to make room */ + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; if (!ce_stage(ce)) From 299e2c4561ba6eb99e367b1f73b8ffb3cead5efa Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:50 +0000 Subject: [PATCH 13/26] merge-index: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full one to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/merge-index.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/builtin/merge-index.c b/builtin/merge-index.c index 38ea6ad6ca2..c0383fe9df9 100644 --- a/builtin/merge-index.c +++ b/builtin/merge-index.c @@ -58,6 +58,8 @@ static void merge_one_path(const char *path) static void merge_all(void) { int i; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { const struct cache_entry *ce = active_cache[i]; if (!ce_stage(ce)) @@ -80,6 +82,9 @@ int cmd_merge_index(int argc, const char **argv, const char *prefix) read_cache(); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); + i = 1; if (!strcmp(argv[i], "-o")) { one_shot = 1; From e43e2a17d2d4f63a277a7fdbc32453876c25e06c Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:51 +0000 Subject: [PATCH 14/26] rm: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/rm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/rm.c b/builtin/rm.c index 4858631e0f0..5559a0b453a 100644 --- a/builtin/rm.c +++ b/builtin/rm.c @@ -293,6 +293,8 @@ int cmd_rm(int argc, const char **argv, const char *prefix) seen = xcalloc(pathspec.nr, 1); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) { const struct cache_entry *ce = active_cache[i]; if (!ce_path_match(&the_index, ce, &pathspec, seen)) From a02912019a7a473b039d28713fb6419bf472386f Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:52 +0000 Subject: [PATCH 15/26] stash: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/stash.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/stash.c b/builtin/stash.c index ba774cce674..6fb7178ef2f 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -1350,6 +1350,8 @@ static int do_push_stash(const struct pathspec *ps, const char *stash_msg, int q int i; char *ps_matched = xcalloc(ps->nr, 1); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (i = 0; i < active_nr; i++) ce_path_match(&the_index, active_cache[i], ps, ps_matched); From 2508df0272fdae3025b53366ac6b456920bc0de0 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:53 +0000 Subject: [PATCH 16/26] update-index: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- builtin/update-index.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/builtin/update-index.c b/builtin/update-index.c index 79087bccea4..f1f16f2de52 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -745,6 +745,8 @@ static int do_reupdate(int ac, const char **av, */ has_head = 0; redo: + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(&the_index); for (pos = 0; pos < active_nr; pos++) { const struct cache_entry *ce = active_cache[pos]; struct cache_entry *old = NULL; From d425f65127562e3a7f308b1ccfe385950f96e8ec Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:54 +0000 Subject: [PATCH 17/26] dir: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- dir.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dir.c b/dir.c index 5b00dfb5b14..166238e79f5 100644 --- a/dir.c +++ b/dir.c @@ -3533,6 +3533,8 @@ static void connect_wt_gitdir_in_nested(const char *sub_worktree, if (repo_read_index(&subrepo) < 0) die(_("index file corrupt in repo %s"), subrepo.gitdir); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(subrepo.index); for (i = 0; i < subrepo.index->cache_nr; i++) { const struct cache_entry *ce = subrepo.index->cache[i]; From 3450a304aaa20707a696176441a8bbfe6d5431a3 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:55 +0000 Subject: [PATCH 18/26] entry: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- entry.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/entry.c b/entry.c index 7b9f43716f7..891e4ba2b45 100644 --- a/entry.c +++ b/entry.c @@ -412,6 +412,8 @@ static void mark_colliding_entries(const struct checkout *state, ce->ce_flags |= CE_MATCHED; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(state->istate); for (i = 0; i < state->istate->cache_nr; i++) { struct cache_entry *dup = state->istate->cache[i]; From f7ef64be0cda36e5188cfc712f61ba7279311b70 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:56 +0000 Subject: [PATCH 19/26] merge-recursive: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- merge-recursive.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/merge-recursive.c b/merge-recursive.c index b8de7a704ea..91d8597728c 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -522,6 +522,8 @@ static struct string_list *get_unmerged(struct index_state *istate) unmerged->strdup_strings = 1; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { struct string_list_item *item; struct stage_data *e; From 465a04abc6e5f4fe2b93c662bab0319c1667180e Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:57 +0000 Subject: [PATCH 20/26] pathspec: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- pathspec.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/pathspec.c b/pathspec.c index b6e333965cb..d67688bab74 100644 --- a/pathspec.c +++ b/pathspec.c @@ -36,6 +36,8 @@ void add_pathspec_matches_against_index(const struct pathspec *pathspec, num_unmatched++; if (!num_unmatched) return; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { const struct cache_entry *ce = istate->cache[i]; ce_path_match(istate, ce, pathspec, seen); From 0c18c059a152058e30e23c4edcd24b3992599503 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:58 +0000 Subject: [PATCH 21/26] read-cache: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- read-cache.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/read-cache.c b/read-cache.c index 3698bc7bf77..a9dcf0ab4f7 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1577,6 +1577,8 @@ int refresh_index(struct index_state *istate, unsigned int flags, */ preload_index(istate, pathspec, 0); trace2_region_enter("index", "refresh", NULL); + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce, *new_entry; int cache_errno = 0; @@ -2498,6 +2500,8 @@ int repo_index_has_changes(struct repository *repo, diff_flush(&opt); return opt.flags.has_changes != 0; } else { + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; sb && i < istate->cache_nr; i++) { if (i) strbuf_addch(sb, ' '); From dc26b23ebc30adcb4aa54a5eccf2e47793984fa2 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:49:59 +0000 Subject: [PATCH 22/26] resolve-undo: ensure full index Before iterating over all cache entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- resolve-undo.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/resolve-undo.c b/resolve-undo.c index 236320f179c..ee36ca58b13 100644 --- a/resolve-undo.c +++ b/resolve-undo.c @@ -172,6 +172,8 @@ void unmerge_marked_index(struct index_state *istate) if (!istate->resolve_undo) return; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { const struct cache_entry *ce = istate->cache[i]; if (ce->ce_flags & CE_MATCHED) @@ -186,6 +188,8 @@ void unmerge_index(struct index_state *istate, const struct pathspec *pathspec) if (!istate->resolve_undo) return; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { const struct cache_entry *ce = istate->cache[i]; if (!ce_path_match(istate, ce, pathspec, NULL)) From f5fed74fb2eac4d7c6fd4b09f2167b83942c0f13 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Thu, 1 Apr 2021 01:50:00 +0000 Subject: [PATCH 23/26] revision: ensure full index Before iterating over all index entries, ensure that a sparse index is expanded to a full index to avoid unexpected behavior. This case could be integrated later by ensuring that we walk the tree in the sparse-directory entry, but the current behavior is only expecting blobs. Save this integration for later when it can be properly tested. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- revision.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/revision.c b/revision.c index b78733f5089..b72e0ac1bdc 100644 --- a/revision.c +++ b/revision.c @@ -1680,6 +1680,8 @@ static void do_add_index_objects_to_pending(struct rev_info *revs, { int i; + /* TODO: audit for interaction with sparse-index. */ + ensure_full_index(istate); for (i = 0; i < istate->cache_nr; i++) { struct cache_entry *ce = istate->cache[i]; struct blob *blob; From 5f116695864788d1fe45ff06bfad7a71a8d98d0a Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 12 Apr 2021 21:08:15 +0000 Subject: [PATCH 24/26] name-hash: don't add directories to name_hash Sparse directory entries represent a directory that is outside the sparse-checkout definition. These are not paths to blobs, so should not be added to the name_hash table. Instead, they should be added to the directory hashtable when 'ignore_case' is true. Add a condition to avoid placing sparse directories into the name_hash hashtable. This avoids filling the table with extra entries that will never be queried. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- name-hash.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/name-hash.c b/name-hash.c index 4e03fac9bb1..d08deaa2c9e 100644 --- a/name-hash.c +++ b/name-hash.c @@ -109,8 +109,11 @@ static void hash_index_entry(struct index_state *istate, struct cache_entry *ce) if (ce->ce_flags & CE_HASHED) return; ce->ce_flags |= CE_HASHED; - hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce))); - hashmap_add(&istate->name_hash, &ce->ent); + + if (!S_ISSPARSEDIR(ce->ce_mode)) { + hashmap_entry_init(&ce->ent, memihash(ce->name, ce_namelen(ce))); + hashmap_add(&istate->name_hash, &ce->ent); + } if (ignore_case) add_dir_entry(istate, ce); From 71f82d032f34f7aa3708b4cdaeb12db617449606 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 12 Apr 2021 21:08:16 +0000 Subject: [PATCH 25/26] sparse-index: expand_to_path() Some users of the index API have a specific path they are looking for, but choose to use index_file_exists() to rely on the name-hash hashtable instead of doing binary search with index_name_pos(). These users only need to know a yes/no answer, not a position within the cache array. When the index is sparse, the name-hash hash table does not contain the full list of paths within sparse directories. It _does_ contain the directory names for the sparse-directory entries. Create a helper function, expand_to_path(), for intended use with the name-hash hashtable functions. The integration with name-hash.c will follow in a later change. The solution here is to use ensure_full_index() when we determine that the requested path is within a sparse directory entry. This will populate the name-hash hashtable as the index is recomputed from scratch. There may be cases where the caller is trying to find an untracked path that is not in the index but also is not within a sparse directory entry. We want to minimize the overhead for these requests. If we used index_name_pos() to find the insertion order of the path, then we could determine from that position if a sparse-directory exists. (In fact, just calling index_name_pos() in that case would lead to expanding the index to a full index.) However, this takes O(log N) time where N is the number of cache entries. To keep the performance of this call based mostly on the input string, use index_file_exists() to look for the ancestors of the path. Using the heuristic that a sparse directory is likely to have a small number of parent directories, we start from the bottom and build up. Use a string buffer to allow mutating the path name to terminate after each slash for each hashset test. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- sparse-index.c | 73 ++++++++++++++++++++++++++++++++++++++++++++++++++ sparse-index.h | 13 +++++++++ 2 files changed, 86 insertions(+) diff --git a/sparse-index.c b/sparse-index.c index 95ea17174da..6f21397e2ee 100644 --- a/sparse-index.c +++ b/sparse-index.c @@ -283,3 +283,76 @@ void ensure_full_index(struct index_state *istate) trace2_region_leave("index", "ensure_full_index", istate->repo); } + +/* + * This static global helps avoid infinite recursion between + * expand_to_path() and index_file_exists(). + */ +static int in_expand_to_path = 0; + +void expand_to_path(struct index_state *istate, + const char *path, size_t pathlen, int icase) +{ + struct strbuf path_mutable = STRBUF_INIT; + size_t substr_len; + + /* prevent extra recursion */ + if (in_expand_to_path) + return; + + if (!istate || !istate->sparse_index) + return; + + if (!istate->repo) + istate->repo = the_repository; + + in_expand_to_path = 1; + + /* + * We only need to actually expand a region if the + * following are both true: + * + * 1. 'path' is not already in the index. + * 2. Some parent directory of 'path' is a sparse directory. + */ + + if (index_file_exists(istate, path, pathlen, icase)) + goto cleanup; + + strbuf_add(&path_mutable, path, pathlen); + strbuf_addch(&path_mutable, '/'); + + /* Check the name hash for all parent directories */ + substr_len = 0; + while (substr_len < pathlen) { + char temp; + char *replace = strchr(path_mutable.buf + substr_len, '/'); + + if (!replace) + break; + + /* replace the character _after_ the slash */ + replace++; + temp = *replace; + *replace = '\0'; + if (index_file_exists(istate, path_mutable.buf, + path_mutable.len, icase)) { + /* + * We found a parent directory in the name-hash + * hashtable, because only sparse directory entries + * have a trailing '/' character. Since "path" wasn't + * in the index, perhaps it exists within this + * sparse-directory. Expand accordingly. + */ + ensure_full_index(istate); + break; + } + + *replace = temp; + substr_len = replace - path_mutable.buf; + } + +cleanup: + strbuf_release(&path_mutable); + in_expand_to_path = 0; +} diff --git a/sparse-index.h b/sparse-index.h index 0268f38753c..1115a0d7dd9 100644 --- a/sparse-index.h +++ b/sparse-index.h @@ -4,6 +4,19 @@ struct index_state; int convert_to_sparse(struct index_state *istate); +/* + * Some places in the codebase expect to search for a specific path. + * This path might be outside of the sparse-checkout definition, in + * which case a sparse-index may not contain a path for that index. + * + * Given an index and a path, check to see if a leading directory for + * 'path' exists in the index as a sparse directory. In that case, + * expand that sparse directory to a full range of cache entries and + * populate the index accordingly. + */ +void expand_to_path(struct index_state *istate, + const char *path, size_t pathlen, int icase); + struct repository; int set_sparse_index_config(struct repository *repo, int enable); From 4589bca829a2ace58bc98876cccd7dbd2e89f732 Mon Sep 17 00:00:00 2001 From: Derrick Stolee Date: Mon, 12 Apr 2021 21:08:17 +0000 Subject: [PATCH 26/26] name-hash: use expand_to_path() A sparse-index loads the name-hash data for its entries, including the sparse-directory entries. If a caller asks for a path that is contained within a sparse-directory entry, we need to expand to a full index and recalculate the name hash table before returning the result. Insert calls to expand_to_path() to protect against this case. Signed-off-by: Derrick Stolee Reviewed-by: Elijah Newren Signed-off-by: Junio C Hamano --- name-hash.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/name-hash.c b/name-hash.c index d08deaa2c9e..383cf589969 100644 --- a/name-hash.c +++ b/name-hash.c @@ -8,6 +8,7 @@ #include "cache.h" #include "thread-utils.h" #include "trace2.h" +#include "sparse-index.h" struct dir_entry { struct hashmap_entry ent; @@ -683,6 +684,7 @@ int index_dir_exists(struct index_state *istate, const char *name, int namelen) struct dir_entry *dir; lazy_init_name_hash(istate); + expand_to_path(istate, name, namelen, 0); dir = find_dir_entry(istate, name, namelen); return dir && dir->nr; } @@ -693,6 +695,7 @@ void adjust_dirname_case(struct index_state *istate, char *name) const char *ptr = startPtr; lazy_init_name_hash(istate); + expand_to_path(istate, name, strlen(name), 0); while (*ptr) { while (*ptr && *ptr != '/') ptr++; @@ -716,6 +719,7 @@ struct cache_entry *index_file_exists(struct index_state *istate, const char *na unsigned int hash = memihash(name, namelen); lazy_init_name_hash(istate); + expand_to_path(istate, name, namelen, icase); ce = hashmap_get_entry_from_hash(&istate->name_hash, hash, NULL, struct cache_entry, ent);