1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-06 22:46:14 +02:00

diffcore-rename: determine which relevant_sources are no longer relevant

As noted a few commits ago ("diffcore-rename: only compute
dir_rename_count for relevant directories"), when a source file rename
is used as part of directory rename detection, we need to increment
counts for each ancestor directory in dirs_removed with value
RELEVANT_FOR_SELF.  However, a few commits ago ("diffcore-rename: check
if we have enough renames for directories early on"), we may have
downgraded all relevant ancestor directories from RELEVANT_FOR_SELF to
RELEVANT_FOR_ANCESTOR.

For a given file, if no ancestor directory is found in dirs_removed with
a value of RELEVANT_FOR_SELF, then we can downgrade
relevant_source[PATH] from RELEVANT_LOCATION to RELEVANT_NO_MORE.  This
means we can skip detecting a rename for that particular path (and any
other paths in the same directory).

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.680 s ±  0.096 s     5.665 s ±  0.129 s
    mega-renames:     13.812 s ±  0.162 s    11.435 s ±  0.158 s
    just-one-mega:   506.0  ms ±  3.9  ms   494.2  ms ±  6.1  ms

While this improvement looks rather modest for these testcases (because
all the previous optimizations were sufficient to nearly remove all time
spent in rename detection already),  consider this alternative testcase
tweaked from the ones in commit 557ac0350d as follows

    <Same initial setup as commit 557ac0350d, then...>
    $ git switch -c add-empty-file v5.5
    $ >drivers/gpu/drm/i915/new-empty-file
    $ git add drivers/gpu/drm/i915/new-empty-file
    $ git commit -m "new file"
    $ git switch 5.4-rename
    $ git cherry-pick --strategy=ort add-empty-file

For this testcase, we see the following improvement:

                            Before                  After
    pick-empty:        1.936 s ±  0.024 s     688.1 ms ±  4.2 ms

So roughly a factor of 3 speedup.  At $DAYJOB, there was a particular
repository and cherry-pick that inspired this optimization; for that
case I saw a speedup factor of 7 with this optimization.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
Elijah Newren 2021-03-13 22:22:08 +00:00 committed by Junio C Hamano
parent ec59da6015
commit 9bd342137e

View File

@ -1129,7 +1129,7 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
* a majority.
*/
int i;
int i, new_num_src;
struct hashmap_iter iter;
struct strmap_entry *entry;
@ -1193,6 +1193,55 @@ static void handle_early_known_dir_renames(struct dir_rename_info *info,
RELEVANT_FOR_ANCESTOR);
}
}
for (i = 0, new_num_src = 0; i < rename_src_nr; i++) {
struct diff_filespec *one = rename_src[i].p->one;
int val;
val = strintmap_get(relevant_sources, one->path);
/*
* sources that were not found in relevant_sources should
* have already been removed by a prior call to
* remove_unneeded_paths_from_src()
*/
assert(val != -1);
if (val == RELEVANT_LOCATION) {
int removable = 1;
char *dir = get_dirname(one->path);
while (1) {
char *freeme = dir;
int res = strintmap_get(dirs_removed, dir);
/* Quit if not found or irrelevant */
if (res == NOT_RELEVANT)
break;
/* If RELEVANT_FOR_SELF, can't remove */
if (res == RELEVANT_FOR_SELF) {
removable = 0;
break;
}
/* Else continue searching upwards */
assert(res == RELEVANT_FOR_ANCESTOR);
dir = get_dirname(dir);
free(freeme);
}
free(dir);
if (removable) {
strintmap_set(relevant_sources, one->path,
RELEVANT_NO_MORE);
continue;
}
}
if (new_num_src < i)
memcpy(&rename_src[new_num_src], &rename_src[i],
sizeof(struct diff_rename_src));
new_num_src++;
}
rename_src_nr = new_num_src;
}
void diffcore_rename_extended(struct diff_options *options,