mirror of https://github.com/git/git.git synced 2024-05-26 17:36:28 +02:00

diffcore-rename: only compute dir_rename_count for relevant directories

When one side adds files to a directory that the other side renamed,
directory rename detection is used to either move the new paths to the
new directory or warn the user that another location for those paths
might be better.

If a parent of the given directory had new files added to it, any
renames in the current directory are also part of determining where the
parent directory is renamed to.  Thus, naively, we need to record each
rename N times for a path at depth N.  However, we can use the
additional information added to dirs_removed in the last commit to avoid
traversing all N parent directories in many cases.  Let's use an example
to explain how this works.  If we have a path named
   src/old_dir/a/b/file.c
and src/old_dir doesn't exist on one side of history, while the other
side added a file named src/old_dir/newfile.c, then if the first side renamed
   src/old_dir/a/b/file.c => source/new_dir/a/b/file.c
then this file would affect potential directory rename detection counts
for
   src/old_dir/a/b => source/new_dir/a/b
   src/old_dir/a   => source/new_dir/a
   src/old_dir     => source/new_dir
   src             => source
adding a weight of 1 to each in dir_rename_count.  However, if src/
exists on both sides of history, then we don't need to track any entries
for it in dir_rename_count.  That was implemented previously.  What we
are adding now is that if no new files were added to src/old_dir/a or
src/old_dir/a/b, then we don't need to have counts in dir_rename_count
for those directories either.
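
To make the naive cost concrete, here is a minimal sketch of the enumeration described above: one (old_dir, new_dir) pair per ancestor level. It is not the code in diffcore-rename.c; `struct dir_pair`, `strip_last_component()` and `naive_dir_pairs()` are hypothetical names used only for illustration, and the sketch ignores the check that trailing directory names match.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Hypothetical container for one old_dir => new_dir candidate. */
struct dir_pair {
	char old_dir[MAX_PATH_LEN];
	char new_dir[MAX_PATH_LEN];
};

/* Hypothetical helper: "a/b/c" -> "a/b", "a" -> "" (edits in place). */
static void strip_last_component(char *path)
{
	char *slash = strrchr(path, '/');

	if (slash)
		*slash = '\0';
	else
		*path = '\0';
}

/*
 * Naive enumeration: record one pair per ancestor level until either
 * side reaches the toplevel directory (""). Returns the number of
 * pairs written to out (at most max).
 */
static int naive_dir_pairs(const char *old_path, const char *new_path,
			   struct dir_pair *out, int max)
{
	char old_dir[MAX_PATH_LEN], new_dir[MAX_PATH_LEN];
	int n = 0;

	assert(strlen(old_path) < MAX_PATH_LEN);
	assert(strlen(new_path) < MAX_PATH_LEN);
	strcpy(old_dir, old_path);
	strcpy(new_dir, new_path);

	while (n < max) {
		strip_last_component(old_dir);
		strip_last_component(new_dir);
		if (!*old_dir || !*new_dir)
			break;
		strcpy(out[n].old_dir, old_dir);
		strcpy(out[n].new_dir, new_dir);
		n++;
	}
	return n;
}
```

Called with the rename from the example, this yields the four pairs listed above, from src/old_dir/a/b => source/new_dir/a/b down to src => source, which is the per-rename work the change aims to reduce.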

In short, we only need to track counts in dir_rename_count for
directories whose dirs_removed value is RELEVANT_FOR_SELF.  And as soon
as we reach a directory that isn't in dirs_removed (signalled by
returning the default value of NOT_RELEVANT from strintmap_get()), we
can stop looking any further up the directory hierarchy.
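
The rule can be sketched as a pruned walk up the hierarchy. This is an illustration under stated assumptions, not the patched function: a small array stands in for the dirs_removed strintmap, `lookup_flag()` and `pruned_walk()` are hypothetical helpers, and `RELEVANT_FOR_ANCESTOR` is an assumed name for the "in dirs_removed, but not relevant for itself" state. The sketch tracks only the old directory and omits the first_time_in_loop special case.

```c
#include <assert.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Flag values; RELEVANT_FOR_ANCESTOR is an assumed name for this sketch. */
enum { NOT_RELEVANT = 0, RELEVANT_FOR_ANCESTOR = 1, RELEVANT_FOR_SELF = 2 };

/* Hypothetical stand-in for one dirs_removed entry. */
struct dir_flag {
	const char *dir;
	int flag;
};

/* Like a map lookup with a default: missing entries are NOT_RELEVANT. */
static int lookup_flag(const struct dir_flag *map, int n, const char *dir)
{
	int i;

	for (i = 0; i < n; i++)
		if (!strcmp(map[i].dir, dir))
			return map[i].flag;
	return NOT_RELEVANT;
}

/* Hypothetical helper: "a/b/c" -> "a/b", "a" -> "" (edits in place). */
static void strip_last_component(char *path)
{
	char *slash = strrchr(path, '/');

	if (slash)
		*slash = '\0';
	else
		*path = '\0';
}

/*
 * Walk up from old_path, collecting only directories flagged
 * RELEVANT_FOR_SELF, and stop at the first directory that is not in
 * the map at all. Returns the number of directories collected.
 */
static int pruned_walk(const char *old_path,
		       const struct dir_flag *dirs_removed, int n_dirs,
		       char tracked[][MAX_PATH_LEN], int max)
{
	char old_dir[MAX_PATH_LEN];
	int n = 0;

	assert(strlen(old_path) < MAX_PATH_LEN);
	strcpy(old_dir, old_path);

	for (;;) {
		int flag;

		strip_last_component(old_dir);
		if (!*old_dir)
			break;	/* reached the toplevel directory */
		flag = lookup_flag(dirs_removed, n_dirs, old_dir);
		if (flag == NOT_RELEVANT)
			break;	/* no ancestor above this can matter */
		if (flag == RELEVANT_FOR_SELF && n < max)
			strcpy(tracked[n++], old_dir);
	}
	return n;
}
```

With src/old_dir marked RELEVANT_FOR_SELF, src/old_dir/a and src/old_dir/a/b marked with the intermediate value, and src absent from the map, walking up from src/old_dir/a/b/file.c collects only src/old_dir and stops as soon as it reaches src.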

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Elijah Newren 2021-03-13 22:22:04 +00:00 committed by Junio C Hamano
parent fb52938eec
commit e54385b97a


@@ -461,6 +461,8 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 		return;
 
 	while (1) {
+		int drd_flag = NOT_RELEVANT;
+
 		/* Get old_dir, skip if its directory isn't relevant. */
 		dirname_munge(old_dir);
 		if (info->relevant_source_dirs &&
@@ -509,16 +511,31 @@ static void update_dir_rename_counts(struct dir_rename_info *info,
 			}
 		}
 
-		if (strintmap_contains(dirs_removed, old_dir))
+		/*
+		 * Above we suggested that we'd keep recording renames for
+		 * all ancestor directories where the trailing directories
+		 * matched, i.e. for
+		 *   "a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
+		 * we'd increment rename counts for each of
+		 *   a/b/c/d/e/ => a/b/some/thing/else/e/
+		 *   a/b/c/d/   => a/b/some/thing/else/
+		 * However, we only need the rename counts for directories
+		 * in dirs_removed whose value is RELEVANT_FOR_SELF.
+		 * However, we add one special case of also recording it for
+		 * first_time_in_loop because find_basename_matches() can
+		 * use that as a hint to find a good pairing.
+		 */
+		if (dirs_removed)
+			drd_flag = strintmap_get(dirs_removed, old_dir);
+		if (drd_flag == RELEVANT_FOR_SELF || first_time_in_loop)
 			increment_count(info, old_dir, new_dir);
-		else
-			break;
 
+		first_time_in_loop = 0;
+		if (drd_flag == NOT_RELEVANT)
+			break;
 		/* If we hit toplevel directory ("") for old or new dir, quit */
 		if (!*old_dir || !*new_dir)
 			break;
-
-		first_time_in_loop = 0;
 	}
 
 	/* Free resources we don't need anymore */