1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-18 01:56:15 +02:00
git/commit-reach.h
Derrick Stolee fd67d149bd commit-reach: implement ahead_behind() logic
Fully implement the commit-counting logic required to determine
ahead/behind counts for a batch of commit pairs. This is a new library
method within commit-reach.h. This method will be linked to the
for-each-ref builtin in the next change.

The interface for ahead_behind() uses two arrays. The first array of
commits contains the list of all starting points for the walk. This
includes all tip commits _and_ base commits. The second array specifies
base/tip pairs by pointing to commits within the first array, by index.
The second array also stores the resulting ahead/behind counts for each
of these pairs.

This implementation of ahead_behind() allows multiple bases, if desired.
Even with multiple bases, there is only one commit walk used for
counting the ahead/behind values, saving time when the base/tip ranges
overlap significantly.

This interface for ahead_behind() also makes it very easy to call
ensure_generations_valid() on the entire array of bases and tips. This
call is necessary because it is critical that the walk that counts
ahead/behind values never walks a commit more than once. Without
generation numbers on every commit, there is a possibility that a
commit date skew could cause the walk to revisit a commit and then
double-count it. For this reason, it is strongly recommended that 'git
ahead-behind' is only run in a repository with a commit-graph file that
covers most of the reachable commits, storing precomputed generation
numbers. If no commit-graph exists, this walk will be much slower as it
must walk all reachable commits in ensure_generations_valid() before
performing the counting logic.

It is possible to detect if generation numbers are available at run time
and redirect the implementation to another algorithm that does not
require this property. However, that implementation requires a commit
walk per base/tip pair _and_ can be slower due to the commit date
heuristics required. Such an implementation could be considered in the
future if there is a reason to include it, but most Git hosts should
already be generating a commit-graph file as part of repository
maintenance. Most Git clients should also be generating commit-graph
files as part of background maintenance or automatic GCs.

Now, let's discuss the ahead/behind counting algorithm.

The first array of commits are considered the starting commits. The
index within that array will play a critical role.

We create a new commit slab that maps commits to a bitmap. For a given
commit (anywhere in the history), its bitmap stores information relative
to which of the input commits can reach that commit. The ith bit will be
on if the ith commit from the starting list can reach that commit. It is
important to notice that these bitmaps are not the typical "reachability
bitmaps" that are stored in .bitmap files. Instead of signalling which
objects are reachable from the current commit, they instead signal
"which starting commits can reach me?" It is also important to know that
the bitmap is not necessarily "complete" until we walk that commit. We
will perform a commit walk by generation number in such a way that we
can guarantee the bitmap is correct when we visit that commit.

At the beginning of the ahead_behind() method, we initialize the bitmaps
for each of the starting commits. By enabling the ith bit for the ith
starting commit, we signal "the ith commit can reach itself."

We walk commits by popping the commit with maximum generation number out
of the queue, guaranteeing that we will never walk a child of that
commit in any future steps.

As we walk, we load the bitmap for the current commit and perform two
main steps. The _second_ step examines each parent of the current commit
and adds the current commit's bitmap bits to each parent's bitmap. (We
create a new bitmap for the parent if this is our first time seeing that
parent.) After adding the bits to the parent's bitmap, the parent is
added to the walk queue. Due to this passing of bits to parents, the
current commit has a guarantee that the ith bit is enabled on its bitmap
if and only if the ith commit can reach the current commit.

The first step of the walk is to examine the bitmask on the current
commit and decide which ranges the commit is in or not. Due to the "bit
pushing" in the second step, we have a guarantee that the ith bit of the
current commit's bitmap is on if and only if the ith starting commit can
reach it. For each ahead_behind_count struct, check the base_index and
tip_index to see if those bits are enabled on the current bitmap. If
exactly one bit is enabled, then increment the corresponding 'ahead' or
'behind' count.  This increment is the reason we _absolutely need_ to
walk commits at most once.

The only subtle thing to do with this walk is to check to see if a
parent has all bits on in its bitmap, in which case it becomes "stale"
and is marked with the STALE bit. This allows queue_has_nonstale() to be
the terminating condition of the walk, which greatly reduces the number
of commits walked if all of the commits are nearby in history. It avoids
walking a large number of common commits when there is a deep history.
We also use the helper method insert_no_dup() to add commits to the
priority queue without adding them multiple times. This uses the PARENT2
flag. Thus, we must clear both the STALE and PARENT2 bits of all
commits, in case ahead_behind() is called multiple times in the same
process.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-20 12:17:33 -07:00

139 lines
4.6 KiB
C

#ifndef COMMIT_REACH_H
#define COMMIT_REACH_H
#include "commit.h"
#include "commit-slab.h"
struct commit_list;
struct ref_filter;
struct object_id;
struct object_array;
struct commit_list *repo_get_merge_bases(struct repository *r,
struct commit *rev1,
struct commit *rev2);
struct commit_list *repo_get_merge_bases_many(struct repository *r,
struct commit *one, int n,
struct commit **twos);
/* To be used only when object flags after this call no longer matter */
struct commit_list *repo_get_merge_bases_many_dirty(struct repository *r,
struct commit *one, int n,
struct commit **twos);
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
#define get_merge_bases(r1, r2) repo_get_merge_bases(the_repository, r1, r2)
#define get_merge_bases_many(one, n, two) repo_get_merge_bases_many(the_repository, one, n, two)
#define get_merge_bases_many_dirty(one, n, twos) repo_get_merge_bases_many_dirty(the_repository, one, n, twos)
#endif
struct commit_list *get_octopus_merge_bases(struct commit_list *in);
int repo_is_descendant_of(struct repository *r,
struct commit *commit,
struct commit_list *with_commit);
int repo_in_merge_bases(struct repository *r,
struct commit *commit,
struct commit *reference);
int repo_in_merge_bases_many(struct repository *r,
struct commit *commit,
int nr_reference, struct commit **reference);
#ifndef NO_THE_REPOSITORY_COMPATIBILITY_MACROS
#define in_merge_bases(c1, c2) repo_in_merge_bases(the_repository, c1, c2)
#define in_merge_bases_many(c1, n, cs) repo_in_merge_bases_many(the_repository, c1, n, cs)
#endif
/*
* Takes a list of commits and returns a new list where those
* have been removed that can be reached from other commits in
* the list. It is useful for, e.g., reducing the commits
* randomly thrown at the git-merge command and removing
* redundant commits that the user shouldn't have given to it.
*
* This function destroys the STALE bit of the commit objects'
* flags.
*/
struct commit_list *reduce_heads(struct commit_list *heads);
/*
* Like `reduce_heads()`, except it replaces the list. Use this
* instead of `foo = reduce_heads(foo);` to avoid memory leaks.
*/
void reduce_heads_replace(struct commit_list **heads);
int ref_newer(const struct object_id *new_oid, const struct object_id *old_oid);
/*
* Unknown has to be "0" here, because that's the default value for
* contains_cache slab entries that have not yet been assigned.
*/
enum contains_result {
CONTAINS_UNKNOWN = 0,
CONTAINS_NO,
CONTAINS_YES
};
define_commit_slab(contains_cache, enum contains_result);
int commit_contains(struct ref_filter *filter, struct commit *commit,
struct commit_list *list, struct contains_cache *cache);
/*
* Determine if every commit in 'from' can reach at least one commit
* that is marked with 'with_flag'. As we traverse, use 'assign_flag'
* as a marker for commits that are already visited. Do not walk
* commits with date below 'min_commit_date' or generation below
* 'min_generation'.
*/
int can_all_from_reach_with_flag(struct object_array *from,
unsigned int with_flag,
unsigned int assign_flag,
time_t min_commit_date,
timestamp_t min_generation);
int can_all_from_reach(struct commit_list *from, struct commit_list *to,
int commit_date_cutoff);
/*
* Return a list of commits containing the commits in the 'to' array
* that are reachable from at least one commit in the 'from' array.
* Also add the given 'flag' to each of the commits in the returned list.
*
* This method uses the PARENT1 and PARENT2 flags during its operation,
* so be sure these flags are not set before calling the method.
*/
struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
struct commit **to, int nr_to,
unsigned int reachable_flag);
struct ahead_behind_count {
/**
* As input, the *_index members indicate which positions in
* the 'tips' array correspond to the tip and base of this
* comparison.
*/
size_t tip_index;
size_t base_index;
/**
* These values store the computed counts for each side of the
* symmetric difference:
*
* 'ahead' stores the number of commits reachable from the tip
* and not reachable from the base.
*
* 'behind' stores the number of commits reachable from the base
* and not reachable from the tip.
*/
unsigned int ahead;
unsigned int behind;
};
/*
* Given an array of commits and an array of ahead_behind_count pairs,
* compute the ahead/behind counts for each pair.
*/
void ahead_behind(struct repository *r,
struct commit **commits, size_t commits_nr,
struct ahead_behind_count *counts, size_t counts_nr);
#endif