1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-06-08 01:56:14 +02:00
git/revision.h

318 lines
8.0 KiB
C
Raw Normal View History

#ifndef REVISION_H
#define REVISION_H
#include "parse-options.h"
#include "grep.h"
#include "notes.h"
toposort: rename "lifo" field The primary invariant of sort_in_topological_order() is that a parent commit is not emitted until all children of it are. When traversing a forked history like this with "git log C E": A----B----C \ D----E we ensure that A is emitted after all of B, C, D, and E are done, B has to wait until C is done, and D has to wait until E is done. In some applications, however, we would further want to control how these child commits B, C, D and E on two parallel ancestry chains are shown. Most of the time, we would want to see C and B emitted together, and then E and D, and finally A (i.e. the --topo-order output). The "lifo" parameter of the sort_in_topological_order() function is used to control this behaviour. We start the traversal by knowing two commits, C and E. While keeping in mind that we also need to inspect E later, we pick C first to inspect, and we notice and record that B needs to be inspected. By structuring the "work to be done" set as a LIFO stack, we ensure that B is inspected next, before other in-flight commits we had known that we will need to inspect, e.g. E. When showing in --date-order, we would want to see commits ordered by timestamps, i.e. show C, E, B and D in this order before showing A, possibly mixing commits from two parallel histories together. When "lifo" parameter is set to false, the function keeps the "work to be done" set sorted in the date order to realize this semantics. After inspecting C, we add B to the "work to be done" set, but the next commit we inspect from the set is E which is newer than B. The name "lifo", however, is too strongly tied to the way how the function implements its behaviour, and does not describe what the behaviour _means_. Replace this field with an enum rev_sort_order, with two possible values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and update the existing code. The mechanical replacement rule is: "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE" "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER" Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
#include "commit.h"
#include "diff.h"
/* Remember to update object flag allocation in object.h */
#define SEEN (1u<<0)
#define UNINTERESTING (1u<<1)
#define TREESAME (1u<<2)
#define SHOWN (1u<<3)
#define TMP_MARK (1u<<4) /* for isolated cases; clean after use */
#define BOUNDARY (1u<<5)
#define CHILD_SHOWN (1u<<6)
#define ADDED (1u<<7) /* Parents already parsed and added? */
#define SYMMETRIC_LEFT (1u<<8)
#define PATCHSAME (1u<<9)
#define BOTTOM (1u<<10)
#define TRACK_LINEAR (1u<<26)
#define ALL_REV_FLAGS (((1u<<11)-1) | TRACK_LINEAR)
#define DECORATE_SHORT_REFS 1
#define DECORATE_FULL_REFS 2
struct rev_info;
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
struct log_info;
struct string_list;
log: use true parents for diff even when rewriting When using pathspec filtering in combination with diff-based log output, parent simplification happens before the diff is computed. The diff is therefore against the *simplified* parents. This works okay, arguably by accident, in the normal case: simplification reduces to one parent as long as the commit is TREESAME to it. So the simplified parent of any given commit must have the same tree contents on the filtered paths as its true (unfiltered) parent. However, --full-diff breaks this guarantee, and indeed gives pretty spectacular results when comparing the output of git log --graph --stat ... git log --graph --full-diff --stat ... (--graph internally kicks in parent simplification, much like --parents). To fix it, store a copy of the parent list before simplification (in a slab) whenever --full-diff is in effect. Then use the stored parents instead of the simplified ones in the commit display code paths. The latter do not actually check for --full-diff to avoid duplicated code; they just grab the original parents if save_parents() has not been called for this revision walk. For ordinary commits it should be obvious that this is the right thing to do. Merge commits are a bit subtle. Observe that with default simplification, merge simplification is an all-or-nothing decision: either the merge is TREESAME to one parent and disappears, or it is different from all parents and the parent list remains intact. Redundant parents are not pruned, so the existing code also shows them as a merge. So if we do show a merge commit, the parent list just consists of the rewrite result on each parent. Running, e.g., --cc on this in --full-diff mode is not very useful: if any commits were skipped, some hunks will disagree with all sides of the merge (with one side, because commits were skipped; with the others, because they didn't have those changes in the first place). This triggers --cc showing these hunks spuriously. Therefore I believe that even for merge commits it is better to show the diffs wrt. the original parents. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
struct saved_parents;
revision: keep track of the end-user input from the command line Given a complex set of revision specifiers on the command line, it is too late to look at the flags of the objects in the initial traversal list at the beginning of limit_list() in order to determine what the objects the end-user explicitly listed on the command line were. The process to move objects from the pending array to the traversal list may have marked objects that are not mentioned as UNINTERESTING, when handle_commit() marked the parents of UNINTERESTING commits mentioned on the command line by calling mark_parents_uninteresting(). This made "rev-list --ancestry-path ^A ..." to mistakenly list commits that are descendants of A's parents but that are not descendants of A itself, as ^A from the command line causes A and its parents marked as UNINTERESTING before coming to limit_list(), and we try to enumerate the commits that are descendants of these commits that are UNINTERESTING before we start walking the history. It actually is too late even if we inspected the pending object array before calling prepare_revision_walk(), as some of the same objects might have been mentioned twice, once as positive and another time as negative. The "rev-list --some-option A --not --all" command may want to notice, even if the resulting set is empty, that the user showed some interest in "A" and do something special about it. Prepare a separate array to keep track of what syntactic element was used to cause each object to appear in the pending array from the command line, and populate it as setup_revisions() parses the command line. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-26 02:35:39 +02:00
struct rev_cmdline_info {
unsigned int nr;
unsigned int alloc;
struct rev_cmdline_entry {
struct object *item;
const char *name;
enum {
REV_CMD_REF,
REV_CMD_PARENTS_ONLY,
REV_CMD_LEFT,
REV_CMD_RIGHT,
REV_CMD_MERGE_BASE,
revision: keep track of the end-user input from the command line Given a complex set of revision specifiers on the command line, it is too late to look at the flags of the objects in the initial traversal list at the beginning of limit_list() in order to determine what the objects the end-user explicitly listed on the command line were. The process to move objects from the pending array to the traversal list may have marked objects that are not mentioned as UNINTERESTING, when handle_commit() marked the parents of UNINTERESTING commits mentioned on the command line by calling mark_parents_uninteresting(). This made "rev-list --ancestry-path ^A ..." to mistakenly list commits that are descendants of A's parents but that are not descendants of A itself, as ^A from the command line causes A and its parents marked as UNINTERESTING before coming to limit_list(), and we try to enumerate the commits that are descendants of these commits that are UNINTERESTING before we start walking the history. It actually is too late even if we inspected the pending object array before calling prepare_revision_walk(), as some of the same objects might have been mentioned twice, once as positive and another time as negative. The "rev-list --some-option A --not --all" command may want to notice, even if the resulting set is empty, that the user showed some interest in "A" and do something special about it. Prepare a separate array to keep track of what syntactic element was used to cause each object to appear in the pending array from the command line, and populate it as setup_revisions() parses the command line. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-26 02:35:39 +02:00
REV_CMD_REV
} whence;
unsigned flags;
} *rev;
};
#define REVISION_WALK_WALK 0
#define REVISION_WALK_NO_WALK_SORTED 1
#define REVISION_WALK_NO_WALK_UNSORTED 2
struct rev_info {
/* Starting list */
struct commit_list *commits;
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 02:42:35 +02:00
struct object_array pending;
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-05 22:10:06 +01:00
/* Parents of shown commits */
struct object_array boundary_commits;
revision: keep track of the end-user input from the command line Given a complex set of revision specifiers on the command line, it is too late to look at the flags of the objects in the initial traversal list at the beginning of limit_list() in order to determine what the objects the end-user explicitly listed on the command line were. The process to move objects from the pending array to the traversal list may have marked objects that are not mentioned as UNINTERESTING, when handle_commit() marked the parents of UNINTERESTING commits mentioned on the command line by calling mark_parents_uninteresting(). This made "rev-list --ancestry-path ^A ..." to mistakenly list commits that are descendants of A's parents but that are not descendants of A itself, as ^A from the command line causes A and its parents marked as UNINTERESTING before coming to limit_list(), and we try to enumerate the commits that are descendants of these commits that are UNINTERESTING before we start walking the history. It actually is too late even if we inspected the pending object array before calling prepare_revision_walk(), as some of the same objects might have been mentioned twice, once as positive and another time as negative. The "rev-list --some-option A --not --all" command may want to notice, even if the resulting set is empty, that the user showed some interest in "A" and do something special about it. Prepare a separate array to keep track of what syntactic element was used to cause each object to appear in the pending array from the command line, and populate it as setup_revisions() parses the command line. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-08-26 02:35:39 +02:00
/* The end-points specified by the end user */
struct rev_cmdline_info cmdline;
/* excluding from --branches, --refs, etc. expansion */
struct string_list *ref_excludes;
/* Basic information */
const char *prefix;
const char *def;
struct pathspec prune_data;
toposort: rename "lifo" field The primary invariant of sort_in_topological_order() is that a parent commit is not emitted until all children of it are. When traversing a forked history like this with "git log C E": A----B----C \ D----E we ensure that A is emitted after all of B, C, D, and E are done, B has to wait until C is done, and D has to wait until E is done. In some applications, however, we would further want to control how these child commits B, C, D and E on two parallel ancestry chains are shown. Most of the time, we would want to see C and B emitted together, and then E and D, and finally A (i.e. the --topo-order output). The "lifo" parameter of the sort_in_topological_order() function is used to control this behaviour. We start the traversal by knowing two commits, C and E. While keeping in mind that we also need to inspect E later, we pick C first to inspect, and we notice and record that B needs to be inspected. By structuring the "work to be done" set as a LIFO stack, we ensure that B is inspected next, before other in-flight commits we had known that we will need to inspect, e.g. E. When showing in --date-order, we would want to see commits ordered by timestamps, i.e. show C, E, B and D in this order before showing A, possibly mixing commits from two parallel histories together. When "lifo" parameter is set to false, the function keeps the "work to be done" set sorted in the date order to realize this semantics. After inspecting C, we add B to the "work to be done" set, but the next commit we inspect from the set is E which is newer than B. The name "lifo", however, is too strongly tied to the way how the function implements its behaviour, and does not describe what the behaviour _means_. Replace this field with an enum rev_sort_order, with two possible values: REV_SORT_IN_GRAPH_ORDER and REV_SORT_BY_COMMIT_DATE, and update the existing code. The mechanical replacement rule is: "lifo == 0" is equivalent to "sort_order == REV_SORT_BY_COMMIT_DATE" "lifo == 1" is equivalent to "sort_order == REV_SORT_IN_GRAPH_ORDER" Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-06-07 01:07:14 +02:00
/* topo-sort */
enum rev_sort_order sort_order;
unsigned int early_output:1,
add `ignore_missing_links` mode to revwalk When pack-objects is computing the reachability bitmap to serve a fetch request, it can erroneously die() if some of the UNINTERESTING objects are not present. Upload-pack throws away HAVE lines from the client for objects we do not have, but we may have a tip object without all of its ancestors (e.g., if the tip is no longer reachable and was new enough to survive a `git prune`, but some of its reachable objects did get pruned). In the non-bitmap case, we do a revision walk with the HAVE objects marked as UNINTERESTING. The revision walker explicitly ignores errors in accessing UNINTERESTING commits to handle this case (and we do not bother looking at UNINTERESTING trees or blobs at all). When we have bitmaps, however, the process is quite different. The bitmap index for a pack-objects run is calculated in two separate steps: First, we perform an extensive walk from all the HAVEs to find the full set of objects reachable from them. This walk is usually optimized away because we are expected to hit an object with a bitmap during the traversal, which allows us to terminate early. Secondly, we perform an extensive walk from all the WANTs, which usually also terminates early because we hit a commit with an existing bitmap. Once we have the resulting bitmaps from the two walks, we AND-NOT them together to obtain the resulting set of objects we need to pack. When we are walking the HAVE objects, the revision walker does not know that we are walking it only to mark the results as uninteresting. We strip out the UNINTERESTING flag, because those objects _are_ interesting to us during the first walk. We want to keep going to get a complete set of reachable objects if we can. We need some way to tell the revision walker that it's OK to silently truncate the HAVE walk, just like it does for the UNINTERESTING case. This patch introduces a new `ignore_missing_links` flag to the `rev_info` struct, which we set only for the HAVE walk. It also adds tests to cover UNINTERESTING objects missing from several positions: a missing blob, a missing tree, and a missing parent commit. The missing blob already worked (as we do not care about its contents at all), but the other two cases caused us to die(). Note that there are a few cases we do not need to test: 1. We do not need to test a missing tree, with the blob still present. Without the tree that refers to it, we would not know that the blob is relevant to our walk. 2. We do not need to test a tip commit that is missing. Upload-pack omits these for us (and in fact, we complain even in the non-bitmap case if it fails to do so). Reported-by: Siddharth Agarwal <sid0@fb.com> Signed-off-by: Vicent Marti <tanoku@gmail.com> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-28 11:00:43 +01:00
ignore_missing:1,
ignore_missing_links:1;
/* Traversal flags */
unsigned int dense:1,
prune:1,
no_walk:2,
Add "--show-all" revision walker flag for debugging It's really not very easy to visualize the commit walker, because - on purpose - it obvously doesn't show the uninteresting commits! This adds a "--show-all" flag to the revision walker, which will make it show uninteresting commits too, and they'll have a '^' in front of them (it also fixes a logic error for !verbose_header for boundary commits - we should show the '-' even if left_right isn't shown). A separate patch to gitk to teach it the new '^' was sent to paulus. With the change in place, it actually is interesting even for the cases that git doesn't have any problems with, ie for the kernel you can do: gitk -d --show-all v2.6.24.. and you see just how far down it has to parse things to see it all. The use of "-d" is a good idea, since the date-ordered toposort is much better at showing why it goes deep down (ie the date of some of those commits after 2.6.24 is much older, because they were merged from trees that weren't rebased). So I think this is a useful feature even for non-debugging - just to visualize what git does internally more. When it actually breaks out due to the "everybody_uninteresting()" case, it adds the uninteresting commits (both the one it's looking at now, and the list of pending ones) to the list This way, we really list *all* the commits we've looked at. Because we now end up listing commits we may not even have been parsed at all "show_log" and "show_commit" need to protect against commits that don't have a commit buffer entry. That second part is debatable just how it should work. Maybe we shouldn't show such entries at all (with this patch those entries do get shown, they just don't get any message shown with them). But I think this is a useful case. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-02-09 23:02:07 +01:00
show_all:1,
remove_empty_trees:1,
gitweb.cgi history not shown This does: - add a "rev.simplify_history" flag which defaults to on - it turns it off for "git whatchanged" (which thus now has real semantics outside of "git log") - it adds a command line flag ("--full-history") to turn it off for others (ie you can make "git log" and "gitk" etc get the semantics if you want to. Now, just as an example of _why_ you really really really want to simplify history by default, apply this patch, install it, and try these two command lines: gitk --full-history -- git.c gitk -- git.c and compare the output. So with this, you can also now do git whatchanged -p -- gitweb.cgi git log -p --full-history -- gitweb.cgi and it will show the old history of gitweb.cgi, even though it's not relevant to the _current_ state of the name "gitweb.cgi" NOTE NOTE NOTE! It will still actually simplify away merges that didn't change anything at all into either child. That creates these bogus strange discontinuities if you look at it with "gitk" (look at the --full-history gitk output for git.c, and you'll see a few strange cases). So the whole "--parent" thing ends up somewhat bogus with --full-history because of this, but I'm not sure it's worth even worrying about. I don't think you'd ever want to really use "--full-history" with the graphical representation, I just give it as an example exactly to show _why_ doing so would be insane. I think this is trivial enough and useful enough to be worth merging into the stable branch. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-11 19:57:35 +02:00
simplify_history:1,
topo_order:1,
revision traversal: show full history with merge simplification The --full-history traversal keeps all merges in addition to non-merge commits that touch paths in the given pathspec. This is useful to view both sides of a merge in a topology like this: A---M---o / / ---O---B even when A and B makes identical change to the given paths. The revision traversal without --full-history aims to come up with the simplest history to explain the final state of the tree, and one of the side branches can be pruned away. The behaviour to keep all merges however is inconvenient if neither A nor B touches the paths we are interested in. --full-history reduces the topology to: ---O---M---o in such a case, without removing M. This adds a post processing phase on top of --full-history traversal to remove needless merges from the resulting history. The idea is to compute, for each commit in the "full history" result set, the commit that should replace it in the simplified history. The commit to replace it in the final history is determined as follows: * In any case, we first figure out the replacement commits of parents of the commit we are looking at. The commit we are looking at is rewritten as if the replacement commits of its original parents are its parents. While doing so, we reduce the redundant parents from the rewritten parent list by not just removing the identical ones, but also removing a parent that is an ancestor of another parent. * After the above parent simplification, if the commit is a root commit, an UNINTERESTING commit, a merge commit, or modifies the paths we are interested in, then the replacement commit of the commit is itself. In other words, such a commit is not dropped from the final result. The first point above essentially means that the history is rewritten in the bottom up direction. We can rewrite the parent list of a commit only after we know how all of its parents are rewritten. This means that the processing needs to happen on the full history (i.e. after limit_list()). Signed-off-by: Junio C Hamano <gitster@pobox.com>
2008-07-31 10:17:41 +02:00
simplify_merges:1,
simplify_by_decoration:1,
tag_objects:1,
tree_objects:1,
blob_objects:1,
verify_objects:1,
edge_hint:1,
limited:1,
unpacked:1,
revision walker: Fix --boundary when limited This cleans up the boundary processing in the commit walker. It - rips out the boundary logic from the commit walker. Placing "negative" commits in the revs->commits list was Ok if all we cared about "boundary" was the UNINTERESTING limiting case, but conceptually it was wrong. - makes get_revision_1() function to walk the commits and return the results as if there is no funny postprocessing flags such as --reverse, --skip nor --max-count. - makes get_revision() function the postprocessing phase: If reverse is given, wait for get_revision_1() to give everything that it would normally give, and then reverse it before consuming. If skip is given, skip that many before going further. If max is given, stop when we gave out that many. Now that we are about to return one positive commit, mark the parents of that commit to be potential boundaries before returning, iff we are doing the boundary processing. Return the commit. - After get_revision() finishes giving out all the positive commits, if we are doing the boundary processing, we look at the parents that we marked as potential boundaries earlier, see if they are really boundaries, and give them out. It loses more code than it adds, even when the new gc_boundary() function, which is purely for early optimization, is counted. Note that this patch is purely for eyeballing and discussion only. It breaks git-bundle's verify logic because the logic does not use BOUNDARY_SHOW flag for its internal computation anymore. After we correct it not to attempt to affect the boundary processing by setting the BOUNDARY_SHOW flag, we can remove BOUNDARY_SHOW from revision.h and use that bit assignment for the new CHILD_SHOWN flag. Signed-off-by: Junio C Hamano <junkio@cox.net>
2007-03-05 22:10:06 +01:00
boundary:2,
count:1,
left_right:1,
left_only:1,
right_only:1,
rewrite_parents:1,
print_parents:1,
show_source:1,
show_decorations:1,
reverse:1,
reverse_output_stage:1,
cherry_pick:1,
cherry_mark:1,
bisect:1,
ancestry_path:1,
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 17:47:32 +01:00
first_parent_only:1,
line_level_traverse:1;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
/* Diff flags */
unsigned int diff:1,
full_diff:1,
show_root_diff:1,
no_commit_id:1,
verbose_header:1,
ignore_merges:1,
combine_merges:1,
dense_combined_merges:1,
always_show_header:1;
/* Format info */
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
unsigned int shown_one:1,
shown_dashes:1,
show_merge:1,
show_notes:1,
show_notes_given:1,
show_signature:1,
pretty_given:1,
abbrev_commit:1,
abbrev_commit_given:1,
use_terminator:1,
missing_newline:1,
date_mode_explicit:1,
preserve_subject:1;
unsigned int disable_stdin:1;
unsigned int leak_pending:1;
/* --show-linear-break */
unsigned int track_linear:1,
track_first_time:1,
linear:1;
enum date_mode date_mode;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
unsigned int abbrev;
enum cmit_fmt commit_format;
Log message printout cleanups On Sun, 16 Apr 2006, Junio C Hamano wrote: > > In the mid-term, I am hoping we can drop the generate_header() > callchain _and_ the custom code that formats commit log in-core, > found in cmd_log_wc(). Ok, this was nastier than expected, just because the dependencies between the different log-printing stuff were absolutely _everywhere_, but here's a patch that does exactly that. The patch is not very easy to read, and the "--patch-with-stat" thing is still broken (it does not call the "show_log()" thing properly for merges). That's not a new bug. In the new world order it _should_ do something like if (rev->logopt) show_log(rev, rev->logopt, "---\n"); but it doesn't. I haven't looked at the --with-stat logic, so I left it alone. That said, this patch removes more lines than it adds, and in particular, the "cmd_log_wc()" loop is now a very clean: while ((commit = get_revision(rev)) != NULL) { log_tree_commit(rev, commit); free(commit->buffer); commit->buffer = NULL; } so it doesn't get much prettier than this. All the complexity is entirely hidden in log-tree.c, and any code that needs to flush the log literally just needs to do the "if (rev->logopt) show_log(...)" incantation. I had to make the combined_diff() logic take a "struct rev_info" instead of just a "struct diff_options", but that part is pretty clean. This does change "git whatchanged" from using "diff-tree" as the commit descriptor to "commit", and I changed one of the tests to reflect that new reality. Otherwise everything still passes, and my other tests look fine too. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-17 20:59:32 +02:00
struct log_info *loginfo;
int nr, total;
const char *mime_boundary;
const char *patch_suffix;
int numbered_files;
int reroll_count;
char *message_id;
struct ident_split from_ident;
struct string_list *ref_message_ids;
int add_signoff;
const char *extra_headers;
const char *log_reencode;
const char *subject_prefix;
int no_inline;
int show_log_size;
struct string_list *mailmap;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
/* Filter by commit log message */
struct grep_opt grep_filter;
/* Display history graph */
struct git_graph *graph;
/* special limits */
int skip_count;
int max_count;
unsigned long max_age;
unsigned long min_age;
int min_parents;
int max_parents;
int (*include_check)(struct commit *, void *);
void *include_check_data;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
/* diff info for patches and for paths limiting */
struct diff_options diffopt;
Common option parsing for "git log --diff" and friends This basically does a few things that are sadly somewhat interdependent, and nontrivial to split out - get rid of "struct log_tree_opt" The fields in "log_tree_opt" are moved into "struct rev_info", and all users of log_tree_opt are changed to use the rev_info struct instead. - add the parsing for the log_tree_opt arguments to "setup_revision()" - make setup_revision set a flag (revs->diff) if the diff-related arguments were used. This allows "git log" to decide whether it wants to show diffs or not. - make setup_revision() also initialize the diffopt part of rev_info (which we had from before, but we just didn't initialize it) - make setup_revision() do all the "finishing touches" on it all (it will do the proper flag combination logic, and call "diff_setup_done()") Now, that was the easy and straightforward part. The slightly more involved part is that some of the programs that want to use the new-and-improved rev_info parsing don't actually want _commits_, they may want tree'ish arguments instead. That meant that I had to change setup_revision() to parse the arguments not into the "revs->commits" list, but into the "revs->pending_objects" list. Then, when we do "prepare_revision_walk()", we walk that list, and create the sorted commit list from there. This actually cleaned some stuff up, but it's the less obvious part of the patch, and re-organized the "revision.c" logic somewhat. It actually paves the way for splitting argument parsing _entirely_ out of "revision.c", since now the argument parsing really is totally independent of the commit walking: that didn't use to be true, since there was lots of overlap with get_commit_reference() handling etc, now the _only_ overlap is the shared (and trivial) "add_pending_object()" thing. However, I didn't do that file split, just because I wanted the diff itself to be smaller, and show the actual changes more clearly. If this gets accepted, I'll do further cleanups then - that includes the file split, but also using the new infrastructure to do a nicer "git diff" etc. Even in this form, it actually ends up removing more lines than it adds. It's nice to note how simple and straightforward this makes the built-in "git log" command, even though it continues to support all the diff flags too. It doesn't get much simpler that this. I think this is worth merging soonish, because it does allow for future cleanup and even more sharing of code. However, it obviously touches "revision.c", which is subtle. I've tested that it passes all the tests we have, and it passes my "looks sane" detector, but somebody else should also give it a good look-over. [jc: squashed the original and three "oops this too" updates, with another fix-up.] Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-15 01:52:13 +02:00
struct diff_options pruning;
struct reflog_walk_info *reflog_info;
struct decoration children;
struct decoration merge_simplification;
revision.c: Make --full-history consider more merges History simplification previously always treated merges as TREESAME if they were TREESAME to any parent. While this was consistent with the default behaviour, this could be extremely unhelpful when searching detailed history, and could not be overridden. For example, if a merge had ignored a change, as if by "-s ours", then: git log -m -p --full-history -Schange file would successfully locate "change"'s addition but would not locate the merge that resolved against it. Futher, simplify_merges could drop the actual parent that a commit was TREESAME to, leaving it as a normal commit marked TREESAME that isn't actually TREESAME to its remaining parent. Now redefine a commit's TREESAME flag to be true only if a commit is TREESAME to _all_ of its parents. This doesn't affect either the default simplify_history behaviour (because partially TREESAME merges are turned into normal commits), or full-history with parent rewriting (because all merges are output). But it does affect other modes. The clearest difference is that --full-history will show more merges - sufficient to ensure that -m -p --full-history log searches can really explain every change to the file, including those changes' ultimate fate in merges. Also modify simplify_merges to recalculate TREESAME after removing a parent. This is achieved by storing per-parent TREESAME flags on the initial scan, so the combined flag can be easily recomputed. This fixes some t6111 failures, but creates a couple of new ones - we are now showing some merges that don't need to be shown. Signed-off-by: Kevin Bracey <kevin@bracey.fi> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-05-16 17:32:34 +02:00
struct decoration treesame;
/* notes-specific options: which refs to show */
struct display_notes_opt notes_opt;
/* commit counts */
int count_left;
int count_right;
int count_same;
Implement line-history search (git log -L) This is a rewrite of much of Bo's work, mainly in an effort to split it into smaller, easier to understand routines. The algorithm is built around the struct range_set, which encodes a series of line ranges as intervals [a,b). This is used in two contexts: * A set of lines we are tracking (which will change as we dig through history). * To encode diffs, as pairs of ranges. The main routine is range_set_map_across_diff(). It processes the diff between a commit C and some parent P. It determines which diff hunks are relevant to the ranges tracked in C, and computes the new ranges for P. The algorithm is then simply to process history in topological order from newest to oldest, computing ranges and (partial) diffs. At branch points, we need to merge the ranges we are watching. We will find that many commits do not affect the chosen ranges, and mark them TREESAME (in addition to those already filtered by pathspec limiting). Another pass of history simplification then gets rid of such commits. This is wired as an extra filtering pass in the log machinery. This currently only reduces code duplication, but should allow for other simplifications and options to be used. Finally, we hook a diff printer into the output chain. Ideally we would wire directly into the diff logic, to optionally use features like word diff. However, that will require some major reworking of the diff chain, so we completely replace the output with our own diff for now. As this was a GSoC project, and has quite some history by now, many people have helped. In no particular order, thanks go to Jakub Narebski <jnareb@gmail.com> Jens Lehmann <Jens.Lehmann@web.de> Jonathan Nieder <jrnieder@gmail.com> Junio C Hamano <gitster@pobox.com> Ramsay Jones <ramsay@ramsay1.demon.co.uk> Will Palmer <wmpalmer@gmail.com> Apologies to everyone I forgot. Signed-off-by: Bo Yang <struggleyb.nku@gmail.com> Signed-off-by: Thomas Rast <trast@student.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-03-28 17:47:32 +01:00
/* line level range that we are chasing */
struct decoration line_log_data;
log: use true parents for diff even when rewriting When using pathspec filtering in combination with diff-based log output, parent simplification happens before the diff is computed. The diff is therefore against the *simplified* parents. This works okay, arguably by accident, in the normal case: simplification reduces to one parent as long as the commit is TREESAME to it. So the simplified parent of any given commit must have the same tree contents on the filtered paths as its true (unfiltered) parent. However, --full-diff breaks this guarantee, and indeed gives pretty spectacular results when comparing the output of git log --graph --stat ... git log --graph --full-diff --stat ... (--graph internally kicks in parent simplification, much like --parents). To fix it, store a copy of the parent list before simplification (in a slab) whenever --full-diff is in effect. Then use the stored parents instead of the simplified ones in the commit display code paths. The latter do not actually check for --full-diff to avoid duplicated code; they just grab the original parents if save_parents() has not been called for this revision walk. For ordinary commits it should be obvious that this is the right thing to do. Merge commits are a bit subtle. Observe that with default simplification, merge simplification is an all-or-nothing decision: either the merge is TREESAME to one parent and disappears, or it is different from all parents and the parent list remains intact. Redundant parents are not pruned, so the existing code also shows them as a merge. So if we do show a merge commit, the parent list just consists of the rewrite result on each parent. Running, e.g., --cc on this in --full-diff mode is not very useful: if any commits were skipped, some hunks will disagree with all sides of the merge (with one side, because commits were skipped; with the others, because they didn't have those changes in the first place). This triggers --cc showing these hunks spuriously. Therefore I believe that even for merge commits it is better to show the diffs wrt. the original parents. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
/* copies of the parent lists, for --full-diff display */
struct saved_parents *saved_parents_slab;
struct commit_list *previous_parents;
const char *break_bar;
};
extern int ref_excluded(struct string_list *, const char *path);
void clear_ref_exclusion(struct string_list **);
void add_ref_exclusion(struct string_list **, const char *exclude);
#define REV_TREE_SAME 0
#define REV_TREE_NEW 1 /* Only new files */
#define REV_TREE_OLD 2 /* Only files removed */
#define REV_TREE_DIFFERENT 3 /* Mixed changes */
/* revision.c */
typedef void (*show_early_output_fn_t)(struct rev_info *, struct commit_list *);
extern volatile show_early_output_fn_t show_early_output;
struct setup_revision_opt {
const char *def;
void (*tweak)(struct rev_info *, struct setup_revision_opt *);
const char *submodule;
int assume_dashdash;
unsigned revarg_opt;
};
extern void init_revisions(struct rev_info *revs, const char *prefix);
extern int setup_revisions(int argc, const char **argv, struct rev_info *revs,
struct setup_revision_opt *);
extern void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx,
const struct option *options,
const char * const usagestr[]);
#define REVARG_CANNOT_BE_FILENAME 01
#define REVARG_COMMITTISH 02
extern int handle_revision_arg(const char *arg, struct rev_info *revs,
int flags, unsigned revarg_opt);
extern void reset_revision_walk(void);
extern int prepare_revision_walk(struct rev_info *revs);
extern struct commit *get_revision(struct rev_info *revs);
extern char *get_revision_mark(const struct rev_info *revs,
const struct commit *commit);
extern void put_revision_mark(const struct rev_info *revs,
const struct commit *commit);
extern void mark_parents_uninteresting(struct commit *commit);
extern void mark_tree_uninteresting(struct tree *tree);
struct name_path {
struct name_path *up;
int elem_len;
const char *elem;
};
show_object(): push path_name() call further down In particular, pushing the "path_name()" call _into_ the show() function would seem to allow - more clarity into who "owns" the name (ie now when we free the name in the show_object callback, it's because we generated it ourselves by calling path_name()) - not calling path_name() at all, either because we don't care about the name in the first place, or because we are actually happy walking the linked list of "struct name_path *" and the last component. Now, I didn't do that latter optimization, because it would require some more coding, but especially looking at "builtin-pack-objects.c", we really don't even want the whole pathname, we really would be better off with the list of path components. Why? We use that name for two things: - add_preferred_base_object(), which actually _wants_ to traverse the path, and now does it by looking for '/' characters! - for 'name_hash()', which only cares about the last 16 characters of a name, so again, generating the full name seems to be just unnecessary work. Anyway, so I didn't look any closer at those things, but it did convince me that the "show_object()" calling convention was crazy, and we're actually better off doing _less_ in list-objects.c, and giving people access to the internal data structures so that they can decide whether they want to generate a path-name or not. This patch does that, and then for people who did use the name (even if they might do something more clever in the future), it just does the straightforward "name = path_name(path, component); .. free(name);" thing. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 03:15:26 +02:00
char *path_name(const struct name_path *path, const char *name);
process_{tree,blob}: show objects without buffering Here's a less trivial thing, and slightly more dubious one. I was looking at that "struct object_array objects", and wondering why we do that. I have honestly totally forgotten. Why not just call the "show()" function as we encounter the objects? Rather than add the objects to the object_array, and then at the very end going through the array and doing a 'show' on all, just do things more incrementally. Now, there are possible downsides to this: - the "buffer using object_array" _can_ in theory result in at least better I-cache usage (two tight loops rather than one more spread out one). I don't think this is a real issue, but in theory.. - this _does_ change the order of the objects printed. Instead of doing a "process_tree(revs, commit->tree, &objects, NULL, "");" in the loop over the commits (which puts all the root trees _first_ in the object list, this patch just adds them to the list of pending objects, and then we'll traverse them in that order (and thus show each root tree object together with the objects we discover under it) I _think_ the new ordering actually makes more sense, but the object ordering is actually a subtle thing when it comes to packing efficiency, so any change in order is going to have implications for packing. Good or bad, I dunno. - There may be some reason why we did it that odd way with the object array, that I have simply forgotten. Anyway, now that we don't buffer up the objects before showing them that may actually result in lower memory usage during that whole traverse_commit_list() phase. This is seriously not very deeply tested. It makes sense to me, it seems to pass all the tests, it looks ok, but... Does anybody remember why we did that "object_array" thing? It used to be an "object_list" a long long time ago, but got changed into the array due to better memory usage patterns (those linked lists of obejcts are horrible from a memory allocation standpoint). But I wonder why we didn't do this back then. Maybe there's a reason for it. Or maybe there _used_ to be a reason, and no longer is. Signed-off-by: Junio C Hamano <gitster@pobox.com>
2009-04-11 02:27:58 +02:00
extern void show_object_with_name(FILE *, struct object *,
const struct name_path *, const char *);
Add "named object array" concept We've had this notion of a "object_list" for a long time, which eventually grew a "name" member because some users (notably git-rev-list) wanted to name each object as it is generated. That object_list is great for some things, but it isn't all that wonderful for others, and the "name" member is generally not used by everybody. This patch splits the users of the object_list array up into two: the traditional list users, who want the list-like format, and who don't actually use or want the name. And another class of users that really used the list as an extensible array, and generally wanted to name the objects. The patch is fairly straightforward, but it's also biggish. Most of it really just cleans things up: switching the revision parsing and listing over to the array makes things like the builtin-diff usage much simpler (we now see exactly how many members the array has, and we don't get the objects reversed from the order they were on the command line). One of the main reasons for doing this at all is that the malloc overhead of the simple object list was actually pretty high, and the array is just a lot denser. So this patch brings down memory usage by git-rev-list by just under 3% (on top of all the other memory use optimizations) on the mozilla archive. It does add more lines than it removes, and more importantly, it adds a whole new infrastructure for maintaining lists of objects, but on the other hand, the new dynamic array code is pretty obvious. The change to builtin-diff-tree.c shows a fairly good example of why an array interface is sometimes more natural, and just much simpler for everybody. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 02:42:35 +02:00
extern void add_object(struct object *obj,
struct object_array *p,
struct name_path *path,
const char *name);
extern void add_pending_object(struct rev_info *revs,
struct object *obj, const char *name);
extern void add_pending_sha1(struct rev_info *revs,
const char *name, const unsigned char *sha1,
unsigned int flags);
extern void add_head_to_pending(struct rev_info *);
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-04 21:12:05 +01:00
enum commit_action {
commit_ignore,
commit_show,
commit_error
};
extern enum commit_action get_commit_action(struct rev_info *revs,
struct commit *commit);
extern enum commit_action simplify_commit(struct rev_info *revs,
struct commit *commit);
Enhance --early-output format This makes --early-output a bit more advanced, and actually makes it generate multiple "Final output:" headers as it updates things asynchronously. I realize that the "Final output:" line is now illogical, since it's not really final until it also says "done", but It now _always_ generates a "Final output:" header in front of any commit list, and that output header gives you a *guess* at the maximum number of commits available. However, it should be noted that the guess can be completely off: I do a reasonable job estimating it, but it is not meant to be exact. So what happens is that you may get output like this: - at 0.1 seconds: Final output: 2 incomplete .. 2 commits listed .. - half a second later: Final output: 33 incomplete .. 33 commits listed .. - another half a second after that: Final output: 71 incomplete .. 71 commits listed .. - another half second later: Final output: 136 incomplete .. 100 commits listed: we hit the --early-output limit, and .. will only output 100 commits, and after this you'll not .. see an "incomplete" report any more since you got as much .. early output as you asked for! - .. and then finally: Final output: 73106 done .. all the commits .. The above is a real-life scenario on my current kernel tree after having flushed all the caches. Tested with the experimental gitk patch that Paul sent out, and by looking at the actual log output (and verifying that my commit count guesses actually match real life fairly well). Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2007-11-04 21:12:05 +01:00
enum rewrite_result {
rewrite_one_ok,
rewrite_one_noparents,
rewrite_one_error
};
typedef enum rewrite_result (*rewrite_parent_fn_t)(struct rev_info *revs, struct commit **pp);
extern int rewrite_parents(struct rev_info *revs, struct commit *commit,
rewrite_parent_fn_t rewrite_parent);
log: use true parents for diff even when rewriting When using pathspec filtering in combination with diff-based log output, parent simplification happens before the diff is computed. The diff is therefore against the *simplified* parents. This works okay, arguably by accident, in the normal case: simplification reduces to one parent as long as the commit is TREESAME to it. So the simplified parent of any given commit must have the same tree contents on the filtered paths as its true (unfiltered) parent. However, --full-diff breaks this guarantee, and indeed gives pretty spectacular results when comparing the output of git log --graph --stat ... git log --graph --full-diff --stat ... (--graph internally kicks in parent simplification, much like --parents). To fix it, store a copy of the parent list before simplification (in a slab) whenever --full-diff is in effect. Then use the stored parents instead of the simplified ones in the commit display code paths. The latter do not actually check for --full-diff to avoid duplicated code; they just grab the original parents if save_parents() has not been called for this revision walk. For ordinary commits it should be obvious that this is the right thing to do. Merge commits are a bit subtle. Observe that with default simplification, merge simplification is an all-or-nothing decision: either the merge is TREESAME to one parent and disappears, or it is different from all parents and the parent list remains intact. Redundant parents are not pruned, so the existing code also shows them as a merge. So if we do show a merge commit, the parent list just consists of the rewrite result on each parent. Running, e.g., --cc on this in --full-diff mode is not very useful: if any commits were skipped, some hunks will disagree with all sides of the merge (with one side, because commits were skipped; with the others, because they didn't have those changes in the first place). This triggers --cc showing these hunks spuriously. Therefore I believe that even for merge commits it is better to show the diffs wrt. the original parents. Reported-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Helped-by: Junio C Hamano <gitster@pobox.com> Helped-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk> Signed-off-by: Thomas Rast <trast@inf.ethz.ch> Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-07-31 22:13:20 +02:00
/*
* Save a copy of the parent list, and return the saved copy. This is
* used by the log machinery to retrieve the original parents when
* commit->parents has been modified by history simpification.
*
* You may only call save_parents() once per commit (this is checked
* for non-root commits).
*
* get_saved_parents() will transparently return commit->parents if
* history simplification is off.
*/
extern void save_parents(struct rev_info *revs, struct commit *commit);
extern struct commit_list *get_saved_parents(struct rev_info *revs, const struct commit *commit);
extern void free_saved_parents(struct rev_info *revs);
#endif