mirror/git - git - git.dotya.ml

mirror/ git

mirror of https://github.com/git/git.git synced 2024-05-22 08:36:09 +02:00

Author	SHA1	Message	Date
Elijah Newren	b0ca120554	commit: move reverse_commit_list() from merge-recursive Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-16 21:56:39 -08:00
Elijah Newren	c5a6f65527	merge-ort: add modify/delete handling and delayed output processing The focus here is on adding a path_msg() which will queue up warning/conflict/notice messages about the merge for later processing, storing these in a pathname -> strbuf map. It might seem like a big change, but it really just is: * declaration of necessary map with some comments * initialization and recording of data * a bunch of code to iterate over the map at print/free time * at least one caller in order to avoid an error about having an unused function (which we provide in the form of implementing modify/delete conflict handling). At this stage, it is probably not clear why I am opting for delayed output processing. There are multiple reasons: 1. Merges are supposed to abort if they would overwrite dirty changes in the working tree. We cannot correctly determine whether changes would be overwritten until both rename detection has occurred and full processing of entries with the renames has finalized. Warning/conflict/notice messages come up at intermediate codepaths along the way, so unless we want spurious conflict/warning messages being printed when the merge will be aborted anyway, we need to save these messages and only print them when relevant. 2. There can be multiple messages for a single path, and we want all messages for a give path to appear together instead of having them grouped by conflict/warning type. This was a problem already with merge-recursive.c but became even more important due to the splitting apart of conflict types as discussed in the commit message for `1f3c9ba707` ("t6425: be more flexible with rename/delete conflict messages", 2020-08-10) 3. Some callers might want to avoid showing the output in certain cases, such as if the end result is a clean merge. Rebases have typically done this. 4. Some callers might not want the output to go to stdout or even stderr, but might want to do something else with it entirely. For example, a --remerge-diff option to `git show` or `git log -p` that remerges on the fly and diffs merge commits against the remerged version would benefit from stdout/stderr not being written to in the standard form. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	e2e9dc030c	merge-ort: add die-not-implemented stub handle_content_merge() function This simplistic and weird-looking patch is here to facilitate future patch submissions. Adding this stub allows rename detection code to reference it in one patch series, while a separate patch series can define the implementation, and then both series can merge cleanly and work nicely together at that point. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	04af1879b9	merge-ort: add function grouping comments Commit b658536f59 ("merge-ort: add some high-level algorithm structure", 2020-10-27) added high-level structure of the ort merge algorithm. As we have added more and more functions, that high-level structure has been slightly obscured. Since functions are still grouped according to this high-level structure, add comments denoting sections where all the functions are specifically tied to a piece of the high-level structure. This function groupings include a few sub-divisions of the original high-level structure, including some sub-divisions that are yet to be submitted. Each has (or will have) several functions all serving as helpers to one or two main functions for each section. As an added bonus, the comments will serve to provide a small textual separation between nearby sections and allow the next three patch series to be submitted independently and merge cleanly. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	43c1dccb91	merge-ort: add a paths_to_free field to merge_options_internal This field will be used in future patches to allow removal of paths from opt->priv->paths. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:47 -08:00
Elijah Newren	1c7873cdf4	merge-ort: add a path_conflict field to merge_options_internal This field is not yet used, but will be used by both the rename handling code, and the conflict type handling code in process_entry(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:38:40 -08:00
Elijah Newren	101bc5bc2d	merge-ort: add a clear_internal_opts helper Move most of merge_finalize() into a new helper function, clear_internal_opts(). This is a step to facilitate recursive merges, as well as some future optimizations. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:21:03 -08:00
Elijah Newren	67845745c1	merge-ort: add a few includes Include blob.h for definition of blob_type, and commit-reach.h for declarations of get_merge_bases() and in_merge_bases(). While none of these are used yet, we want to avoid cross-dependencies in the next three series of patches for merge-ort and merge them at the end; adding these "#include"s now avoids textual conflicts. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:21:03 -08:00
Elijah Newren	89422d29b1	merge-ort: free data structures in merge_finalize() Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	ef2b369387	merge-ort: add implementation of record_conflicted_index_entries() After checkout(), the working tree has the appropriate contents, and the index matches the working copy. That means that all unmodified and cleanly merged files have correct index entries, but conflicted entries need to be updated. We do this by looping over the conflicted entries, marking the existing index entry for the path with CE_REMOVE, adding new higher order staged for the path at the end of the index (ignoring normal index sort order), and then at the end of the loop removing the CE_REMOVED-marked cache entries and sorting the index. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	70912f66de	tree: enable cmp_cache_name_compare() to be used elsewhere Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	6681ce5cf6	merge-ort: add implementation of checkout() Since merge-ort creates a tree for its output, when there are no conflicts, updating the working tree and index is as simple as using the unpack_trees() machinery with a twoway_merge (i.e. doing the equivalent of a "checkout" operation). If there were conflicts in the merge, then since the tree we created included all the conflict markers, then using the unpack_trees machinery in this manner will still update the working tree correctly. Further, all index entries corresponding to cleanly merged files will also be updated correctly by this procedure. Index entries corresponding to conflicted entries will appear as though the user had run "git add -u" after the merge to accept all files as-is with conflict markers. Thus, after running unpack_trees(), there needs to be a separate step for updating the entries in the index corresponding to conflicted files. This will be the job for the function record_conflicted_index_entris(), which will be implemented in a subsequent commit. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	9fefce68dc	merge-ort: basic outline for merge_switch_to_result() This adds a basic implementation for merge_switch_to_result(), though just in terms of a few new empty functions that will be defined in subsequent commits. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	bb470f4e13	merge-ort: step 3 of tree writing -- handling subdirectories as we go Our order for processing of entries means that if we have a tree of files that looks like Makefile src/moduleA/foo.c src/moduleA/bar.c src/moduleB/baz.c src/moduleB/umm.c tokens.txt Then we will process paths in the order of the leftmost column below. I have added two additional columns that help explain the algorithm that follows; the 2nd column is there to remind us we have oid & mode info we are tracking for each of these paths (which differs between the paths which I'm not representing well here), and the third column annotates the parent directory of the entry: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB src/moduleB <version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA src/moduleA <version_info> src src <version_info> "" Makefile <version_info> "" When the parent directory changes, if it's a subdirectory of the previous parent directory (e.g. "" -> src/moduleB) then we can just keep appending. If the parent directory differs from the previous parent directory and is not a subdirectory, then we should process that directory. So, for example, when we get to this point: tokens.txt <version_info> "" src/moduleB/umm.c <version_info> src/moduleB src/moduleB/baz.c <version_info> src/moduleB and note that the next entry (src/moduleB) has a different parent than the last one that isn't a subdirectory, we should write out a tree for it 100644 blob <HASH> umm.c 100644 blob <HASH> baz.c then pop all the entries under that directory while recording the new hash for that directory, leaving us with tokens.txt <version_info> "" src/moduleB <new version_info> src This process repeats until at the end we get to tokens.txt <version_info> "" src <new version_info> "" Makefile <version_info> "" and then we can write out the toplevel tree. Since we potentially have entries in our string_list corresponding to multiple different toplevel directories, e.g. a slightly different repository might have: whizbang.txt <version_info> "" tokens.txt <version_info> "" src/moduleD <new version_info> src src/moduleC <new version_info> src src/moduleB <new version_info> src src/moduleA/foo.c <version_info> src/moduleA src/moduleA/bar.c <version_info> src/moduleA When src/moduleA is popped off, we need to know that the "last directory" reverts back to src, and how many entries in our string_list are associated with that parent directory. So I use an auxiliary offsets string_list which would have (parent_directory,offset) information of the form "" 0 src 2 src/moduleA 5 Whenever I write out a tree for a subdirectory, I set versions.nr to the final offset value and then decrement offsets.nr...and then add an entry to versions with a hash for the new directory. The idea is relatively simple, there's just a lot of accounting to implement this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	ee4012dcf9	merge-ort: step 2 of tree writing -- function to create tree object Create a new function, write_tree(), which will take a list of basenames, modes, and oids for a single directory and create a tree object in the object-store. We do not yet have just basenames, modes, and oids for just a single directory (we have a mixture of entries from all directory levels in the hierarchy) so we still die() before the current call to write_tree(), but the next patch will rectify that. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	a9945bba60	merge-ort: step 1 of tree writing -- record basenames, modes, and oids As a step towards transforming the processed path->conflict_info entries into an actual tree object, start recording basenames, modes, and oids in a dir_metadata structure. Subsequent commits will make use of this to actually write a tree. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	8adffaa818	merge-ort: have process_entries operate in a defined order We want to handle paths below a directory before needing to handle the directory itself. Also, we want to handle the directory immediately after the paths below it, so we can't use simple lexicographic ordering from strcmp (which would insert foo.txt between foo and foo/file.c). Copy string_list_df_name_compare() from merge-recursive.c, and set up a string list of paths sorted by that function so that we can iterate in the desired order. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	6a02dd90c9	merge-ort: add a preliminary simple process_entries() implementation Add a process_entries() implementation that just loops over the paths and processes each one individually with an auxiliary process_entry() call. Add a basic process_entry() as well, which handles several cases but leaves a few of the more involved ones with die-not-implemented messages. Also, although process_entries() is supposed to create a tree, it does not yet have code to do so -- except in the special case of merging completely empty trees. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	291f29caf6	merge-ort: avoid recursing into identical trees When all three trees have the same oid, there is no need to recurse into these trees to find that all files within them happen to match. We can just record any one of the trees as the resolution of merging that particular path. Immediately resolving trees for other types of trivial tree merges (such as one side matches the merge base, or the two sides match each other) would prevent us from detecting renames for some paths, and thus prevent us from doing three-way content merges for those paths whose renames we did not detect. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	98bf984167	merge-ort: record stage and auxiliary info for every path Create a helper function, setup_path_info(), which can be used to record all the information we want in a merged_info or conflict_info. While there is currently only one caller of this new function, and some of its particular parameters are fixed, future callers of this function will be added later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	34e557af54	merge-ort: compute a few more useful fields for collect_merge_info Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	885f0063e9	merge-ort: avoid repeating fill_tree_descriptor() on the same tree Three-way merges, by their nature, are going to often have two or more trees match at a given subdirectory. We can avoid calling fill_tree_descriptor() on the same tree by checking when these trees match. Noting when various oids match will also be useful in other calculations and optimizations as well. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:20 -08:00
Elijah Newren	d2bc1994f3	merge-ort: implement a very basic collect_merge_info() This does not actually collect any necessary info other than the pathnames involved, since it just allocates an all-zero conflict_info and stuffs that into paths. However, it invokes the traverse_trees() machinery to walk over all the paths and sets up the basic infrastructure we need. I have left out a few obvious optimizations to try to make this patch as short and obvious as possible. A subsequent patch will add some of those back in with some more useful data fields before we introduce a patch that actually sets up the conflict_info fields. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	0c0d705b5c	merge-ort: add an err() function similar to one from merge-recursive Various places in merge-recursive used an err() function when it hit some kind of unrecoverable error. That code was from the reusable bits of merge-recursive.c that we liked, such as merge_3way, writing object files to the object store, reading blobs from the object store, etc. So create a similar function to allow us to port that code over, and use it for when we detect problems returned from collect_merge_info()'s traverse_trees() call, which we will be adding next. While we are at it, also add more documentation for the "clean" field from struct merge_result, particularly since the name suggests a boolean but it is not quite one and this is our first non-boolean usage. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	c8017176ac	merge-ort: use histogram diff In my cursory investigation, histogram diffs are about 2% slower than Myers diffs. Others have probably done more detailed benchmarks. But, in short, histogram diffs have been around for years and in a number of cases provide obviously better looking diffs where Myers diffs are unintelligible but the performance hit has kept them from becoming the default. However, there are real merge bugs we know about that have triggered on git.git and linux.git, which I don't have a clue how to address without the additional information that I believe is provided by histogram diffs. See the following: https://lore.kernel.org/git/20190816184051.GB13894@sigill.intra.peff.net/ https://lore.kernel.org/git/CABPp-BHvJHpSJT7sdFwfNcPn_sOXwJi3=o14qjZS3M8Rzcxe2A@mail.gmail.com/ https://lore.kernel.org/git/CABPp-BGtez4qjbtFT1hQoREfcJPmk9MzjhY5eEq1QhXT23tFOw@mail.gmail.com/ I don't like mismerges. I really don't like silent mismerges. While I am sometimes willing to make performance and correctness tradeoff, I'm much more interested in correctness in general. I want to fix the above bugs. I have not yet started doing so, but I believe histogram diff at least gives me an angle. Unfortunately, I can't rely on using the information from histogram diff unless it's in use. And it hasn't been used because of a few percentage performance hit. In testcases I have looked at, merge-ort is _much_ faster than merge-recursive for non-trivial merges/rebases/cherry-picks. As such, this is a golden opportunity to switch out the underlying diff algorithm (at least the one used by the merge machinery; git-diff and git-log are separate questions); doing so will allow me to get additional data and improved diffs, and I believe it will help me fix the above bugs at some point in the future. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	e4171b1b6d	merge-ort: port merge_start() from merge-recursive merge_start() basically does a bunch of sanity checks, then allocates and initializes opt->priv -- a struct merge_options_internal. Most of the sanity checks are usable as-is. The allocation/intialization is a bit different since merge-ort has a very different merge_options_internal than merge-recursive, but the idea is the same. The weirdest part here is that merge-ort and merge-recursive use the same struct merge_options, even though merge_options has a number of fields that are oddly specific to merge-recursive's internal implementation and don't even make sense with merge-ort's high-level design (e.g. buffer_output, which merge-ort has to always do). I reused the same data structure because: * most the fields made sense to both merge algorithms * making a new struct would have required making new enums or somehow externalizing them, and that was getting messy. * it simplifies converting the existing callers by not having to have different code paths for merge_options setup. I also marked detect_renames as ignored. We can revisit that later, but in short: merge-recursive allowed turning off rename detection because it was sometimes glacially slow. When you speed something up by a few orders of magnitude, it's worth revisiting whether that justification is still relevant. Besides, if folks find it's still too slow, perhaps they have a better scaling case than I could find and maybe it turns up some more optimizations we can add. If it still is needed as an option, it is easy to add later. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	231e2dd49d	merge-ort: add some high-level algorithm structure merge_ort_nonrecursive_internal() will be used by both merge_inmemory_nonrecursive() and merge_inmemory_recursive(); let's focus on it for now. It involves some setup -- merge_start() -- followed by the following chain of functions: collect_merge_info() This function will populate merge_options_internal's paths field, via a call to traverse_trees() and a new callback that will be added later. detect_and_process_renames() This function will detect renames, and then adjust entries in paths to move conflict stages from old pathnames into those for new pathnames, so that the next step doesn't have to think about renames and just can do three-way content merging and such. process_entries() This function determines how to take the various stages (versions of a file from the three different sides) and merge them, and whether to mark the result as conflicted or cleanly merged. It also writes out these merged file versions as it goes to create a tree. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Elijah Newren	5b59c3db05	merge-ort: setup basic internal data structures Set up some basic internal data structures. The only carry-over from merge-recursive.c is call_depth, though needed_rename_limit will be added later. The central piece of data will definitely be the strmap "paths", which will map every relevant pathname under consideration to either a merged_info or a conflict_info. ("conflicted" is a strmap that is a subset of "paths".) merged_info contains all relevant information for a non-conflicted entry. conflict_info contains a merged_info, plus any additional information about a conflict such as the higher orders stages involved and the names of the paths those came from (handy once renames get involved). If an entry remains conflicted, the merged_info portion of a conflict_info will later be filled with whatever version of the file should be placed in the working directory (e.g. an as-merged-as-possible variation that contains conflict markers). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-12-13 14:18:19 -08:00
Junio C Hamano	6d37ca2165	Merge branch 'en/strmap' into en/merge-ort-impl * en/strmap: shortlog: use strset from strmap.h Use new HASHMAP_INIT macro to simplify hashmap initialization strmap: take advantage of FLEXPTR_ALLOC_STR when relevant strmap: enable allocations to come from a mem_pool strmap: add a strset sub-type strmap: split create_entry() out of strmap_put() strmap: add functions facilitating use as a string->int map strmap: enable faster clearing and reusing of strmaps strmap: add more utility functions strmap: new utility functions hashmap: provide deallocation function names hashmap: introduce a new hashmap_partial_clear() hashmap: allow re-use after hashmap_free() hashmap: adjust spacing to fix argument alignment hashmap: add usage documentation explaining hashmap_free[_entries]()	2020-11-11 12:56:29 -08:00
Elijah Newren	449a900969	shortlog: use strset from strmap.h Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 12:55:27 -08:00
Elijah Newren	b19315d8ab	Use new HASHMAP_INIT macro to simplify hashmap initialization Now that hashamp has lazy initialization and a HASHMAP_INIT macro, hashmaps allocated on the stack can be initialized without a call to hashmap_init() and in some cases makes the code a bit shorter. Convert some callsites over to take advantage of this. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 12:55:27 -08:00
Elijah Newren	23a276a9c4	strmap: take advantage of FLEXPTR_ALLOC_STR when relevant By default, we do not use a mempool and strdup_strings is true; in this case, we can avoid both an extra allocation and an extra free by just over-allocating for the strmap_entry leaving enough space at the end to copy the key. FLEXPTR_ALLOC_STR exists for exactly this purpose, so make use of it. Also, adjust the case when we are using a memory pool and strdup_strings is true to just do one allocation from the memory pool instead of two so that the strmap_clear() and strmap_remove() code can just avoid freeing the key in all cases. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 12:55:27 -08:00
Elijah Newren	a208ec1f0b	strmap: enable allocations to come from a mem_pool For heavy users of strmaps, allowing the keys and entries to be allocated from a memory pool can provide significant overhead savings. Add an option to strmap_init_with_options() to specify a memory pool. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-11 12:55:27 -08:00
Elijah Newren	1201eb628a	strmap: add a strset sub-type Similar to adding strintmap for special-casing a string -> int mapping, add a strset type for cases where we really are only interested in using strmap for storing a set rather than a mapping. In this case, we'll always just store NULL for the value but the different struct type makes it clearer than code comments how a variable is intended to be used. The difference in usage also results in some differences in API: a few things that aren't necessary or meaningful are dropped (namely, the free_values argument to _clear(), and the _get() function), and strset_add() is chosen as the API instead of strset_put(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-06 09:33:35 -08:00
Elijah Newren	6abd22065c	strmap: split create_entry() out of strmap_put() This will facilitate adding entries to a strmap subtype in ways that differ slightly from that of strmap_put(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-06 09:33:35 -08:00
Elijah Newren	4fa1d501f7	strmap: add functions facilitating use as a string->int map Although strmap could be used as a string->int map, one either had to allocate an int for every entry and then deallocate later, or one had to do a bunch of casting between (void) and (intptr_t). Add some special functions that do the casting. Also, rename put->set for such wrapper functions since 'put' implied there may be some deallocation needed if the string was already found in the map, which isn't the case when we're storing an int value directly in the void slot instead of using the void* slot as a pointer to data. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-04 17:09:49 -08:00
Elijah Newren	6ccdfc2a20	strmap: enable faster clearing and reusing of strmaps When strmaps are used heavily, such as is done by my new merge-ort algorithm, and strmaps need to be cleared but then re-used (because of e.g. picking multiple commits to cherry-pick, or due to a recursive merge having several different merges while recursing), free-ing and reallocating map->table repeatedly can add up in time, especially since it will likely be reallocated to a much smaller size but the previous merge provides a good guide to the right size to use for the next merge. Introduce strmap_partial_clear() to take advantage of this type of situation; it will act similar to strmap_clear() except that map->table's entries are zeroed instead of map->table being free'd. Making use of this function reduced the cost of clear_or_reinit_internal_opts() by about 20% in mert-ort, and dropped the overall runtime of my rebase testcase by just under 2%. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-04 17:09:49 -08:00
Elijah Newren	b70c82e6ed	strmap: add more utility functions This adds a number of additional convienence functions I want/need: * strmap_get_size() * strmap_empty() * strmap_remove() * strmap_for_each_entry() * strmap_get_entry() I suspect the first four are self-explanatory. strmap_get_entry() is similar to strmap_get() except that instead of just returning the void* value that the string maps to, it returns the strmap_entry that contains both the string and the void* value (or NULL if the string isn't in the map). This is helpful because it avoids multiple lookups, e.g. in some cases a caller would need to call: * strmap_contains() to check that the map has an entry for the string * strmap_get() to get the void* value * <do some work to update the value> * strmap_put() to update/overwrite the value If the void* pointer returned really is a pointer, then the last step is unnecessary, but if the void* pointer is just cast to an integer then strmap_put() will be needed. In contrast, one can call strmap_get_entry() and then: * check if the string was in the map by whether the pointer is NULL * access the value via entry->value * directly update entry->value meaning that we can replace two or three hash table lookups with one. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-04 17:09:49 -08:00
Elijah Newren	14c4586c2d	merge,rebase,revert: select ort or recursive by config or environment Allow the testsuite to run where it treats requests for "recursive" or the default merge algorithm via consulting the environment variable GIT_TEST_MERGE_ALGORITHM which is expected to either be "recursive" (the old traditional algorithm) or "ort" (the new algorithm). Also, allow folks to pick the new algorithm via config setting. It turns out builtin/merge.c already had a way to allow users to specify a different default merge algorithm: pull.twohead. Rather odd configuration name (especially to be in the 'pull' namespace rather than 'merge') but it's there. Add that same configuration to rebase, cherry-pick, and revert. This required updating the various callsites that called merge_trees() or merge_recursive() to conditionally call the new API, so this serves as another demonstration of what the new API looks and feels like. There are almost certainly some callsites that have not yet been modified to work with the new merge algorithm, but this represents the ones that I have been testing with thus far. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 16:35:50 -08:00
Elijah Newren	ae20bf1ad9	strmap: new utility functions Add strmap as a new struct and associated utility functions, specifically for hashmaps that map strings to some value. The API is taken directly from Peff's proposal at https://lore.kernel.org/git/20180906191203.GA26184@sigill.intra.peff.net/ Note that similar string-list, I have a strdup_strings setting. However, unlike string-list, strmap_init() does not take a parameter for this setting and instead automatically sets it to 1; callers who want to control this detail need to instead call strmap_init_with_options(). (Future patches will add additional parameters to strmap_init_with_options()). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Elijah Newren	6da1a25814	hashmap: provide deallocation function names hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed for a while, but aren't necessarily the clearest names, especially with hashmap_partial_clear() being added to the mix and lazy-initialization now being supported. Peff suggested we adopt the following names[1]: - hashmap_clear() - remove all entries and de-allocate any hashmap-specific data, but be ready for reuse - hashmap_clear_and_free() - ditto, but free the entries themselves - hashmap_partial_clear() - remove all entries but don't deallocate table - hashmap_partial_clear_and_free() - ditto, but free the entries This patch provides the new names and converts all existing callers over to the new naming scheme. [1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/ Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Elijah Newren	33f20d8217	hashmap: introduce a new hashmap_partial_clear() merge-ort is a heavy user of strmaps, which are built on hashmap.[ch]. clear_or_reinit_internal_opts() in merge-ort was taking about 12% of overall runtime in my testcase involving rebasing 35 patches of linux.git across a big rename. clear_or_reinit_internal_opts() was calling hashmap_free() followed by hashmap_init(), meaning that not only was it freeing all the memory associated with each of the strmaps just to immediately allocate a new array again, it was allocating a new array that was likely smaller than needed (thus resulting in later need to rehash things). The ending size of the map table on the previous commit was likely almost perfectly sized for the next commit we wanted to pick, and not dropping and reallocating the table immediately is a win. Add some new API to hashmap to clear a hashmap of entries without freeing map->table (and instead only zeroing it out like alloc_table() would do, along with zeroing the count of items in the table and the shrink_at field). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Elijah Newren	b7879b0ba6	hashmap: allow re-use after hashmap_free() Previously, once map->table had been freed, any calls to hashmap_put(), hashmap_get(), or hashmap_remove() would cause a NULL pointer dereference (since hashmap_free_() also zeros the memory; without that zeroing, calling these functions would cause a use-after-free problem). Modify these functions to check for a NULL table and automatically allocate as needed. Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the stack without calling hashmap_init(). Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Elijah Newren	97a39a4a93	hashmap: adjust spacing to fix argument alignment No actual code changes; just whitespace adjustments. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-02 12:15:50 -08:00
Elijah Newren	fe1a21d526	fast-rebase: demonstrate merge-ort's API via new test-tool command Add a new test-tool command named 'fast-rebase', which is a super-slimmed down and nowhere near as capable version of 'git rebase'. 'test-tool fast-rebase' is not currently planned for usage in the testsuite, but is here for two purposes: 1) Demonstrate the desired API of merge-ort. In particular, fast-rebase takes advantage of the separation of the merging operation from the updating of the index and working tree, to allow it to pick N commits, but only update the index and working tree once at the end. Look for the calls to merge_incore_nonrecursive() and merge_switch_to_result(). 2) Provide a convenient benchmark that isn't polluted by the heavy disk writing and forking of unnecessary processes that comes from sequencer.c and merge-recursive.c. fast-rebase is not meant to replace sequencer.c, just give ideas on how sequencer.c can be changed. Updating sequencer.c with these goals is probably a large amount of work; writing a simple targeted command with no documentation, less-than-useful help messages, numerous limitations in terms of flags it can accept and situations it can handle, and which is flagged off from users is a much easier interim step. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-29 14:05:48 -07:00
Elijah Newren	47b1e890e3	merge-ort-wrappers: new convience wrappers to mimic the old merge API There are a few differences between the new API in merge-ort and the old API in merge-recursive. While the new API is more flexible, it might feel like more work at times than the old API. merge-ort-wrappers creates two convenience wrappers taking the exact same arguments as the old merge_trees() and merge_recursive() functions and implements them via the new API. This makes converting existing callsites easier, and serves to highlight some of the differences in the API. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-26 22:36:14 -07:00
Elijah Newren	17e5574b04	merge-ort: barebones API of new merge strategy with empty implementation This is the beginning of a new merge strategy. While there are some API differences, and the implementation has some differences in behavior, it is essentially meant as an eventual drop-in replacement for merge-recursive.c. However, it is being built to exist side-by-side with merge-recursive so that we have plenty of time to find out how those differences pan out in the real world while people can still fall back to merge-recursive. (Also, I intend to avoid modifying merge-recursive during this process, to keep it stable.) The primary difference noticable here is that the updating of the working tree and index is not done simultaneously with the merge algorithm, but is a separate post-processing step. The new API is designed so that one can do repeated merges (e.g. during a rebase or cherry-pick) and only update the index and working tree one time at the end instead of updating it with every intermediate result. Also, one can perform a merge between two branches, neither of which match the index or the working tree, without clobbering the index or working tree. The next three commits will demonstrate various uses of this new API. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-26 22:36:10 -07:00
Junio C Hamano	2e673356ae	Git 2.29.1 -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE4fA2sf7nIh/HeOzvsLXohpav5ssFAl+SArMACgkQsLXohpav 5ss3cg//ZT2qC2ypqEaOuVXPVKBlTxIEL43OxQsOQCazpHSHtEnMVLIEDUX467Dd UJyQReuSU/Kis7ucL1MCFXMxdvH8dGl/FdyS6lREND7w1lpudyGjTxO+mTXUrqGH J5zhYC2DzeX7inA6gviN543krVeiT5p0I9Yk5/dWDBOciYKMOsQJGyYkp5Qzw8OO 7hog+B0752iwzpnZMOiIFGY0xiw89X1MkVgNGlzuPebf6Ws3+/LXfPHJbjn5Xd+q V4v/YVBYRPIlTYoMox/2/dE4/FB47MCn+YNpxtELsq0kQCAARcI+ucoqHYegs/2r Q1yQlRr5LAu6oy5hesv7ZvlrHUNIrGzQeTFjdbWSQ83NwsbuUzhmdeiiIGawcjK5 XypTT/FeS9zFLyPQ/PogsnYzFFzG5b3PXx0Eq/K3LeI0qDDpAdnA3HU8pJrLUKGU dZhSMr8G4Xw8fzxQsEcYSkVtFdcwwq2zYB4kFsUh1eoCDJF5ZJTyX8tdXeenlviy VqTlN9uAgTjkVUTAjcODzZwp4MJOl4y8Sql9d+gCzfCsgxNsdBt8debqbF55MuPb /0qZsWveSK0V6pnoWhsqMNtJfwcYjCLw0kaEkn80KYEWj24tAgEi2hDTEav1K21R zigK5ewtCFrbhtXm1B9gxaWYICZUyW4aKf6Mnb2Njkbl2IYZIEY= =y9La -----END PGP SIGNATURE----- Sync with Git 2.29.1	2020-10-22 15:13:15 -07:00
Junio C Hamano	b927c80531	Git 2.29.1 Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-10-22 15:07:25 -07:00
Junio C Hamano	380ba99077	Merge branch 'js/no-builtins-on-disk-option' into maint Brown-paper-bag fix. * js/no-builtins-on-disk-option: SKIP_DASHED_BUILT_INS: do not skip the bin/ programs	2020-10-22 15:01:22 -07:00

1 2 3 4 5 ...

60836 Commits All Branches Search

60836 Commits

All Branches