1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-05 11:46:10 +02:00
Commit Graph

265 Commits

Author SHA1 Message Date
Junio C Hamano 220604042c Merge branch 'jk/unused-anno-more'
More UNUSED annotation to help using -Wunused option with the
compiler.

* jk/unused-anno-more:
  ll-merge: mark unused parameters in callbacks
  diffcore-pickaxe: mark unused parameters in pickaxe functions
  convert: mark unused parameter in null stream filter
  apply: mark unused parameters in noop error/warning routine
  apply: mark unused parameters in handlers
  date: mark unused parameters in handler functions
  string-list: mark unused callback parameters
  object-file: mark unused parameters in hash_unknown functions
  mark unused parameters in trivial compat functions
  update-index: drop unused argc from do_reupdate()
  submodule--helper: drop unused argc from module_list_compute()
  diffstat_consume(): assert non-zero length
2022-10-27 14:51:52 -07:00
Elijah Newren 2b86c10084 merge-ort: fix bug with dir rename vs change dir to symlink
When changing a directory to a symlink on one side of history, and
renaming the parent of that directory to a different directory name
on the other side, e.g. with this kind of setup:

    Base commit: Has a file named dir/subdir/file
    Side1:       Rename dir/ -> renamed-dir/
    Side2:       delete dir/subdir/file, add dir/subdir as symlink

Then merge-ort was running into an assertion failure:

    git: merge-ort.c:2622: apply_directory_rename_modifications: Assertion `ci->dirmask == 0' failed

merge-recursive did not have as obvious an issue handling this case,
likely because we never fixed it to handle the case from commit
902c521a35 ("t6423: more involved directory rename test", 2020-10-15)
where we need to be careful about nested renames when a directory rename
occurs (dir/ -> renamed-dir/ implies dir/subdir/ ->
renamed-dir/subdir/).  However, merge-recursive does have multiple
problems with this testcase:

  * Incorrect stages for the file: merge-recursive omits the stage in
    the index corresponding to the base stage, making `git status`
    report "added by us" for renamed-dir/subdir/file instead of the
    expected "deleted by them".

  * Poor directory/file conflict handling: For the renamed-dir/subdir
    symlink, instead of reporting a file/directory conflict as
    expected, it reports "Error: Refusing to lose untracked file at
    renamed-dir/subdir".  This is a lie because there is no untracked
    file at that location.  It then does the normal suboptimal
    merge-recursive thing of having the symlink be tracked in the index
    at a location where it can't be written due to D/F conflicts
    (namely, renamed-dir/subdir), but writes it to the working tree at
    a different location as a new untracked file (namely,
    renamed-dir/subdir~B^0)

Technically, these problems don't prevent the user from resolving the
merge if they can figure out to ignore the confusion, but because both
pieces of output are quite confusing I don't want to modify the test
to claim the recursive also passes it even if it doesn't have the bug
that ort did.

So, fix the bug in ort by splitting the conflict_info for "dir/subdir"
into two, one for the directory part, one for the file (i.e. symlink)
part, since the symlink is being renamed by directory rename detection.
The directory part is needed for proper nesting, since there are still
conflict_info fields for files underneath it (though those are marked
as is_null, they are still present until the entries are processed,
and the entry processing wants every non-toplevel entry to have a
parent directory).

Reported-by: Stefano Rivera <stefano@rivera.za.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-22 16:10:33 -07:00
Jeff King 1ee3471045 string-list: mark unused callback parameters
String-lists may be used with callbacks for clearing or iteration. These
callbacks need to conform to a particular interface, even though not
every callback needs all of its parameters. Mark the unused ones to make
-Wunused-parameter happy.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-10-17 21:24:04 -07:00
Johannes Schindelin 92481d1b26 merge-ort: return early when failing to write a blob
In the previous commit, we fixed a segmentation fault when a tree object
could not be written.

However, before the tree object is written, `merge-ort` wants to write
out a blob object (except in cases where the merge results in a blob
that already exists in the database). And this can fail, too, but we
ignore that write failure so far.

Let's pay close attention and error out early if the blob could not be
written. This reduces the error output of t4301.25 ("merge-ort fails
gracefully in a read-only repository") from:

	error: insufficient permission for adding an object to repository database ./objects
	error: error: Unable to add numbers to database
	error: insufficient permission for adding an object to repository database ./objects
	error: error: Unable to add greeting to database
	error: insufficient permission for adding an object to repository database ./objects
	fatal: failure to merge

to:

	error: insufficient permission for adding an object to repository database ./objects
	error: error: Unable to add numbers to database
	fatal: failure to merge

This is _not_ just a cosmetic change: Even though one might assume that
the operation would have failed anyway at the point when the new tree
object is written (and the corresponding tree object _will_ be new if it
contains a blob that is new), but that is not so: As pointed out by
Elijah Newren, when Git has previously been allowed to add loose objects
via `sudo` calls, it is very possible that the blob object cannot be
written (because the corresponding `.git/objects/??/` directory may be
owned by `root`) but the tree object can be written (because the
corresponding objects directory is owned by the current user). This
would result in a corrupt repository because it is missing the blob
object, and with this here patch we prevent that.

Note: This patch adjusts two variable declarations from `unsigned` to
`int` because their purpose is to hold the return value of
`handle_content_merge()`, which is of type `int`. The existing users of
those variables are only interested whether that variable is zero or
non-zero, therefore this type change does not affect the existing code.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-28 08:49:35 -07:00
Johannes Schindelin 0b55d930a6 merge-ort: fix segmentation fault in read-only repositories
If the blob/tree objects cannot be written, we really need the merge
operations to fail, and not to continue (and then try to access the tree
object which is however still set to `NULL`).

Let's stop ignoring the return value of `write_object_file()` and
`write_tree()` and set `clean = -1` in the error case.

Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-28 08:49:27 -07:00
Junio C Hamano 56ba6245a4 Merge branch 'en/ort-unused-code-removal'
Code clean-up.

* en/ort-unused-code-removal:
  merge-ort: remove code obsoleted by other changes
2022-08-29 14:55:14 -07:00
Junio C Hamano df3c129e24 Merge branch 'en/submodule-merge-messages-fixes'
Further update the help messages given while merging submodules.

* en/submodule-merge-messages-fixes:
  merge-ort: provide helpful submodule update message when possible
  merge-ort: avoid surprise with new sub_flag variable
  merge-ort: remove translator lego in new "submodule conflict suggestion"
  submodule merge: update conflict error message
2022-08-25 14:42:29 -07:00
Elijah Newren ff033db7a8 merge-ort: remove code obsoleted by other changes
Commit 66b209b86a ("merge-ort: implement CE_SKIP_WORKTREE handling with
conflicted entries", 2021-03-20) added some code for merge-ort to handle
conflicted and skip_worktree entries in general.  Included in this was
an ugly hack for dealing with present-despite-skipped entries and a
testcase (t6428.2) specific to that hack, since at that time users could
accidentally get files into that state when using a sparse checkout.

However, with the merging of 82386b4496 ("Merge branch
'en/present-despite-skipped'", 2022-03-09), that class of problems was
addressed globally and in a much cleaner way.  As such, the
present-despite-skipped hack in merge-ort is no longer needed and can
simply be removed.

No additional testcase is needed here; t6428.2 was written to test the
necessary functionality and is being kept.  The fact that this test
continues to pass despite the code being removed shows that the extra
code is no longer necessary.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-19 10:43:41 -07:00
Elijah Newren 565577ed88 merge-ort: provide helpful submodule update message when possible
In commit 4057523a40 ("submodule merge: update conflict error message",
2022-08-04), a more detailed message was provided when submodules
conflict, in order to help users know how to resolve those conflicts.
There were a couple situations for which a different message would be
more appropriate, but that commit left handling those for future work.
Unfortunately, that commit would check if any submodules were of the
type that it didn't know how to explain, and, if so, would avoid
providing the more detailed explanation even for the submodules it did
know how to explain.

Change this to have the code print the helpful messages for the subset
of submodules it knows how to explain.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 09:49:30 -07:00
Elijah Newren 34ce504a33 merge-ort: avoid surprise with new sub_flag variable
Commit 4057523a40 ("submodule merge: update conflict error message",
2022-08-04) added a sub_flag variable that is used to store a value from
enum conflict_and_info_types, but initializes it with a value of -1 that
does not correspond to any of the conflict_and_info_types.  The code may
never set it to a valid value and yet still use it, which can be
surprising when reading over the code at first.  Initialize it instead
to the generic CONFLICT_SUBMODULE_FAILED_TO_MERGE value, which is still
distinct from the two values we need to special case.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 09:49:30 -07:00
Elijah Newren a5834b775b merge-ort: remove translator lego in new "submodule conflict suggestion"
In commit 4057523a40 ("submodule merge: update conflict error message",
2022-08-04), the new "submodule conflict suggestion" code was
translating 6 different pieces of the new message and then used
carefully crafted logic to allow stitching it back together with special
formatting.  Keep the components of the message together as much as
possible, so that:
  * we reduce the number of things translators have to translate
  * translators have more control over the format of the output
  * the code is much easier for developers to understand too

Also, reformat some comments running beyond the 80th column while at it.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-18 09:49:30 -07:00
Junio C Hamano bac92b1f39 Merge branch 'js/ort-clean-up-after-failed-merge'
Plug memory leaks in the failure code path in the "merge-ort" merge
strategy backend.

* js/ort-clean-up-after-failed-merge:
  merge-ort: do leave trace2 region even if checkout fails
  merge-ort: clean up after failed merge
2022-08-08 13:13:14 -07:00
Calvin Wan 4057523a40 submodule merge: update conflict error message
When attempting to merge in a superproject with conflicting submodule
pointers that cannot be fast-forwarded or trivially resolved, the merge
fails and Git prints an error message that accurately describes the
failure, but does not provide steps for the user to resolve the error.

Git is left in a conflicted state, which requires the user to:
 1. merge submodules or update submodules to an already existing
	commit that reflects the merge
 2. add submodules changes to the superproject
 3. finish merging superproject
These steps are non-obvious for newer submodule users to figure out
based on the error message and neither `git submodule status` nor `git
status` provide any useful pointers.

Update error message to provide steps to resolve submodule merge
conflict. Future work could involve adding an advice flag to the
message. Although the message is long, it also has the id of the
submodule commit that needs to be merged, which could be useful
information for the user.

Additionally, 5 merge failures that resulted in an early return have
been updated to reflect the status of the merge.
1. Null merge base (null o): CONFLICT_SUBMODULE_NULL_MERGE_BASE added
   as a new conflict type and will print updated error message.
2. Null merge side a (null a): BUG(). See [1] for discussion
3. Null merge side b (null b): BUG(). See [1] for discussion
4. Submodule not checked out: added NEEDSWORK bit
5. Submodule commits not present: added NEEDSWORK bit
The errors with a NEEDSWORK bit deserve a more detailed explanation of
how to resolve them. See [2] for more context.

[1] https://lore.kernel.org/git/CABPp-BE0qGwUy80dmVszkJQ+tcpfLRW0OZyErymzhZ9+HWY1mw@mail.gmail.com/
[2] https://lore.kernel.org/git/xmqqpmhjjwo9.fsf@gitster.g/

Signed-off-by: Calvin Wan <calvinwan@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-08-04 13:43:07 -07:00
Johannes Schindelin 1250dff32b merge-ort: do leave trace2 region even if checkout fails
In 557ac0350d (merge-ort: begin performance work; instrument with
trace2_region_* calls, 2021-01-23), we added Trace2 instrumentation, but
in the error path that returns early, we forgot to tell Trace2 that
we're leaving the region. Let's fix that.

Pointed-out-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-31 19:24:27 -07:00
Johannes Schindelin fef2b6dace merge-ort: clean up after failed merge
In 9fefce68dc (merge-ort: basic outline for merge_switch_to_result(),
2020-12-13), we added functionality to lay down the result of a merge on
disk. But we forgot to release the data structures in case
`unpack_trees()` failed to run properly.

This was pointed out by the `linux-leaks` job in our CI runs.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-31 19:24:13 -07:00
Junio C Hamano e3349f2888 Merge branch 'en/merge-dual-dir-renames-fix'
Fixes a long-standing corner case bug around directory renames in
the merge-ort strategy.

* en/merge-dual-dir-renames-fix:
  merge-ort: fix issue with dual rename and add/add conflict
  merge-ort: shuffle the computation and cleanup of potential collisions
  merge-ort: make a separate function for freeing struct collisions
  merge-ort: small cleanups of check_for_directory_rename
  t6423: add tests of dual directory rename plus add/add conflict
2022-07-18 13:31:56 -07:00
Elijah Newren 751e165424 merge-ort: fix issue with dual rename and add/add conflict
There is code in both merge-recursive and merge-ort for avoiding doubly
transitive renames (i.e. one side renames directory A/ -> B/, and the
other side renames directory B/ -> C/), because this combination would
otherwise make a mess for new files added to A/ on the first side and
wondering which directory they end up in -- especially if there were
even more renames such as the first side renaming C/ -> D/.  In such
cases, it just turns "off" directory rename detection for the higher
order transitive cases.

The testcases added in t6423 a couple commits ago are slightly different
but similar in principle.  They involve a similar case of paired
renaming but instead of A/ -> B/ and B/ -> C/, the second side renames
a leading directory of B/ to C/.  And both sides add a new file
somewhere under the directory that the other side will rename.  While
the new files added start within different directories and thus could
logically end up within different directories, it is weird for a file
on one side to end up where the other one started and not move along
with it.  So, let's just turn off directory rename detection in this
case as well.

Another way to look at this is that if the source name involved in a
directory rename on one side is the target name of a directory rename
operation for a file from the other side, then we avoid the doubly
transitive rename.  (More concretely, if a directory rename on side D
wants to rename a file on side E from OLD_NAME -> NEW_NAME, and side D
already had a file named NEW_NAME, and a directory rename on side E
wants to rename side D's NEW_NAME -> NEWER_NAME, then we turn off the
directory rename detection for NEW_NAME to prevent the
NEW_NAME -> NEWER_NAME rename, and instead end up with an add/add
conflict on NEW_NAME.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06 09:39:46 -07:00
Elijah Newren 3ffbe5a223 merge-ort: shuffle the computation and cleanup of potential collisions
Run compute_collisions() for renames on both sides of history before
any calls to collect_renames(), and do not free the computed collisions
until after both calls to collect_renames().  This is just a code
reorganization at this point that doesn't make sense on its own, but
will permit us to use the computed collision info from both sides
within each call to collect_renames() in a subsequent commit.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06 09:39:46 -07:00
Elijah Newren 6dd1f0e9d4 merge-ort: make a separate function for freeing struct collisions
This commit makes no functional changes, it's just some code movement in
preparation for later changes.

Signed-off-by: Elijah Newren <newren@palantir.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06 09:39:46 -07:00
Elijah Newren 51e41e4eaf merge-ort: small cleanups of check_for_directory_rename
No functional changes, just some preparatory cleanups.

Suggested-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@palantir.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-07-06 09:39:46 -07:00
Elijah Newren de90581141 merge-ort: optionally produce machine-readable output
With the new `detailed` parameter, a new mode can be triggered when
displaying the merge messages: The `detailed` mode prints NUL-delimited
fields of the following form:

	<path-count> NUL <path>... NUL <conflict-type> NUL <message>

The `<path-count>` field determines how many `<path>` fields there are.

The intention of this mode is to support server-side operations, where
worktree-less merges can lead to conflicts and depending on the type
and/or path count, the caller might know how to handle said conflict.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Elijah Newren cb2607759e merge-ort: store more specific conflict information
It is all fine and dandy for a regular Git command that is intended to
be run interactively to produce a bunch of messages upon an error.

However, in `merge-ort`'s case, we want to call the command e.g. in
server-side software, where the actual error messages are not quite as
interesting as machine-readable, immutable terms that describe the exact
nature of any given conflict.

With this patch, the `merge-ort` machinery records the exact type (as
specified via an `enum` value) as well as the involved path(s) together
with the conflict's message.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Johannes Schindelin 2715e8a931 merge-ort: make `path_messages` a strmap to a string_list
This allows us once again to get away with less data copying.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Johannes Schindelin 6debb7527b merge-ort: store messages in a list, not in a single strbuf
To prepare for using the `merge-ort` machinery in server operations, we
cannot simply produce a free-form string that combines a variable-length
list of messages.

Instead, we need to list them one by one. The natural fit for this is a
`string_list`.

We will subsequently add even more information in the `util` attribute
of the string list items.

Based-on-a-patch-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Elijah Newren a4040cfa35 merge-ort: remove command-line-centric submodule message from merge-ort
There was one case in merge-ort that would call path_msg() multiple
times for the same logical conflict, and it was in order to give advice
about how to resolve a conflict.  This advice does not make as much
sense with remerge-diff, or with merge-tree being invoked by a GitHub
GUI for resolution of messages, and is making it hard to provide
which-logical-conflict-affects-which-paths information in a machine
parseable way to a higher level caller of merge-tree.  Let's simply
remove this informational message.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Elijah Newren fae26ce79c merge-ort: provide a merge_get_conflicted_files() helper function
After a merge, this function allows the user to extract the same
information that would be printed by `ls-files -u`, which means
files with their mode, oid, and stage.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Elijah Newren a34edae68a merge-ort: split out a separate display_update_messages() function
This patch includes no new code; it simply moves a bunch of lines into a
new function.  As such, there are no functional changes.  This is just a
preparatory step to allow the printed messages to be handled differently
by other callers, such as in `git merge-tree --write-tree`.

(Patch best viewed with
     --color-moved --color-moved-ws=allow-indentation-change
 to see that it is a simple code movement.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-22 16:10:06 -07:00
Junio C Hamano 2da81d1efb Merge branch 'ab/plug-leak-in-revisions'
Plug the memory leaks from the trickiest API of all, the revision
walker.

* ab/plug-leak-in-revisions: (27 commits)
  revisions API: add a TODO for diff_free(&revs->diffopt)
  revisions API: have release_revisions() release "topo_walk_info"
  revisions API: have release_revisions() release "date_mode"
  revisions API: call diff_free(&revs->pruning) in revisions_release()
  revisions API: release "reflog_info" in release revisions()
  revisions API: clear "boundary_commits" in release_revisions()
  revisions API: have release_revisions() release "prune_data"
  revisions API: have release_revisions() release "grep_filter"
  revisions API: have release_revisions() release "filter"
  revisions API: have release_revisions() release "cmdline"
  revisions API: have release_revisions() release "mailmap"
  revisions API: have release_revisions() release "commits"
  revisions API users: use release_revisions() for "prune_data" users
  revisions API users: use release_revisions() with UNLEAK()
  revisions API users: use release_revisions() in builtin/log.c
  revisions API users: use release_revisions() in http-push.c
  revisions API users: add "goto cleanup" for release_revisions()
  stash: always have the owner of "stash_info" free it
  revisions API users: use release_revisions() needing REV_INFO_INIT
  revision.[ch]: document and move code declared around "init"
  ...
2022-06-07 14:10:56 -07:00
Junio C Hamano 538dc459a0 Merge branch 'ep/maint-equals-null-cocci'
Introduce and apply coccinelle rule to discourage an explicit
comparison between a pointer and NULL, and applies the clean-up to
the maintenance track.

* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-20 15:26:59 -07:00
Junio C Hamano 2b0a58d164 Merge branch 'ep/maint-equals-null-cocci' for maint-2.35
* ep/maint-equals-null-cocci:
  tree-wide: apply equals-null.cocci
  contrib/coccinnelle: add equals-null.cocci
2022-05-02 10:06:04 -07:00
Junio C Hamano afe8a9070b tree-wide: apply equals-null.cocci
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-02 09:50:37 -07:00
Ævar Arnfjörð Bjarmason 2108fe4a19 revisions API users: add straightforward release_revisions()
Add a release_revisions() to various users of "struct rev_list" in
those straightforward cases where we only need to add the
release_revisions() call to the end of a block, and don't need to
e.g. refactor anything to use a "goto cleanup" pattern.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-04-13 23:56:08 -07:00
Junio C Hamano 38bbb9e990 Merge branch 'ab/string-list-count-in-size-t'
Count string_list items in size_t, not "unsigned int".

* ab/string-list-count-in-size-t:
  string-list API: change "nr" and "alloc" to "size_t"
  gettext API users: don't explicitly cast ngettext()'s "n"
2022-03-16 17:53:09 -07:00
Junio C Hamano 430883a70c Merge branch 'ab/object-file-api-updates'
Object-file API shuffling.

* ab/object-file-api-updates:
  object-file API: pass an enum to read_object_with_reference()
  object-file.c: add a literal version of write_object_file_prepare()
  object-file API: have hash_object_file() take "enum object_type"
  object API: rename hash_object_file_literally() to write_*()
  object-file API: split up and simplify check_object_signature()
  object API users + docs: check <0, not !0 with check_object_signature()
  object API docs: move check_object_signature() docs to cache.h
  object API: correct "buf" v.s. "map" mismatch in *.c and *.h
  object-file API: have write_object_file() take "enum object_type"
  object-file API: add a format_object_header() function
  object-file API: return "void", not "int" from hash_object_file()
  object-file.c: split up declaration of unrelated variables
2022-03-16 17:53:08 -07:00
Junio C Hamano 8b44e05abf Merge branch 'en/merge-ort-align-verbosity-with-recursive'
Align the level of verbose output from the ort backend during inner
merge to that of the recursive backend.

* en/merge-ort-align-verbosity-with-recursive:
  merge-ort: exclude messages from inner merges by default
2022-03-13 22:56:17 +00:00
Ævar Arnfjörð Bjarmason 99d60545f8 string-list API: change "nr" and "alloc" to "size_t"
Change the "nr" and "alloc" members of "struct string_list" to use
"size_t" instead of "nr". On some platforms the size of an "unsigned
int" will be smaller than a "size_t", e.g. a 32 bit unsigned v.s. 64
bit unsigned. As "struct string_list" is a generic API we use in a lot
of places this might cause overflows.

As one example: code in "refs.c" keeps track of the number of refs
with a "size_t", and auxiliary code in builtin/remote.c in
get_ref_states() appends those to a "struct string_list".

While we're at it split the "nr" and "alloc" in string-list.h across
two lines, which is the case for most such struct member
declarations (e.g. in "strbuf.h" and "strvec.h").

Changing e.g. "int i" to "size_t i" in run_and_feed_hook() isn't
strictly necessary, and there are a lot more cases where we'll use a
local "int", "unsigned int" etc. variable derived from the "nr" in the
"struct string_list". But in that case as well as
add_wrapped_shortlog_msg() in builtin/shortlog.c we need to adjust the
printf format referring to "nr" anyway, so let's also change the other
variables referring to it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-07 12:02:04 -08:00
Junio C Hamano ae59346f09 Merge branch 'en/merge-ort-plug-leaks'
Leakfix.

* en/merge-ort-plug-leaks:
  merge-ort: fix small memory leak in unique_path()
  merge-ort: fix small memory leak in detect_and_process_renames()
2022-03-06 21:25:31 -08:00
Elijah Newren 624a93507e merge-ort: exclude messages from inner merges by default
merge-recursive would only report messages from inner merges when the
GIT_MERGE_VERBOSITY was set to 5.  Do the same for merge-ort.

Note that somewhat reverts 0d83d8240d ("merge-ort: mark conflict/warning
messages from inner merges as omittable", 2022-02-02) based on two
facts:

  * This commit basically removes the showing of messages from inner
    merges as well, at least by default.  The only difference is that
    users can request to get them back by turning up the verbosity.
  * Messages from inner merges are specially annotated since 4a3d86e1bb
    ("merge-ort: make informational messages from recursive merges
    clearer", 2022-02-17).  The ability to distinguish them from outer
    merge comments make them less problematic to include, and easier
    for humans to parse.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-03-01 23:31:56 -08:00
Ævar Arnfjörð Bjarmason c80d226a04 object-file API: have write_object_file() take "enum object_type"
Change the write_object_file() function to take an "enum object_type"
instead of a "const char *type". Its callers either passed
{commit,tree,blob,tag}_type and can pass the corresponding OBJ_* type
instead, or were hardcoding strings like "blob".

This avoids the back & forth fragility where the callers of
write_object_file() would have the enum type, and convert it
themselves via type_name(). We do have to now do that conversion
ourselves before calling write_object_file_prepare(), but those
codepaths will be similarly adjusted in subsequent commits.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-25 17:16:31 -08:00
Elijah Newren 81afc79412 merge-ort: fix small memory leak in unique_path()
The struct strmap paths member of merge_options_internal is perhaps the
most central data structure to all of merge-ort.  Because all the paths
involved in the merge need to be kept until the merge is complete, this
"paths" data structure traditionally took responsibility for owning all
the allocated paths.  When the merge is over, those paths were free()d
as part of free()ing this strmap.

In commit 6697ee01b5 (merge-ort: switch our strmaps over to using
memory pools, 2021-07-30), we changed the allocations for pathnames to
come from a memory pool.  That meant the ownership changed slightly;
there were no individual free() calls to make, instead the memory pool
owned all those paths and they were free()d all at once.

Unfortunately unique_path() was written presuming the pre-memory-pool
model, and allocated a path on the heap and left it in the strmap for
later free()ing.  Modify it to return a path allocated from the memory
pool instead.

Note that there's one instance -- in record_conflicted_index_entries()
-- where the returned string from unique_path() was only used very
temporarily and thus had been immediately free()'d.  This codepath was
associated with an ugly skip-worktree workaround that has since been
better fixed by the in-flight en/present-despite-skipped topic.  This
workaround probably makes sense to excise once that topic merges down,
but for now, just remove the immediate free() and allow the returned
string to be free()d when the memory pool is released.

This fixes the following memory leak as reported by valgrind:

==PID== 65 bytes in 1 blocks are definitely lost in loss record 79 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: strbuf_grow (strbuf.c:98)
==PID==    by 0xADDRESS: strbuf_vaddf (strbuf.c:394)
==PID==    by 0xADDRESS: strbuf_addf (strbuf.c:335)
==PID==    by 0xADDRESS: unique_path (merge-ort.c:733)
==PID==    by 0xADDRESS: process_entry (merge-ort.c:3678)
==PID==    by 0xADDRESS: process_entries (merge-ort.c:4037)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4621)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-20 00:03:30 -08:00
Elijah Newren 8d60e9d201 merge-ort: fix small memory leak in detect_and_process_renames()
detect_and_process_renames() detects renames on both sides of history
and then combines these into a single diff_queue_struct.  The combined
diff_queue_struct needs to be able to hold the renames found on either
side, and since it knows the (maximum) size it needs, it pre-emptively
grows the array to the appropriate size:

	ALLOC_GROW(combined.queue,
		   renames->pairs[1].nr + renames->pairs[2].nr,
		   combined.alloc);

It then collects the items from each side:

	collect_renames(opt, &combined, MERGE_SIDE1, ...)
	collect_renames(opt, &combined, MERGE_SIDE2, ...)

Note, though, that collect_renames() sometimes determines that some
pairs are unnecessary and does not include them in the combined array.
When it is done, detect_and_process_renames() frees this memory:

	if (combined.nr) {
                ...
		free(combined.queue);
        }

The problem is that sometimes even when there are pairs, none of them
are necessary.  Instead of checking combined.nr, just remove the
if-check; free() knows to skip NULL pointers.  This change fixes the
following memory leak, as reported by valgrind:

==PID== 192 bytes in 1 blocks are definitely lost in loss record 107 of 134
==PID==    at 0xADDRESS: malloc
==PID==    by 0xADDRESS: realloc
==PID==    by 0xADDRESS: xrealloc (wrapper.c:126)
==PID==    by 0xADDRESS: detect_and_process_renames (merge-ort.c:3134)
==PID==    by 0xADDRESS: merge_ort_nonrecursive_internal (merge-ort.c:4610)
==PID==    by 0xADDRESS: merge_ort_internal (merge-ort.c:4709)
==PID==    by 0xADDRESS: merge_incore_recursive (merge-ort.c:4760)
==PID==    by 0xADDRESS: merge_ort_recursive (merge-ort-wrappers.c:57)
==PID==    by 0xADDRESS: try_merge_strategy (merge.c:753)
==PID==    by 0xADDRESS: cmd_merge (merge.c:1676)
==PID==    by 0xADDRESS: run_builtin (git.c:461)
==PID==    by 0xADDRESS: handle_builtin (git.c:713)
==PID==    by 0xADDRESS: run_argv (git.c:780)
==PID==    by 0xADDRESS: cmd_main (git.c:911)
==PID==    by 0xADDRESS: main (common-main.c:52)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-20 00:03:29 -08:00
Elijah Newren 4a3d86e1bb merge-ort: make informational messages from recursive merges clearer
This is another simple change with a long explanation...

merge-recursive and merge-ort are both based on the same recursive idea:
if there is more than one merge base, merge the merge bases (which may
require first merging the merge bases of the merges bases, etc.).  The
depth of the inner merge is recorded via a variable called "call_depth",
which we'll bring up again later.  Naturally, the inner merges
themselves can have conflicts and various messages generated about those
files.

merge-recursive immediately prints to stdout as it goes, at the risk of
printing multiple conflict notices for the same path separated far apart
from each other with many intervenining conflict notices for other paths
between them.  And this is true even if there are no inner merges
involved.  An example of this was given in [1] and apparently caused
some confusion:

    CONFLICT (rename/add): Rename A->B in HEAD. B added in otherbranch
    ...dozens of conflicts for OTHER paths...
    CONFLICT (content): Merge conflicts in B

In contrast, merge-ort collects messages and stores them by path so that
it can print them grouped by path.  Thus, the same case handled by
merge-ort would have output of the form:

    CONFLICT (rename/add): Rename A->B in HEAD. B added in otherbranch
    CONFLICT (content): Merge conflicts in B
    ...dozens of conflicts for OTHER paths...

This is generally helpful, but does make a separate bug more
problematic.  In particular, while merge-recursive might report the
following for a recursive merge:

      Auto-merging dir.c
      Auto-merging midx.c
      CONFLICT (content): Merge conflict in midx.c
    Auto-merging diff.c
    Auto-merging dir.c
    CONFLICT (content): Merge conflict in dir.c

merge-ort would instead report:

    Auto-merging diff.c
    Auto-merging dir.c
    Auto-merging dir.c
    CONFLICT (content): Merge conflict in dir.c
    Auto-merging midx.c
    CONFLICT (content): Merge conflict in midx.c

The fact that messages for the same file are together is probably
helpful in general, but with the indentation missing for the inner
merge it unfortunately serves to confuse.  This probably would lead
users to wonder:

  * Why is Git reporting that "dir.c" is being merged twice?
  * If midx.c has conflicts, why do I not see any when I open up the
    file and why are no conflicts shown in the index?

Fix this output confusion by changing the output to clearly
differentiate the messages for outer merges from the ones for inner
merges, changing the above output from merge-ort to:

    Auto-merging diff.c
      From inner merge:  Auto-merging dir.c
    Auto-merging dir.c
    CONFLICT (content): Merge conflict in dir.c
      From inner merge:  Auto-merging midx.c
      From inner merge:  CONFLICT (content): Merge conflict in midx.c

(Note: the number of spaces after the 'From inner merge:' is
2*call_depth).

One other thing to note here, that I didn't notice until typing up this
commit message, is that merge-recursive does not print any messages from
the inner merges by default; the extra verbosity has to be requested.
merge-ort currently has no verbosity controls and always prints these.
We may also want to change that, but for now, just make the output
clearer with these extra markings and indentation.

[1] https://lore.kernel.org/git/CAGyf7-He4in8JWUh9dpAwvoPkQz9hr8nCBpxOxhZEd8+jtqTpg@mail.gmail.com/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-17 15:11:00 -08:00
Junio C Hamano 90b7153806 Merge branch 'en/remerge-diff'
"git log --remerge-diff" shows the difference from mechanical merge
result and the result that is actually recorded in a merge commit.

* en/remerge-diff:
  diff-merges: avoid history simplifications when diffing merges
  merge-ort: mark conflict/warning messages from inner merges as omittable
  show, log: include conflict/warning messages in --remerge-diff headers
  diff: add ability to insert additional headers for paths
  merge-ort: format messages slightly different for use in headers
  merge-ort: mark a few more conflict messages as omittable
  merge-ort: capture and print ll-merge warnings in our preferred fashion
  ll-merge: make callers responsible for showing warnings
  log: clean unneeded objects during `log --remerge-diff`
  show, log: provide a --remerge-diff capability
2022-02-16 15:14:29 -08:00
Junio C Hamano c70b5e7187 Merge branch 'en/plug-leaks-in-merge'
Leakfix.

* en/plug-leaks-in-merge:
  merge: fix memory leaks in cmd_merge()
  merge-ort: fix memory leak in merge_ort_internal()
2022-02-09 14:21:00 -08:00
Junio C Hamano 87bfbd52e2 Merge branch 'en/merge-ort-restart-optim-fix'
The merge-ort misbehaved when merge.renameLimit configuration is
set too low and failed to find all renames.

* en/merge-ort-restart-optim-fix:
  merge-ort: avoid assuming all renames detected
2022-02-09 14:20:58 -08:00
Elijah Newren 0d83d8240d merge-ort: mark conflict/warning messages from inner merges as omittable
A recursive merge involves merging the merge bases of the two branches
being merged.  Such an inner merge can itself generate conflict notices.
While such notices may be useful when initially trying to create a
merge, they seem to just be noise when investigating merges later with
--remerge-diff.  (Especially when both sides of the outer merge resolved
the conflict the same way leading to no overall conflict.)  Remove them.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:28 -08:00
Elijah Newren 20323d104e show, log: include conflict/warning messages in --remerge-diff headers
Conflicts such as modify/delete, rename/rename, or file/directory are
not representable via content conflict markers, and the normal output
messages notifying users about these were dropped with --remerge-diff.
While we don't want these messages randomly shown before the commit
and diff headers, we do want them to still be shown; include them as
part of the diff headers instead.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:28 -08:00
Elijah Newren 6054d1aac3 merge-ort: format messages slightly different for use in headers
When users run
    git show --remerge-diff $MERGE_COMMIT
or
    git log -p --remerge-diff ...
stdout is not an appropriate location to dump conflict messages, but we
do want to provide them to users.  We will include them in the diff
headers instead...but for that to work, we need for any multiline
messages to replace newlines with both a newline and a space.  Add a new
flag to signal when we want these messages modified in such a fashion,
and use it in path_msg() to modify these messages this way.  Also, allow
a special prefix to be specified for these headers.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:27 -08:00
Elijah Newren a28d094ac2 merge-ort: mark a few more conflict messages as omittable
path_msg() has the ability to mark messages as omittable, designed for
remerge-diff where we'll instead be showing conflict messages as diff
headers for a subsequent diff.  While all these messages are very useful
when trying to create a merge initially, early use with the
--remerge-diff feature (the only user of this omittable conflict message
capability), suggests that the particular messages marked in this commit
are just noise when trying to see what changes users made to create a
merge commit.  Mark them as omittable.

Note that there were already a few messages marked as omittable in
merge-ort when doing a remerge-diff, because the development of
--remerge-diff preceded the upstreaming of merge-ort and I was trying to
ensure merge-ort could handle all the necessary requirements.  See
commit c5a6f65527 ("merge-ort: add modify/delete handling and delayed
output processing", 2020-12-03) for the initial details.  For some
examples of already-marked-as-omittable messages, see either
"Auto-merging <path>" or some of the submodule update hints.  This
commit just adds two more messages that should also be omittable.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:27 -08:00
Elijah Newren 24dbdab50d merge-ort: capture and print ll-merge warnings in our preferred fashion
Instead of immediately printing ll-merge warnings to stderr, we save
them in our output strbuf.  Besides allowing us to move these warnings
to a special file for --remerge-diff, this has two other benefits for
regular merges done by merge-ort:

  * The deferral of messages ensures we can print all messages about
    any given path together (merge-recursive was known to sometimes
    intersperse messages about other paths, particularly when renames
    were involved).

  * The deferral of messages means we can avoid printing spurious
    conflict messages when we just end up aborting due to local user
    modifications in the way.  (In contrast to merge-recursive.c which
    prematurely checks for local modifications in the way via
    unpack_trees() and gets the check wrong both in terms of false
    positives and false negatives relative to renames, merge-ort does
    not perform the local modifications in the way check until the
    checkout() step after the full merge has been computed.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:27 -08:00
Elijah Newren 35f6967161 ll-merge: make callers responsible for showing warnings
Since some callers may want to send warning messages to somewhere other
than stdout/stderr, stop printing "warning: Cannot merge binary files"
from ll-merge and instead modify the return status of ll_merge() to
indicate when a merge of binary files has occurred.  Message printing
probably does not belong in a "low-level merge" anyway.

This commit continues printing the message as-is, just from the callers
instead of within ll_merge().  Future changes will start handling the
message differently in the merge-ort codepath.

There was one special case here: the callers in rerere.c do NOT check
for and print such a message; since those code paths explicitly skip
over binary files, there is no reason to check for a return status of
LL_MERGE_BINARY_CONFLICT or print the related message.

Note that my methodology included first modifying ll_merge() to return
a struct, so that the compiler would catch all the callers for me and
ensure I had modified all of them.  After modifying all of them, I then
changed the struct to an enum.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-02-02 10:02:27 -08:00
Elijah Newren a59b8dd94f merge-ort: fix memory leak in merge_ort_internal()
The documentation for merge_incore_recursive(), modelled after
merge_recursive(), notes that

   merge_bases will be consumed (emptied) so make a copy if you need it

However, in merge_ort_internal() (which merge_incore_recursive() calls),
it runs

   merged_merge_bases = pop_commit(&merge_bases);
   ...
   for (iter = merge_bases; iter; iter = iter->next) {
      ...
   }

In other words, it only consumes the *first* entry of merge_bases, and
the rest it iterates through.  If it iterated through all of them, the
caller could be responsible for free'ing the memory.  If it consumed all
of them, the current documentation would be correct and the callers
would need to do nothing.  The current middle ground makes it impossible
for callers to avoid memory leaks, since any attempt to use the
merge_bases it passes in would result in a use-after-free.

It turns out this part of the code was copied from merge-recursive.c,
which has had the same bug for 15.5 years.  However, since we are trying
to keep merge-recursive.c stable as we sunset it, let's just fix the
leak in in merge_ort_internal() by having it actually consume all the
elements of the merge_bases commit_list.

Testing this commit against t6404 (the first testcase specifically
about recursive merges) under valgrind shows that this patch fixes
the following leak:

    32 (16 direct, 16 indirect) bytes in 1 blocks are definitely lost \
    in loss record 49 of 126
       at 0x484086F: malloc (vg_replace_malloc.c:380)
       by 0x69FFEB: do_xmalloc (wrapper.c:41)
       by 0x6A0073: xmalloc (wrapper.c:62)
       by 0x52A72D: commit_list_insert (commit.c:556)
       by 0x47EC86: try_merge_strategy (merge.c:751)
       by 0x48143B: cmd_merge (merge.c:1679)
       by 0x40686E: run_builtin (git.c:464)
       by 0x406C51: handle_builtin (git.c:716)
       by 0x406E96: run_argv (git.c:783)
       by 0x40730A: cmd_main (git.c:914)
       by 0x4E7DFA: main (common-main.c:56)

Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-21 15:48:15 -08:00
Elijah Newren 9ae39fef7f merge-ort: avoid assuming all renames detected
In commit 8b09a900a1 ("merge-ort: restart merge with cached renames to
reduce process entry cost", 2021-07-16), we noted that in the merge-ort
steps of
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
that process_entries() was expensive, and we could often make it cheaper
by changing this to
    collect_merge_info()
    detect_and_process_renames()
    <cache all the renames, and restart>
    collect_merge_info()
    detect_and_process_renames()
    process_entries()
because the second collect_merge_info() would be cheaper (we could avoid
traversing into some directories), the second
detect_and_process_renames() would be free since we had already detected
all renames, and then process_entries() has far fewer entries to handle.

However, this was built on the assumption that the first
detect_and_process_renames() actually detected all potential renames.
If someone has merge.renameLimit set to some small value, that
assumption is violated which manifests later with the following message:

    $ git -c merge.renameLimit=1 rebase upstream
    ...
    git: merge-ort.c:546: clear_or_reinit_internal_opts: Assertion
    `renames->cached_pairs_valid_side == 0' failed.

Turn off this cache-renames-and-restart whenever we cannot detect all
renames, and add a testcase that would have caught this problem.

Reported-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Tested-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-01-17 14:24:22 -08:00
Elijah Newren d30126c20d merge-ort: fix bug with renormalization and rename/delete conflicts
Ever since commit a492d5331c ("merge-ort: ensure we consult df_conflict
and path_conflicts", 2021-06-30), when renormalization is active AND a
file is involved in a rename/delete conflict BUT the file is unmodified
(either before or after renormalization), merge-ort was running into an
assertion failure.  Prior to that commit (or if assertions were compiled
out), merge-ort would mis-merge instead, ignoring the rename/delete
conflict and just deleting the file.

Remove the assertions, fix the code appropriately, leave some good
comments in the code, and add a testcase for this situation.

Reported-by: Ralf Thielow <ralf.thielow@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-12-30 10:40:26 -08:00
Junio C Hamano 162a13b855 Merge branch 'jt/no-abuse-alternate-odb-for-submodules'
Follow through the work to use the repo interface to access
submodule objects in-process, instead of abusing the alternate
object database interface.

* jt/no-abuse-alternate-odb-for-submodules:
  submodule: trace adding submodule ODB as alternate
  submodule: pass repo to check_has_commit()
  object-file: only register submodule ODB if needed
  merge-{ort,recursive}: remove add_submodule_odb()
  refs: peeling non-the_repository iterators is BUG
  refs: teach arbitrary repo support to iterators
  refs: plumb repo into ref stores
2021-10-25 16:06:56 -07:00
Junio C Hamano a7c2daa06d Merge branch 'en/removing-untracked-fixes'
Various fixes in code paths that move untracked files away to make room.

* en/removing-untracked-fixes:
  Documentation: call out commands that nuke untracked files/directories
  Comment important codepaths regarding nuking untracked files/dirs
  unpack-trees: avoid nuking untracked dir in way of locally deleted file
  unpack-trees: avoid nuking untracked dir in way of unmerged file
  Change unpack_trees' 'reset' flag into an enum
  Remove ignored files by default when they are in the way
  unpack-trees: make dir an internal-only struct
  unpack-trees: introduce preserve_ignored to unpack_trees_options
  read-tree, merge-recursive: overwrite ignored files by default
  checkout, read-tree: fix leak of unpack_trees_options.dir
  t2500: add various tests for nuking untracked files
2021-10-13 15:15:57 -07:00
Jonathan Tan 155b517d5c merge-{ort,recursive}: remove add_submodule_odb()
After the parent commit and some of its ancestors, the only place
commits are being accessed through alternates is in the user-facing
message formatting code. Fix those, and remove the add_submodule_odb()
calls.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-10-08 15:06:06 -07:00
Junio C Hamano 921c795c25 Merge branch 'jt/add-submodule-odb-clean-up'
More code paths that use the hack to add submodule's object
database to the set of alternate object store have been cleaned up.

* jt/add-submodule-odb-clean-up:
  revision: remove "submodule" from opt struct
  repository: support unabsorbed in repo_submodule_init
  submodule: remove unnecessary unabsorbed fallback
2021-10-06 13:40:11 -07:00
Junio C Hamano 45d141a1dd Merge branch 'en/typofixes'
Typofixes.

* en/typofixes:
  merge-ort: fix completely wrong comment
  trace2.h: fix trivial comment typo
2021-09-28 13:06:53 -07:00
Elijah Newren 1b5f37334a Remove ignored files by default when they are in the way
Change several commands to remove ignored files by default when they are
in the way.  Since some commands (checkout, merge) take a
--no-overwrite-ignore option to allow the user to configure this, and it
may make sense to add that option to more commands (and in the case of
merge, actually plumb that configuration option through to more of the
backends than just the fast-forwarding special case), add little
comments about where such flags would be used.

Incidentally, this fixes a test failure in t7112.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Elijah Newren 04988c8d18 unpack-trees: introduce preserve_ignored to unpack_trees_options
Currently, every caller of unpack_trees() that wants to ensure ignored
files are overwritten by default needs to:
   * allocate unpack_trees_options.dir
   * flip the DIR_SHOW_IGNORED flag in unpack_trees_options.dir->flags
   * call setup_standard_excludes
AND then after the call to unpack_trees() needs to
   * call dir_clear()
   * deallocate unpack_trees_options.dir
That's a fair amount of boilerplate, and every caller uses identical
code.  Make this easier by instead introducing a new boolean value where
the default value (0) does what we want so that new callers of
unpack_trees() automatically get the appropriate behavior.  And move all
the handling of unpack_trees_options.dir into unpack_trees() itself.

While preserve_ignored = 0 is the behavior we feel is the appropriate
default, we defer fixing commands to use the appropriate default until a
later commit.  So, this commit introduces several locations where we
manually set preserve_ignored=1.  This makes it clear where code paths
were previously preserving ignored files when they should not have been;
a future commit will flip these to instead use a value of 0 to get the
behavior we want.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-27 13:38:37 -07:00
Junio C Hamano 6295f87b5f Merge branch 'jt/add-submodule-odb-clean-up' into jt/no-abuse-alternate-odb-for-submodules
* jt/add-submodule-odb-clean-up:
  revision: remove "submodule" from opt struct
  repository: support unabsorbed in repo_submodule_init
  submodule: remove unnecessary unabsorbed fallback
2021-09-22 17:11:09 -07:00
Junio C Hamano a16dd13740 Merge branch 'ds/mergies-with-sparse-index'
Various mergy operations have been prepared to work efficiently
with the sparse index.

* ds/mergies-with-sparse-index:
  sparse-index: integrate with cherry-pick and rebase
  sequencer: ensure full index if not ORT strategy
  t1092: add cherry-pick, rebase tests
  merge-ort: expand only for out-of-cone conflicts
  merge: make sparse-aware with ORT
  diff: ignore sparse paths in diffstat
2021-09-20 15:20:45 -07:00
Elijah Newren 3584cff71c merge-ort: fix completely wrong comment
Not sure what happened, but the comment is describing code elsewhere in
the file.  Fix the comment to actually discuss the code that follows.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-20 11:25:02 -07:00
Derrick Stolee 6957636792 merge-ort: expand only for out-of-cone conflicts
Merge conflicts happen often enough to want to avoid expanding a sparse
index when they happen, as long as those conflicts are within the
sparse-checkout cone. If a conflict exists outside of the
sparse-checkout cone, then we still need to expand before iterating over
the index entries. This is critical to do in advance because of how the
original_cache_nr is tracked to allow inserting and replacing cache
entries.

Iterate over the conflicted files and check if any paths are outside of
the sparse-checkout cone. If so, then expand the full index.

Add a test that demonstrates that we do not expand the index, even when
we hit a conflict within the sparse-checkout cone.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Derrick Stolee a33806398a merge: make sparse-aware with ORT
Allow 'git merge' to operate without expanding a sparse index, at least
not immediately. The index still will be expanded in a few cases:

1. If the merge strategy is 'recursive', then we enable
   command_requires_full_index at the start of the merge_recursive()
   method. We expect sparse-index users to also have the 'ort' strategy
   enabled.

2. With the 'ort' strategy, if the merge results in a conflicted file,
   then we expand the index before updating the working tree. The loop
   that iterates over the worktree replaces index entries and tracks
   'origintal_cache_nr' which can become completely wrong if the index
   expands in the middle of the operation. This safety valve is
   important before that loop starts. A later change will focus this
   to only expand if we indeed have a conflict outside of the
   sparse-checkout cone.

3. Other merge strategies are executed as a 'git merge-X' subcommand,
   and those strategies are currently protected with the
   'command_requires_full_index' guard.

Some test updates are required, including a mistaken 'git checkout -b'
that did not specify the base branch, causing merges to be fast-forward
merges.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 15:49:04 -07:00
Jonathan Tan 10a0d6ae64 revision: remove "submodule" from opt struct
Clean up a TODO in revision.h by removing the "submodule" field from
struct setup_revision_opt. This field is only used to specify the ref
store to use, so use rev_info->repo to determine the ref store instead.

The only users of this field are merge-ort.c and merge-recursive.c.
However, both these files specify the superproject as rev_info->repo and
the submodule as setup_revision_opt->submodule. In order to be able to
pass the submodule as rev_info->repo, all commits must be parsed with
the submodule explicitly specified; this patch does that as well. (An
incremental solution in which only some commits are parsed with explicit
submodule will not work, because if the same commit is parsed twice in
different repositories, there will be 2 heap-allocated object structs
corresponding to that commit, and any flag set by the revision walking
mechanism on one of them will not be reflected onto the other.)

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-09-09 14:09:30 -07:00
Junio C Hamano 08ac213965 Merge branch 'en/ort-perf-batch-15'
Final batch for "merge -sort" optimization.

* en/ort-perf-batch-15:
  merge-ort: remove compile-time ability to turn off usage of memory pools
  merge-ort: reuse path strings in pool_alloc_filespec
  merge-ort: store filepairs and filespecs in our mem_pool
  diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
  merge-ort: switch our strmaps over to using memory pools
  merge-ort: set up a memory pool
  merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
  diffcore-rename: use a mem_pool for exact rename detection's hashmap
  merge-ort: rename str{map,intmap,set}_func()
2021-08-24 15:32:39 -07:00
Junio C Hamano 1a6fb019d6 Merge branch 'en/ort-perf-batch-14'
Further optimization on "merge -sort" backend.

* en/ort-perf-batch-14:
  merge-ort: restart merge with cached renames to reduce process entry cost
  merge-ort: avoid recursing into directories when we don't need to
  merge-ort: defer recursing into directories when merge base is matched
  merge-ort: add a handle_deferred_entries() helper function
  merge-ort: add data structures for allowable trivial directory resolves
  merge-ort: add some more explanations in collect_merge_info_callback()
  merge-ort: resolve paths early when we have sufficient information
2021-08-04 13:28:54 -07:00
Elijah Newren 62a15162fe merge-ort: remove compile-time ability to turn off usage of memory pools
Simplify code maintenance by removing the ability to toggle between
usage of memory pools and direct allocations.  This allows us to also
remove paths_to_free since it was solely about bookkeeping to make sure
we freed the necessary paths, and allows us to remove some auxiliary
functions.

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-08-03 15:13:38 -07:00
Elijah Newren 092e5115d1 merge-ort: reuse path strings in pool_alloc_filespec
pool_alloc_filespec() was written so that the code when pool != NULL
mimicked the code from alloc_filespec(), which including allocating
enough extra space for the path and then copying it.  However, the path
passed to pool_alloc_filespec() is always going to already be in the
same memory pool, so we may as well reuse it instead of copying it.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.5 ms ±  3.4 ms     198.3 ms ±  2.9 ms
    mega-renames:     679.1 ms ±  5.6 ms     661.8 ms ±  5.9 ms
    just-one-mega:    271.9 ms ±  2.8 ms     264.6 ms ±  2.5 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:19 -07:00
Elijah Newren f239fff4c1 merge-ort: store filepairs and filespecs in our mem_pool
For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:       198.1 ms ±  2.6 ms     198.5 ms ±  3.4 ms
    mega-renames:     715.8 ms ±  4.0 ms     679.1 ms ±  5.6 ms
    just-one-mega:    276.8 ms ±  4.2 ms     271.9 ms ±  2.8 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:19 -07:00
Elijah Newren a8791ef649 diffcore-rename, merge-ort: add wrapper functions for filepair alloc/dealloc
We want to be able to allocate filespecs and filepairs using a mem_pool.
However, filespec data will still remain outside the pool (perhaps in
the future we could plumb the pool through the various diff APIs to
allocate the filespec data too, but for now we are limiting the scope).
Add some extra functions to allocate these appropriately based on the
non-NULL-ness of opt->priv->pool, as well as some extra functions to
handle correctly deallocating the relevant parts of them.  A future
commit will make use of these new functions.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:19 -07:00
Elijah Newren 6697ee01b5 merge-ort: switch our strmaps over to using memory pools
For all the strmaps (including strintmaps and strsets) whose memory is
unconditionally freed as part of clear_or_reinit_internal_opts(), switch
them over to using our new memory pool.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      202.5  ms ±  3.2  ms    198.1 ms ±  2.6 ms
    mega-renames:      1.072 s ±  0.012 s    715.8 ms ±  4.0 ms
    just-one-mega:   357.3  ms ±  3.9  ms    276.8 ms ±  4.2 ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:19 -07:00
Elijah Newren 4137c54b90 merge-ort: set up a memory pool
merge-ort has a lot of data structures, and they all tend to be freed
together in clear_or_reinit_internal_opts().  Set up a memory pool to
allow us to make these allocations and deallocations faster.  Future
commits will adjust various callers to make use of this memory pool.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:18 -07:00
Elijah Newren cdf2241c71 merge-ort: add pool_alloc, pool_calloc, and pool_strndup wrappers
Make the code more flexible so that it can handle both being run with or
without a memory pool by adding utility functions which will either call
    xmalloc, xcalloc, xstrndup
or
    mem_pool_alloc, mem_pool_calloc, mem_pool_strndup
depending on whether we have a non-NULL memory pool.  A subsequent
commit will make use of these.

(We will actually be dropping these functions soon and just assuming we
always have a memory pool, but the flexibility was very useful during
development of merge-ort so I want to be able to restore it if needed.)

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:18 -07:00
Elijah Newren 7afc0b03a2 merge-ort: rename str{map,intmap,set}_func()
In order to make it clearer that these three variables holding a
function refer to functions that will clear the strmap/strintmap/strset,
rename them to str{map,intmap,set}_clear_func().

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-30 09:01:18 -07:00
Junio C Hamano 268055bfde Merge branch 'en/rename-limits-doc'
Documentation on "git diff -l<n>" and diff.renameLimit have been
updated, and the defaults for these limits have been raised.

* en/rename-limits-doc:
  rename: bump limit defaults yet again
  diffcore-rename: treat a rename_limit of 0 as unlimited
  doc: clarify documentation for rename/copy limits
  diff: correct warning message when renameLimit exceeded
2021-07-28 13:18:03 -07:00
Junio C Hamano dd6d3c90ee Merge branch 'ab/attribute-format'
Many "printf"-like helper functions we have have been annotated
with __attribute__() to catch placeholder/parameter mismatches.

* ab/attribute-format:
  advice.h: add missing __attribute__((format)) & fix usage
  *.h: add a few missing __attribute__((format))
  *.c static functions: add missing __attribute__((format))
  sequencer.c: move static function to avoid forward decl
  *.c static functions: don't forward-declare __attribute__
2021-07-28 13:17:59 -07:00
Elijah Newren 8b09a900a1 merge-ort: restart merge with cached renames to reduce process entry cost
The merge algorithm mostly consists of the following three functions:
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
Prior to the trivial directory resolution optimization of the last half
dozen commits, process_entries() was consistently the slowest, followed
by collect_merge_info(), then detect_and_process_renames().  When the
trivial directory resolution applies, it often dramatically decreases
the amount of time spent in the two slower functions.

Looking at the performance results in the previous commit, the trivial
directory resolution optimization helps amazingly well when there are no
relevant renames.  It also helps really well when reapplying a long
series of linear commits (such as in a rebase or cherry-pick), since the
relevant renames may well be cached from the first reapplied commit.
But when there are any relevant renames that are not cached (represented
by the just-one-mega testcase), then the optimization does not help at
all.

Often, I noticed that when the optimization does not apply, it is
because there are a handful of relevant sources -- maybe even only one.
It felt frustrating to need to recurse into potentially hundreds or even
thousands of directories just for a single rename, but it was needed for
correctness.

However, staring at this list of functions and noticing that
process_entries() is the most expensive and knowing I could avoid it if
I had cached renames suggested a simple idea: change
   collect_merge_info()
   detect_and_process_renames()
   process_entries()
into
   collect_merge_info()
   detect_and_process_renames()
   <cache all the renames, and restart>
   collect_merge_info()
   detect_and_process_renames()
   process_entries()

This may seem odd and look like more work.  However, note that although
we run collect_merge_info() twice, the second time we get to employ
trivial directory resolves, which makes it much faster, so the increased
time in collect_merge_info() is small.  While we run
detect_and_process_renames() again, all renames are cached so it's
nearly a no-op (we don't call into diffcore_rename_extended() but we do
have a little bit of data structure checking and fixing up).  And the
big payoff comes from the fact that process_entries(), will be much
faster due to having far fewer entries to process.

This restarting only makes sense if we can save recursing into enough
directories to make it worth our while.  Introduce a simple heuristic to
guide this.  Note that this heuristic uses a "wanted_factor" that I have
virtually no actual real world data for, just some back-of-the-envelope
quasi-scientific calculations that I included in some comments and then
plucked a simple round number out of thin air.  It could be that
tweaking this number to make it either higher or lower improves the
optimization.  (There's slightly more here; when I first introduced this
optimization, I used a factor of 10, because I was completely confident
it was big enough to not cause slowdowns in special cases.  I was
certain it was higher than needed.  Several months later, I added the
rough calculations which make me think the optimal number is close to 2;
but instead of pushing to the limit, I just bumped it to 3 to reduce the
risk that there are special cases where this optimization can result in
slowing down the code a little.  If the ratio of path counts is below 3,
we probably will only see minor performance improvements at best
anyway.)

Also, note that while the diffstat looks kind of long (nearly 100
lines), more than half of it is in two comments explaining how things
work.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:      205.1  ms ±  3.8  ms   204.2  ms ±  3.0  ms
    mega-renames:      1.564 s ±  0.010 s     1.076 s ±  0.015 s
    just-one-mega:   479.5  ms ±  3.9  ms   364.1  ms ±  7.0  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:40 -07:00
Elijah Newren 7bee6c1004 merge-ort: avoid recursing into directories when we don't need to
This combines the work of the last several patches, and implements the
conditions when we don't need to recurse into directories.  It's perhaps
easiest to see the logic by separating the fact that a directory might
have both rename sources and rename destinations:

  * rename sources: only files present in the merge base can serve as
    rename sources, and only when one side deletes that file.  When the
    tree on one side matches the merge base, that means every file
    within the subtree matches the merge base.  This means that the
    skip-irrelevant-rename-detection optimization from before kicks in
    and we don't need any of these files as rename sources.

  * rename destinations: the tree that does not match the merge base
    might have newly added and hence unmatched destination files.
    This is what usually prevents us from doing trivial directory
    resolutions in the merge machinery.  However, the fact that we have
    deferred recursing into this directory until the end means we know
    whether there are any unmatched relevant potential rename sources
    elsewhere in this merge.  If there are no unmatched such relevant
    sources anywhere, then there is no need to look for unmatched
    potential rename destinations to match them with.

This informs our algorithm:
  * Search through relevant_sources; if we have entries, they better all
    be reflected in cached_pairs or cached_irrelevant, otherwise they
    represent an unmatched potential rename source (causing the
    optimization to be disallowed).
  * For any relevant_source represented in cached_pairs, we do need to
    to make sure to get the destination for each source, meaning we need
    to recurse into any ancestor directories of those destinations.
  * Once we've recursed into all the rename destinations for any
    relevant_sources in cached_pairs, we can then do the trivial
    directory resolution for the remaining directories.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.235 s ±  0.042 s   205.1  ms ±  3.8  ms
    mega-renames:      9.419 s ±  0.107 s     1.564 s ±  0.010 s
    just-one-mega:   480.1  ms ±  3.9  ms   479.5  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:40 -07:00
Elijah Newren 5e1ca57a7b merge-ort: defer recursing into directories when merge base is matched
When one side of history matches the merge base (including when the
merge base has no entry for the given directory), have
collect_merge_info_callback() defer recursing into the directory.  To
ensure those entries are eventually handled, add a call to
handled_deferred_entries() in collect_merge_info() after
traverse_trees() returns.

Note that the condition in collect_merge_info_callback() may look more
complicated than necessary at first glance;
renames->trivial_merges_okay[side] is always true until
handle_deferred_entries() is called, and possible_trivial_merges[side]
is always empty right now (and in the future won't be filled until
handle_deferred_entries() is called).  However, when
handle_deferred_entries() calls traverse_trees() for the relevant
deferred directories, those traverse_trees() calls will once again end
up in collect_merge_info_callback() for all the entries under those
subdirectories.  The extra conditions are there for such deferred cases
and will be used more as we do more with those variables.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:39 -07:00
Elijah Newren e0ef578eae merge-ort: add a handle_deferred_entries() helper function
In order to allow trivial directory resolution, we first need to be able
to gather more information to determine if the optimization is safe.  To
enable that, we need a way of deferring the recursion into the directory
until a later time.  Naturally, deferring the entry into a subtree means
that we need some function that will later recurse into the subdirectory
exactly the same way that collect_merge_info_callback() would have done.

Add a helper function that does this.  For now this function is not used
but a subsequent commit will change that.  Future commits will also make
the function sometimes resolve directories instead of traversing inside.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:39 -07:00
Elijah Newren d478f56759 merge-ort: add data structures for allowable trivial directory resolves
As noted a few commits ago, we can resolve individual files early if all
three sides of the merge have a file at the path and two of the three
sides match.  We would really like to do the same thing with
directories, because being able to do a trivial directory resolve means
we don't have to recurse into the directory, potentially saving us a
huge amount of time in both collect_merge_info() and process_entries().
Unfortunately, resolving directories early would mean missing any
renames whose source or destination is underneath that directory.

If we somehow knew there weren't any renames under the directory in
question, then we could resolve it early.  Sadly, it is impossible to
determine whether there are renames under the directory in question
without recursing into it, and this has traditionally kept us from ever
implementing such an optimization.

In commit f89b4f2bee ("merge-ort: skip rename detection entirely if
possible", 2021-03-11), we added an additional reason that rename
detection could be skipped entirely -- namely, if no *relevant* sources
were present.  Without completing collect_merge_info_callback(), we do
not yet know if there are no relevant sources.  However, we do know that
if the current directory on one side matches the merge base, then every
source file within that directory will not be RELEVANT_CONTENT, and a
few simple checks can often let us rule out RELEVANT_LOCATION as well.
This suggests we can just defer recursing into such directories until
the end of collect_merge_info.

Since the deferred directories are known to not add any relevant sources
due to the above properties, then if there are no relevant sources after
we've traversed all paths other than the deferred ones, then we know
there are not any relevant sources.  Under those conditions, rename
detection is unnecessary, and that means we can resolve the deferred
directories without recursing into them.

Note that the logic for skipping rename detection was also modified
further in commit 76e253793c ("merge-ort, diffcore-rename: employ cached
renames when possible", 2021-01-30); in particular rename detection can
be skipped if we already have cached renames for each relevant source.
We can take advantage of this information as well with our deferral of
recursing into directories where one side matches the merge base.

Add some data structures that we will use to do these deferrals, with
some lengthy comments explaining their purpose.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:39 -07:00
Elijah Newren 528fc51b6d merge-ort: add some more explanations in collect_merge_info_callback()
The previous patch possibly raises some questions about whether
additional cases in collect_merge_info_callback() can be handled early.
Add some explanations in the form of comments to help explain these
better.  While we're at it, add a few comments to denote what a few
boolean '0' or '1' values stand for.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:39 -07:00
Elijah Newren 785bf2088e merge-ort: resolve paths early when we have sufficient information
When there are no directories involved at a given path, and all three
sides have a file at that path, and two of the three sides of history
match, we can immediately resolve the merge of that path in
collect_merge_info() and do not need to wait until process_entries().

This is actually a very minor improvement: half the time when I run it,
I see an improvement; the other half a slowdown.  It seems to be in the
range of noise.  However, this idea serves as the beginning of some
bigger optimizations coming in the following patches.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-20 14:47:39 -07:00
Junio C Hamano bd4232fac3 Merge branch 'ab/struct-init'
Code cleanup around struct_type_init() functions.

* ab/struct-init:
  string-list.h users: change to use *_{nodup,dup}()
  string-list.[ch]: add a string_list_init_{nodup,dup}()
  dir.[ch]: replace dir_init() with DIR_INIT
  *.c *_init(): define in terms of corresponding *_INIT macro
  *.h: move some *_INIT to designated initializers
2021-07-16 17:42:53 -07:00
Junio C Hamano d3b88be1b4 Merge branch 'en/merge-dir-rename-corner-case-fix'
The merge code had funny interactions between content based rename
detection and directory rename detection.

* en/merge-dir-rename-corner-case-fix:
  merge-recursive: handle rename-to-self case
  merge-ort: ensure we consult df_conflict and path_conflicts
  t6423: test directory renames causing rename-to-self
2021-07-16 17:42:45 -07:00
Junio C Hamano fdbcdfcf61 Merge branch 'en/ort-perf-batch-13'
Performance tweaks of "git merge -sort" around lazy fetching of objects.

* en/ort-perf-batch-13:
  merge-ort: add prefetching for content merges
  diffcore-rename: use a different prefetch for basename comparisons
  diffcore-rename: allow different missing_object_cb functions
  t6421: add tests checking for excessive object downloads during merge
  promisor-remote: output trace2 statistics for number of objects fetched
2021-07-16 17:42:45 -07:00
Junio C Hamano 89efac81c7 Merge branch 'en/ort-perf-batch-12'
More fix-ups and optimization to "merge -sort".

* en/ort-perf-batch-12:
  merge-ort: miscellaneous touch-ups
  Fix various issues found in comments
  diffcore-rename: avoid unnecessary strdup'ing in break_idx
  merge-ort: replace string_list_df_name_compare with faster alternative
2021-07-16 17:42:45 -07:00
Elijah Newren 94b82d5686 rename: bump limit defaults yet again
These were last bumped in commit 92c57e5c1d (bump rename limit
defaults (again), 2011-02-19), and were bumped both because processors
had gotten faster, and because people were getting ugly merges that
caused problems and reporting it to the mailing list (suggesting that
folks were willing to spend more time waiting).

Since that time:
  * Linus has continued recommending kernel folks to set
    diff.renameLimit=0 (maps to 32767, currently)
  * Folks with repositories with lots of renames were happy to set
    merge.renameLimit above 32767, once the code supported that, to
    get correct cherry-picks
  * Processors have gotten faster
  * It has been discovered that the timing methodology used last time
    probably used too large example files.

The last point is probably worth explaining a bit more:

  * The "average" file size used appears to have been average blob size
    in the linux kernel history at the time (probably v2.6.25 or
    something close to it).
  * Since bigger files are modified more frequently, such a computation
    weights towards larger files.
  * Larger files may be more likely to be modified over time, but are
    not more likely to be renamed -- the mean and median blob size
    within a tree are a bit higher than the mean and median of blob
    sizes in the history leading up to that version for the linux
    kernel.
  * The mean blob size in v2.6.25 was half the average blob size in
    history leading to that point
  * The median blob size in v2.6.25 was about 40% of the mean blob size
    in v2.6.25.
  * Since the mean blob size is more than double the median blob size,
    any file as big as the mean will not be compared to any files of
    median size or less (because they'd be more than 50% dissimilar).
  * Since it is the number of files compared that provides the O(n^2)
    behavior, median-sized files should matter more than mean-sized
    ones.

The combined effect of the above is that the file size used in past
calculations was likely about 5x too large.  Combine that with a CPU
performance improvement of ~30%, and we can increase the limits by
a factor of sqrt(5/(1-.3)) = 2.67, while keeping the original stated
time limits.

Keeping the same approximate time limit probably makes sense for
diff.renameLimit (there is no progress feedback in e.g. git log -p),
but the experience above suggests merge.renameLimit could be extended
significantly.  In fact, it probably would make sense to have an
unlimited default setting for merge.renameLimit, but that would
likely need to be coupled with changes to how progress is displayed.
(See https://lore.kernel.org/git/YOx+Ok%2FEYvLqRMzJ@coredump.intra.peff.net/
for details in that area.)  For now, let's just bump the approximate
time limit from 10s to 1m.

(Note: We do not want to use actual time limits, because getting results
that depend on how loaded your system is that day feels bad, and because
we don't discover that we won't get all the renames until after we've
put in a lot of work rather than just upfront telling the user there are
too many files involved.)

Using the original time limit of 2s for diff.renameLimit, and bumping
merge.renameLimit from 10s to 60s, I found the following timings using
the simple script at the end of this commit message (on an AWS c5.xlarge
which reports as "Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz"):

      N   Timing
   1300    1.995s
   7100   59.973s

So let's round down to nice even numbers and bump the limits from
400->1000, and from 1000->7000.

Here is the measure_rename_perf script (adapted from
https://lore.kernel.org/git/20080211113516.GB6344@coredump.intra.peff.net/
in particular to avoid triggering the linear handling from
basename-guided rename detection):

    #!/bin/bash

    n=$1; shift

    rm -rf repo
    mkdir repo && cd repo
    git init -q -b main

    mkdata() {
      mkdir $1
      for i in `seq 1 $2`; do
        (sed "s/^/$i /" <../sample
         echo tag: $1
        ) >$1/$i
      done
    }

    mkdata initial $n
    git add .
    git commit -q -m initial

    mkdata new $n
    git add .
    cd new
    for i in *; do git mv $i $i.renamed; done
    cd ..
    git rm -q -rf initial
    git commit -q -m new

    time git diff-tree -M -l0 --summary HEAD^ HEAD

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-15 16:54:34 -07:00
Ævar Arnfjörð Bjarmason 48ca53cac4 *.c static functions: add missing __attribute__((format))
Add missing __attribute__((format)) function attributes to various
"static" functions that take printf arguments.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-13 15:20:20 -07:00
Ævar Arnfjörð Bjarmason bc40dfb10a string-list.h users: change to use *_{nodup,dup}()
Change all in-tree users of the string_list_init(LIST, BOOL) API to
use string_list_init_{nodup,dup}(LIST) instead.

As noted in the preceding commit let's leave the now-unused
string_list_init() wrapper in-place for any in-flight users, it can be
removed at some later date.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-07-01 12:32:22 -07:00
Elijah Newren a492d5331c merge-ort: ensure we consult df_conflict and path_conflicts
Path conflicts (typically rename path conflicts, e.g.
rename/rename(1to2) or rename/add/delete), and directory/file conflicts
should obviously result in files not being marked as clean in the merge.
We had a codepath where we missed consulting the path_conflict and
df_conflict flags, based on match_mask.  Granted, it requires an unusual
setup to trigger this codepath (directory rename causing rename-to-self
is the only case I can think of), but we still need to handle it.  To
make it clear that we have audited the other codepaths that do not
explicitly mention these flags, add some assertions that the flags are
not set.

Reported-by: Anders Kaseorg <andersk@mit.edu>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-30 14:40:10 -07:00
Elijah Newren 2bff554b23 merge-ort: add prefetching for content merges
Commit 7fbbcb21b1 ("diff: batch fetching of missing blobs", 2019-04-05)
introduced batching of fetching missing blobs, so that the diff
machinery would have one fetch subprocess grab N blobs instead of N
processes each grabbing 1.

However, the diff machinery is not the only thing in a merge that needs
to work on blobs.  The 3-way content merges need them as well.  Rather
than download all the blobs 1 at a time, prefetch all the blobs needed
for regular content merges.

This does not cover all possible paths in merge-ort that might need to
download blobs.  Others include:
  - The blob_unchanged() calls to avoid modify/delete conflicts (when
    blob renormalization results in an "unchanged" file)
  - Preliminary content merges needed for rename/add and
    rename/rename(2to1) style conflicts.  (Both of these types of
    conflicts can result in nested conflict markers from the need to do
    two levels of content merging; the first happens before our new
    prefetch_for_content_merges() function.)

The first of these wouldn't be an extreme amount of work to support, and
even the second could be theoretically supported in batching, but all of
these cases seem unusual to me, and this is a minor performance
optimization anyway; in the worst case we only get some of the fetches
batched and have a few additional one-off fetches.  So for now, just
handle the regular 3-way content merges in our prefetching.

For the testcase from the previous commit, the number of downloaded
objects remains at 63, but this drops the number of fetches needed from
32 down to 20, a sizeable reduction.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-28 07:58:25 -07:00
Junio C Hamano 169914ede2 Merge branch 'en/ort-perf-batch-11'
Optimize out repeated rename detection in a sequence of mergy
operations.

* en/ort-perf-batch-11:
  merge-ort, diffcore-rename: employ cached renames when possible
  merge-ort: handle interactions of caching and rename/rename(1to1) cases
  merge-ort: add helper functions for using cached renames
  merge-ort: preserve cached renames for the appropriate side
  merge-ort: avoid accidental API mis-use
  merge-ort: add code to check for whether cached renames can be reused
  merge-ort: populate caches of rename detection results
  merge-ort: add data structures for in-memory caching of rename detection
  t6429: testcases for remembering renames
  fast-rebase: write conflict state to working tree, index, and HEAD
  fast-rebase: change assert() to BUG()
  Documentation/technical: describe remembering renames optimization
  t6423: rename file within directory that other side renamed
2021-06-14 13:33:27 +09:00
Elijah Newren ef68c3d800 merge-ort: miscellaneous touch-ups
Add some notes in the code about invariants with match_mask when adding
pairs.  Also add a comment that seems to have been left out in my work
of pushing these changes upstream.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-09 11:40:04 +09:00
Elijah Newren 356da0f98b Fix various issues found in comments
A random hodge-podge of incorrect or out-of-date comments that I found:

  * t6423 had a comment that has referred to the wrong test for years;
    fix it to refer to the right one.
  * diffcore-rename had a FIXME comment meant to remind myself to
    investigate if I could make another code change.  I later
    investigated and removed the FIXME, but while cherry-picking the
    patch to submit upstream I missed the later update.  Remove the
    comment now.
  * merge-ort had the early part of a comment for a function; I had
    meant to include the more involved description when I updated the
    function.  Update the comment now.

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-09 11:40:04 +09:00
Elijah Newren 5a3743da32 merge-ort: replace string_list_df_name_compare with faster alternative
Gathering accumulated times from trace2 output on the mega-renames
testcase, I saw the following timings (where I'm only showing a few
lines to highlight the portions of interest):

    10.120 : label:incore_nonrecursive
        4.462 : ..label:process_entries
           3.143 : ....label:process_entries setup
              2.988 : ......label:plist special sort
           1.305 : ....label:processing
        2.604 : ..label:collect_merge_info
        2.018 : ..label:merge_start
        1.018 : ..label:renames

In the above output, note that the 4.462 seconds for process_entries was
split as 3.143 seconds for "process_entries setup" and 1.305 seconds for
"processing" (and a little time for other stuff removed from the
highlight).  Most of the "process_entries setup" time was spent on
"plist special sort" which corresponds to the following code:

    trace2_region_enter("merge", "plist special sort", opt->repo);
    plist.cmp = string_list_df_name_compare;
    string_list_sort(&plist);
    trace2_region_leave("merge", "plist special sort", opt->repo);

In other words, in a merge strategy that would be invoked by passing
"-sort" to either rebase or merge, sorting an array takes more time than
anything else.  Serves me right for naming my merge strategy this way.

Rewrite the comparison function in a way that does not require finding
out the lengths of the strings when comparing them.  While at it, tweak
the code for our specific case -- no need to handle a variety of modes,
for example.  The combination of these changes reduced the time spent in
"plist special sort" by ~25% in the mega-renames case.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.622 s ±  0.059 s     5.235 s ±  0.042 s
    mega-renames:     10.127 s ±  0.073 s     9.419 s ±  0.107 s
    just-one-mega:   500.3  ms ±  3.8  ms   480.1  ms ±  3.9  ms

Signed-off-by: Elijah Newren <newren@gmail.com>
Reviewed-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-06-09 11:40:03 +09:00
Elijah Newren 25e65b6dd5 merge-ort, diffcore-rename: employ cached renames when possible
When there are many renames between the old base of a series of commits
and the new base, the way sequencer.c, merge-recursive.c, and
diffcore-rename.c have traditionally split the work resulted in
redetecting the same renames with each and every commit being
transplanted.  To address this, the last several commits have been
creating a cache of rename detection results, determining when it was
safe to use such a cache in subsequent merge operations, adding helper
functions, and so on.  See the previous half dozen commit messages for
additional discussion of this optimization, particularly the message a
few commits ago entitled "add code to check for whether cached renames
can be reused".  This commit finally ties all of that work together,
modifying the merge algorithm to make use of these cached renames.

For the testcases mentioned in commit 557ac0350d ("merge-ort: begin
performance work; instrument with trace2_region_* calls", 2020-10-28),
this change improves the performance as follows:

                            Before                  After
    no-renames:        5.665 s ±  0.129 s     5.622 s ±  0.059 s
    mega-renames:     11.435 s ±  0.158 s    10.127 s ±  0.073 s
    just-one-mega:   494.2  ms ±  6.1  ms   500.3  ms ±  3.8  ms

That's a fairly small improvement, but mostly because the previous
optimizations were so effective for these particular testcases; this
optimization only kicks in when the others don't.  If we undid the
basename-guided rename detection and skip-irrelevant-renames
optimizations, then we'd see that this series by itself improved
performance as follows:

                   Before Basename Series   After Just This Series
    no-renames:      13.815 s ±  0.062 s      5.697 s ±  0.080 s
    mega-renames:  1799.937 s ±  0.493 s    205.709 s ±  0.457 s

Since this optimization kicks in to help accelerate cases where the
previous optimizations do not apply, this last comparison shows that
this cached-renames optimization has the potential to help signficantly
in cases that don't meet the requirements for the other optimizations to
be effective.

The changes made in this optimization also lay some important groundwork
for a future optimization around having collect_merge_info() avoid
recursing into subtrees in more cases.

However, for this optimization to be effective, merge_switch_to_result()
should only be called when the rebase or cherry-pick operation has
either completed or hit a case where the user needs to resolve a
conflict or edit the result.  If it is called after every commit, as
sequencer.c does, then the working tree and index are needlessly updated
with every commit and the cached metadata is tossed, defeating this
optimization.  Refactoring sequencer.c to only call
merge_switch_to_result() at the end of the operation is a bigger
undertaking, and the practical benefits of this optimization will not be
realized until that work is performed.  Since `test-tool fast-rebase`
only updates at the end of the operation, it was used to obtain the
timings above.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-05-20 15:40:39 +09:00