mirror/git - git - git.dotya.ml

mirror/ git

mirror of https://github.com/git/git.git synced 2024-05-11 12:26:08 +02:00

Author	SHA1	Message	Date
Junio C Hamano	88cc8ed8bc	Merge branch 'en/header-cleanup' Code clean-up to clarify the rule that "git-compat-util.h" must be the first to be included. * en/header-cleanup: diff.h: remove unnecessary include of object.h Remove unnecessary includes of builtin.h treewide: replace cache.h with more direct headers, where possible replace-object.h: move read_replace_refs declaration from cache.h to here object-store.h: move struct object_info from cache.h dir.h: refactor to no longer need to include cache.h object.h: stop depending on cache.h; make cache.h depend on object.h ident.h: move ident-related declarations out of cache.h pretty.h: move has_non_ascii() declaration from commit.h cache.h: remove dependence on hex.h; make other files include it explicitly hex.h: move some hex-related declarations from cache.h hash.h: move some oid-related declarations from cache.h alloc.h: move ALLOC_GROW() functions from cache.h treewide: remove unnecessary cache.h includes in source files treewide: remove unnecessary cache.h includes treewide: remove unnecessary git-compat-util.h includes in headers treewide: ensure one of the appropriate headers is sourced first	2023-03-17 14:03:09 -07:00
Jeff King	8d3e7eac52	fsck: check even zero-entry index files In `fb64ca526a` (fsck: check index files in all worktrees, 2023-02-24), we swapped out a call to vanilla repo_read_index() for a series of read_index_from() calls, one per worktree. The code for the latter was copied from add_index_objects_to_pending(), which checks for a positive return value from the index reading function, and we do the same here in fsck now. But this is probably the wrong thing. I had interpreted the check as "don't operate on the index struct if there was an error". But in reality, if there is an error then the index-reading code will simply die (which admittedly is not great for fsck, but that is not a new problem). The return value here is actually the number of entries read. So it makes sense for add_index_objects_to_pending() to ignore a zero-entry index (there is nothing to add). But for fsck, we would still want to check any extensions, etc (though presumably it is unlikely to have them in an empty index, I don't think it's impossible). So we should ignore the return value from read_index_from() entirely. This matches the behavior before `fb64ca526a`, when we ignored the return value from repo_read_index(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-27 07:36:36 -08:00
Jeff King	592ec63b38	fsck: mention file path for index errors If we encounter an error in an index file, we may say something like: error: 1234abcd: invalid sha1 pointer in resolve-undo But if you have multiple worktrees, each with its own index, it can be very helpful to know which file had the problem. So let's pass that path down through the various index-fsck functions and use it where appropriate. After this patch you should get something like: error: 1234abcd: invalid sha1 pointer in resolve-undo of .git/worktrees/wt/index That's a bit verbose, but since the point is that you shouldn't see this normally, we're better to err on the side of more details. I've also added the index filename to the name used by "fsck --name-objects", which will show up if we find the object to be missing, etc. This is bending the rules a little there, as the option claims to write names that can be fed to rev-parse. But there is no revision syntax to access the index of another worktree, so the best we can do is make up something that a human will probably understand. I did take care to retain the existing ":file" syntax for the current worktree. So the uglier output should kick in only when it's actually necessary. See the included tests for examples of both forms. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:32:23 -08:00
Jeff King	fb64ca526a	fsck: check index files in all worktrees We check the index file for the main worktree, but completely ignore the index files in other worktrees. These should be checked, too, as they are part of the repository state (and in particular, errors in those index files may cause repo-wide operations like "git gc" to complain). Reported-by: Johannes Sixt <j6t@kdbg.org> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:32:23 -08:00
Jeff King	8840069a37	fsck: factor out index fsck The code to fsck an index operates directly on the_index. Let's move it into its own function in preparation for handling the index files from other worktrees. Since we now have only a single reference to the_index, let's drop our USE_THE_INDEX_VARIABLE definition and just use the_repository.index directly. That's a minor cleanup, but also ensures that we didn't miss any references when moving the code into fsck_index(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-24 09:30:58 -08:00
Elijah Newren	cbeab74713	replace-object.h: move read_replace_refs declaration from cache.h to here Adjust several files to be more explicit about their dependency on replace-objects to accommodate this change. Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:30 -08:00
Elijah Newren	41771fa435	cache.h: remove dependence on hex.h; make other files include it explicitly Signed-off-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2023-02-23 17:25:29 -08:00
Ævar Arnfjörð Bjarmason	07047d6829	cocci: apply "pending" index-compatibility to some "builtin/.c" Apply "index-compatibility.pending.cocci" rule to "builtin/", but exclude those where we conflict with in-flight changes. As a result some of them end up using only "the_index", so let's have them use the more narrow "USE_THE_INDEX_VARIABLE" rather than "USE_THE_INDEX_COMPATIBILITY_MACROS". Manual changes not made by coccinelle, that were squashed in: * Whitespace-wrap argument lists for repo_hold_locked_index(), repo_read_index_preload() and repo_refresh_and_write_index(), in cases where the line became too long after the transformation. * Change "refresh_cache()" to "refresh_index()" in a comment in "builtin/update-index.c". * For those whose call was followed by perror("<macro-name>"), change it to perror("<function-name>"), referring to the new function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-11-21 12:06:15 +09:00
Ævar Arnfjörð Bjarmason	dc594180d9	cocci & cache.h: apply variable section of "pending" index-compatibility Mostly apply the part of "index-compatibility.pending.cocci" that renames the global variables like "active_nr", which are a shorthand to referencing (in that case) a struct member as "the_index.cache_nr". In doing so move more of "index-compatibility.pending.cocci" to "index-compatibility.cocci". In the case of "active_nr" we'd have a textual conflict with "ab/various-leak-fixes" in "next"[1]. Let's exclude that specific case while moving the rule over from "pending". 1. `407b94280f` (commit: discard partial cache before (re-)reading it, 2022-11-08) Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-11-21 12:06:15 +09:00
Junio C Hamano	7b9b634ca5	Merge branch 'ab/doc-synopsis-and-cmd-usage' The short-help text shown by "git cmd -h" and the synopsis text shown at the beginning of "git help cmd" have been made more consistent. * ab/doc-synopsis-and-cmd-usage: (34 commits) tests: assert consistent whitespace in -h output tests: start asserting that *.txt SYNOPSIS matches -h output doc txt & -h consistency: make "worktree" consistent worktree: define subcommand -h in terms of command -h reflog doc: list real subcommands up-front doc txt & -h consistency: make "commit" consistent doc txt & -h consistency: make "diff-tree" consistent doc txt & -h consistency: use "[<label>...]" for "zero or more" doc txt & -h consistency: make "annotate" consistent doc txt & -h consistency: make "stash" consistent doc txt & -h consistency: add missing options doc txt & -h consistency: use "git foo" form, not "git-foo" doc txt & -h consistency: make "bundle" consistent doc txt & -h consistency: make "read-tree" consistent doc txt & -h consistency: make "rerere" consistent doc txt & -h consistency: add missing options and labels doc txt & -h consistency: make output order consistent doc txt & -h consistency: add or fix optional "--" syntax doc txt & -h consistency: fix mismatching labels doc SYNOPSIS & -h: use "-" to separate words in labels, not "_" ...	2022-10-28 11:26:54 -07:00
Ævar Arnfjörð Bjarmason	d9054a19ed	doc txt & -h consistency: add missing options Change those built-in commands that were attempting to exhaustively list the options in the "-h" output to actually do so, and always have .txt documentation know about the exhaustive list of options. Let's also fix the documentation and -h output for those built-in commands where the .txt and -h output was a mismatch of missing options on both sides. In the case of "interpret-trailers" fixing the missing options reveals that the *.txt version was implicitly claiming that the command had two operating modes, which a look at the -h version (and studying the documentation) will show is not the case. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-10-13 09:32:57 -07:00
Junio C Hamano	fdbfac60fd	Merge branch 'jk/fsck-on-diet' "git fsck" failed to release contents of tree objects already used from the memory, which has been fixed. * jk/fsck-on-diet: parse_object_buffer(): respect save_commit_buffer fsck: turn off save_commit_buffer fsck: free tree buffers after walking unreachable objects	2022-10-10 10:08:39 -07:00
Jeff King	51b27747e5	parse_object_buffer(): respect save_commit_buffer If the global variable "save_commit_buffer" is set to 0, then parse_commit() will throw away the commit object data after parsing it, rather than sticking it into a commit slab. This goes all the way back to `60ab26de99` ([PATCH] Avoid wasting memory in git-rev-list, 2005-09-15). But there's another code path which may similarly stash the buffer: parse_object_buffer(). This is where we end up if we parse a commit via parse_object(), and it's used directly in a few other code paths like git-fsck. The original goal of `60ab26de99` was avoiding extra memory usage for rev-list. And there it's not all that important to catch parse_object(). We use that function only for looking at the tips of the traversal, and the majority of the commits are parsed by following parent links, where we use parse_commit() directly. So we were wasting some memory, but only a small portion. It's much easier to see the effect with fsck. Since we now turn off save_commit_buffer by default there, we _should_ be able to drop the freeing of the commit buffer in fsck_obj(). But if we do so (taking the first hunk of this patch without the rest), then the peak heap of "git fsck" in a clone of git.git goes from 136MB to 194MB. Teaching parse_object_buffer() to respect save_commit_buffer brings that down to 134.5MB (it's hard to tell from massif's output, but I suspect the savings comes from avoiding the overhead of the mostly-empty commit slab). Other programs should see a small improvement. Both "rev-list --all" and "fsck --connectivity-only" improve by a few hundred kilobytes, as they'd avoid loading the tip objects of their traversals. Most importantly, no code should be hurt by doing this. Any program that turns off save_commit_buffer is already making the assumption that any commit it sees may need to have its object data loaded on demand, as it doesn't know which ones were parsed by parse_commit() versus parse_object(). Not to mention that anything parsed by the commit graph may be in the same boat, even if save_commit_buffer was not disabled. This should be the only spot that needs to be fixed. Grepping for set_commit_buffer() shows that this and parse_commit() are the only relevant calls. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-22 11:40:47 -07:00
Jeff King	069e445256	fsck: turn off save_commit_buffer When parsing a commit, the default behavior is to stuff the original buffer into a commit_slab (which takes ownership of it). But for a tool like fsck, this isn't useful. While we may look at the buffer further as part of fsck_commit(), we'll always do so through a separate pointer; attaching the buffer to the slab doesn't help. Worse, it means we have to remember to free the commit buffer in all call paths. We do so in fsck_obj(), which covers a regular "git fsck". But with "--connectivity-only", we forget to do so in both traverse_one_object(), which covers reachable objects, and mark_unreachable_referents(), which covers unreachable ones. As a result, that mode ends up storing an uncompressed copy of every commit on the heap at once. We could teach the code paths for --connectivity-only to also free commit buffers. But there's an even easier fix: we can just turn off the save_commit_buffer flag, and then we won't attach them to the commits in the first place. This reduces the peak heap of running "git fsck --connectivity-only" in a clone of linux.git from ~2GB to ~1GB. According to massif, the remaining memory goes where you'd expect: the object structs themselves, the obj_hash containing them, and the delta base cache. Note that we'll leave the call to free commit buffers in fsck_obj() for now; it's not quite redundant because of a related bug that we'll fix in a subsequent commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-22 11:40:11 -07:00
Jeff King	fbce4fa9ae	fsck: free tree buffers after walking unreachable objects After calling fsck_walk(), a tree object struct may be left in the parsed state, with the full tree contents available via tree->buffer. It's the responsibility of the caller to free these when it's done with the object to avoid having many trees allocated at once. In a regular "git fsck", we hit fsck_walk() only from fsck_obj(), which does call free_tree_buffer(). Likewise for "--connectivity-only", we see most objects via traverse_one_object(), which makes a similar call. The exception is in mark_unreachable_referents(). When using both "--connectivity-only" and "--dangling" (the latter of which is the default), we walk all of the unreachable objects, and there we forget to free. Most cases would not notice this, because they don't have a lot of unreachable objects, but you can make a pathological case like this: git clone --bare /path/to/linux.git repo.git cd repo.git rm packed-refs ;# now everything is unreachable! git fsck --connectivity-only That ends up with peak heap usage ~18GB, which is (not coincidentally) close to the size of all uncompressed trees in the repository. After this patch, the peak heap is only ~2GB. A few things to note: - it might seem like fsck_walk(), if it is parsing the trees, should be responsible for freeing them. But the situation is quite tricky. In the non-connectivity mode, after we call fsck_walk() we then proceed with fsck_object() which actually does the type-specific sanity checks on the object contents. We do pass our own separate buffer to fsck_object(), but there's a catch: our earlier call to parse_object_buffer() may have attached that buffer to the object struct! So by freeing it, we leave the rest of the code with a dangling pointer. Likewise, the call to fsck_walk() in index-pack is subtle. It attaches a buffer to the tree object that must not be freed! And so rather than calling free_tree_buffer(), it actually detaches it by setting tree->buffer to NULL. These cases would _probably_ be fixable by having fsck_walk() free the tree buffer only when it was the one who allocated it via parse_tree(). But that would still leave the callers responsible for freeing other cases, so they wouldn't be simplified. While the current semantics for fsck_walk() make it easy to accidentally leak in new callers, at least they are simple to explain, and it's not a function that's likely to get a lot of new call-sites. And in any case, it's probably sensible to fix the leak first with this simple patch, and try any more complicated refactoring separately. - a careful reader may notice that fsck_obj() also frees commit buffers, but neither the call in traverse_one_object() nor the one touched in this patch does so. And indeed, this is another problem for --connectivity-only (and accounts for most of the 2GB heap after this patch), but it's one we'll fix in a separate commit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-22 11:30:06 -07:00
Ævar Arnfjörð Bjarmason	5cf88fd8b0	git-compat-util.h: use "UNUSED", not "UNUSED(var)" As reported in [1] the "UNUSED(var)" macro introduced in `2174b8c75d` (Merge branch 'jk/unused-annotation' into next, 2022-08-24) breaks coccinelle's parsing of our sources in files where it occurs. Let's instead partially go with the approach suggested in [2] of making this not take an argument. As noted in [1] "coccinelle" will ignore such tokens in argument lists that it doesn't know about, and it's less of a surprise to syntax highlighters. This undoes the "help us notice when a parameter marked as unused is actually use" part of `9b24034754` (git-compat-util: add UNUSED macro, 2022-08-19), a subsequent commit will further tweak the macro to implement a replacement for that functionality. 1. https://lore.kernel.org/git/220825.86ilmg4mil.gmgdl@evledraar.gmail.com/ 2. https://lore.kernel.org/git/220819.868rnk54ju.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-09-01 10:49:48 -07:00
Jeff King	c006e9fa59	refs: mark unused reflog callback parameters Functions used with for_each_reflog_ent() need to conform to a particular interface, but not every function needs all of the parameters. Mark the unused ones to make -Wunused-parameter happy. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:54 -07:00
Jeff King	63e14ee2d6	refs: mark unused each_ref_fn parameters Functions used with for_each_ref(), etc, need to conform to the each_ref_fn interface. But most of them don't need every parameter; let's annotate the unused ones to quiet -Wunused-parameter. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-08-19 12:18:54 -07:00
Junio C Hamano	e0ad13977a	fsck: do not dereference NULL while checking resolve-undo data When we found an invalid object recorded in the resolve-undo data, we would have ended up dereferencing NULL while fsck. Reporting the problem and going on to the next object is the right thing to do here. Noticed by SZEDER Gábor. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-07-11 16:26:33 -07:00
Junio C Hamano	5a5ea141e7	revision: mark blobs needed for resolve-undo as reachable The resolve-undo extension was added to the index in `cfc5789a` (resolve-undo: record resolved conflicts in a new index extension section, 2009-12-25). This extension records the blob object names and their modes of conflicted paths when the path gets resolved (e.g. with "git add"), to allow "undoing" the resolution with "checkout -m path". These blob objects should be guarded from garbage-collection while we have the resolve-undo information in the index (otherwise unresolve operation may try to use a blob object that has already been pruned away). But the code called from mark_reachable_objects() for the index forgets to do so. Teach add_index_objects_to_pending() helper to also add objects referred to by the resolve-undo extension. Also make matching changes to "fsck", which has code that is fairly similar to the reachability stuff, but have parallel implementations for all these stuff, which may (or may not) someday want to be unified. Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-09 16:45:07 -07:00
Ævar Arnfjörð Bjarmason	2b7098936c	run-command API users: use strvec_pushl(), not argv construction Change a pattern of hardcoding an "argv" array size, populating it and assigning to the "argv" member of "struct child_process" to instead use "strvec_pushl()" to add data to the "args" member. This implements the same behavior as before in fewer lines of code, and moves us further towards being able to remove the "argv" member in a subsequent commit. Since we've entirely removed the "argv" variable(s) we can be sure that no potential logic errors of the type discussed in a preceding commit are being introduced here, i.e. ones where the local "argv" was being modified after the assignment to "struct child_process"'s "argv". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:07 -08:00
Junio C Hamano	2c0fa66bc8	Merge branch 'ab/fsck-unexpected-type' Regression fix. * ab/fsck-unexpected-type: object-file: free(*contents) only in read_loose_object() caller object-file: fix SEGV on free() regression in v2.34.0-rc2	2021-11-12 15:29:25 -08:00
Ævar Arnfjörð Bjarmason	16235e3b14	object-file: free(*contents) only in read_loose_object() caller In the preceding commit a free() of uninitialized memory regression in `96e41f58fe` (fsck: report invalid object type-path combinations, 2021-10-01) was fixed, but we'd still have an issue with leaking memory from fsck_loose(). Let's fix that issue too. That issue was introduced in my `31deb28f5e` (fsck: don't hard die on invalid object types, 2021-10-01). It can be reproduced under SANITIZE=leak with the test I added in `093fffdfbe` (fsck tests: add test for fsck-ing an unknown type, 2021-10-01): ./t1450-fsck.sh --run=84 -vixd In some sense it's not a problem, we lost the same amount of memory in terms of things malloc'd and not free'd. It just moved from the "still reachable" to "definitely lost" column in valgrind(1) nomenclature[1], since we'd have die()'d before. But now that we don't hard die() anymore in the library let's properly free() it. Doing so makes this code much easier to follow, since we'll now have one function owning the freeing of the "contents" variable, not two. For context on that memory management pattern the read_loose_object() function was added in `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13) and subsequently used in `c68b489e56` (fsck: parse loose object paths directly, 2017-01-13). The pattern of it being the task of both sides to free() the memory has been there in this form since its inception. 1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-11 13:40:43 -08:00
Junio C Hamano	7afb458e91	Merge branch 'gc/use-repo-settings' It is wrong to read some settings directly from the config subsystem, as things like feature.experimental can affect their default values. * gc/use-repo-settings: gc: perform incremental repack when implictly enabled fsck: verify multi-pack-index when implictly enabled fsck: verify commit graph when implicitly enabled	2021-11-01 13:48:08 -07:00
Junio C Hamano	061a21d36d	Merge branch 'ab/fsck-unexpected-type' "git fsck" has been taught to report mismatch between expected and actual types of an object better. * ab/fsck-unexpected-type: fsck: report invalid object type-path combinations fsck: don't hard die on invalid object types object-file.c: stop dying in parse_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: use "enum" return type for unpack_loose_header() object-file.c: simplify unpack_loose_short_header() object-file.c: make parse_loose_header_extended() public object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: don't set "typep" when returning non-zero cat-file tests: test for current --allow-unknown-type behavior cat-file tests: add corrupt loose object test cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: move bogus_* variable declarations earlier fsck tests: test for garbage appended to a loose object fsck tests: test current hash/type mismatch behavior fsck tests: refactor one test to use a sub-repo fsck tests: add test for fsck-ing an unknown type	2021-10-25 16:06:56 -07:00
Glen Choo	dc5570872f	fsck: verify multi-pack-index when implictly enabled Like the previous commit, change fsck to check the "core_multi_pack_index" variable set in "repo-settings.c" instead of reading the "core.multiPackIndex" config variable. This fixes a bug where we wouldn't verify midx if the config key was missing. This bug was introduced in `18e449f86b` (midx: enable core.multiPackIndex by default, 2020-09-25) where core.multiPackIndex was turned on by default. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-15 14:30:08 -07:00
Glen Choo	f30e4d854b	fsck: verify commit graph when implicitly enabled Change fsck to check the "core_commit_graph" variable set in "repo-settings.c" instead of reading the "core.commitGraph" variable. This fixes a bug where we wouldn't verify the commit-graph if the config key was missing. This bug was introduced in `31b1de6a09` (commit-graph: turn on commit-graph by default, 2019-08-13), where core.commitGraph was turned on by default. Add tests to "t5318-commit-graph.sh" to verify that fsck checks the commit-graph as expected for the 3 values of core.commitGraph. Also, disable GIT_TEST_COMMIT_GRAPH in t/t0410-partial-clone.sh because some test cases use fsck in ways that assume that commit-graph checking is disabled. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Glen Choo <chooglen@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-15 14:30:07 -07:00
Ævar Arnfjörð Bjarmason	96e41f58fe	fsck: report invalid object type-path combinations Improve the error that's emitted in cases where we find a loose object we parse, but which isn't at the location we expect it to be. Before this change we'd prefix the error with a not-a-OID derived from the path at which the object was found, due to an emergent behavior in how we'd end up with an "OID" in these codepaths. Now we'll instead say what object we hashed, and what path it was found at. Before this patch series e.g.: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ mv objects/e6/ objects/e7 Would emit ("[...]" used to abbreviate the OIDs): git fsck error: hash mismatch for ./objects/e7/9d[...] (expected e79d[...]) error: e79d[...]: object corrupt or missing: ./objects/e7/9d[...] Now we'll instead emit: error: e69d[...]: hash-path mismatch, found at: ./objects/e7/9d[...] Furthermore, we'll do the right thing when the object type and its location are bad. I.e. this case: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ mv objects/83 objects/84 As noted in an earlier commits we'd simply die early in those cases, until preceding commits fixed the hard die on invalid object type: $ git fsck fatal: invalid object type Now we'll instead emit sensible error messages: $ git fsck error: 8315[...]: hash-path mismatch, found at: ./objects/84/15[...] error: 8315[...]: object is of unknown type 'garbage': ./objects/84/15[...] In both fsck.c and object-file.c we're using null_oid as a sentinel value for checking whether we got far enough to be certain that the issue was indeed this OID mismatch. We need to add the "object corrupt or missing" special-case to deal with cases where read_loose_object() will return an error before completing check_object_signature(), e.g. if we have an error in unpack_loose_rest() because we find garbage after the valid gzip content: $ git hash-object --stdin -w -t blob </dev/null `e69de29bb2` $ chmod 755 objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ echo garbage >>objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 $ git fsck error: garbage at end of loose object 'e69d[...]' error: unable to unpack contents of ./objects/e6/9d[...] error: e69d[...]: object corrupt or missing: ./objects/e6/9d[...] There is currently some weird messaging in the edge case when the two are combined, i.e. because we're not explicitly passing along an error state about this specific scenario from check_stream_oid() via read_loose_object() we'll end up printing the null OID if an object is of an unknown type and it can't be unpacked by zlib, e.g.: $ git hash-object --stdin -w -t garbage --literally </dev/null 8315a83d2acc4c174aed59430f9a9c4ed926440f $ chmod 755 objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ echo garbage >>objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f $ /usr/bin/git fsck fatal: invalid object type $ ~/g/git/git fsck error: garbage at end of loose object '8315a83d2acc4c174aed59430f9a9c4ed926440f' error: unable to unpack contents of ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 8315a83d2acc4c174aed59430f9a9c4ed926440f: object corrupt or missing: ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f error: 0000000000000000000000000000000000000000: object is of unknown type 'garbage': ./objects/83/15a83d2acc4c174aed59430f9a9c4ed926440f [...] I think it's OK to leave that for future improvements, which would involve enum-ifying more error state as we've done with "enum unpack_loose_header_result" in preceding commits. In these increasingly more obscure cases the worst that can happen is that we'll get slightly nonsensical or inapplicable error messages. There's other such potential edge cases, all of which might produce some confusing messaging, but still be handled correctly as far as passing along errors goes. E.g. if check_object_signature() returns and oideq(real_oid, null_oid()) is true, which could happen if it returns -1 due to the read_istream() call having failed. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Ævar Arnfjörð Bjarmason	31deb28f5e	fsck: don't hard die on invalid object types Change the error fsck emits on invalid object types, such as: $ git hash-object --stdin -w -t garbage --literally </dev/null <OID> From the very ungraceful error of: $ git fsck fatal: invalid object type $ To: $ git fsck error: <OID>: object is of unknown type 'garbage': <OID_PATH> [ other fsck output ] We'll still exit with non-zero, but now we'll finish the rest of the traversal. The tests that's being added here asserts that we'll still complain about other fsck issues (e.g. an unrelated dangling blob). To do this we need to pass down the "OBJECT_INFO_ALLOW_UNKNOWN_TYPE" flag from read_loose_object() through to parse_loose_header(). Since the read_loose_object() function is only used in builtin/fsck.c we can simply change it to accept a "struct object_info" (which contains the OBJECT_INFO_ALLOW_UNKNOWN_TYPE in its flags). See `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13) for the introduction of read_loose_object(). Since we'll need a "struct strbuf" to hold the "type_name" let's pass it to the for_each_loose_file_in_objdir() callback to avoid allocating a new one for each loose object in the iteration. It also makes the memory management simpler than sticking it in fsck_loose() itself, as we'll only need to strbuf_reset() it, with no need to do a strbuf_release() before each "return". Before this commit we'd never check the "type" if read_loose_object() failed, but now we do. We therefore need to initialize it to OBJ_NONE to be able to tell the difference between e.g. its unpack_loose_header() having failed, and us getting past that and into parse_loose_header(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-01 15:06:01 -07:00
Junio C Hamano	ed125c4f07	Merge branch 'ab/fsck-api-cleanup' Last minute compilation fix. * ab/fsck-api-cleanup: builtin/fsck.c: don't conflate "int" and "enum" in callback	2021-06-02 07:34:27 +09:00
Ævar Arnfjörð Bjarmason	28abf260a5	builtin/fsck.c: don't conflate "int" and "enum" in callback Fix a warning on AIX's xlc compiler that's been emitted since my `a1aad71601` (fsck.h: use "enum object_type" instead of "int", 2021-03-28): "builtin/fsck.c", line 805.32: 1506-068 (W) Operation between types "int()(struct object,enum object_type,void,struct fsck_options)" and "int()(struct object,int,void,struct fsck_options)" is not allowed. I.e. it complains about us assigning a function with a prototype "int" where we're expecting "enum object_type". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-06-02 05:59:15 +09:00
Junio C Hamano	8e97852919	Merge branch 'ds/sparse-index-protections' Builds on top of the sparse-index infrastructure to mark operations that are not ready to mark with the sparse index, causing them to fall back on fully-populated index that they always have worked with. * ds/sparse-index-protections: (47 commits) name-hash: use expand_to_path() sparse-index: expand_to_path() name-hash: don't add directories to name_hash revision: ensure full index resolve-undo: ensure full index read-cache: ensure full index pathspec: ensure full index merge-recursive: ensure full index entry: ensure full index dir: ensure full index update-index: ensure full index stash: ensure full index rm: ensure full index merge-index: ensure full index ls-files: ensure full index grep: ensure full index fsck: ensure full index difftool: ensure full index commit: ensure full index checkout: ensure full index ...	2021-04-30 13:50:26 +09:00
Derrick Stolee	2227ea175f	fsck: ensure full index When verifying all blobs reachable from the index, ensure that a sparse index has been expanded to a full one to avoid missing some blobs. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-14 13:47:11 -07:00
Jeff King	45a187cc34	lookup_unknown_object(): take a repository argument All of the other lookup_foo() functions take a repository argument, but lookup_unknown_object() was never converted, and it uses the_repository internally. Let's fix that. We could leave a wrapper that uses the_repository, but there aren't that many calls, so we'll just convert them all. I looked briefly at each site to see if we had a repository struct (besides the_repository) we could pass, but none of them do (so this conversion to pass the_repository is a pure noop in each case, though it does take us one step closer to eventually getting rid of the_repository). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-04-13 13:18:46 -07:00
Ævar Arnfjörð Bjarmason	394d5d31b0	fsck.c: pass along the fsck_msg_id in the fsck_error callback Change the fsck_error callback to also pass along the fsck_msg_id. Before this change the only way to get the message id was to parse it back out of the "message". Let's pass it down explicitly for the benefit of callers that might want to use it, as discussed in [1]. Passing the msg_type is now redundant, as you can always get it back from the msg_id, but I'm not changing that convention. It's really common to need the msg_type, and the report() function itself (which calls "fsck_error") needs to call fsck_msg_type() to discover it. Let's not needlessly re-do that work in the user callback. 1. https://lore.kernel.org/git/87blcja2ha.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	1b32b59f9b	fsck.h: move FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} into an enum Move the FSCK_{FATAL,INFO,ERROR,WARN,IGNORE} defines into a new fsck_msg_type enum. These defines were originally introduced in: - `ba002f3b28` (builtin-fsck: move common object checking code to fsck.c, 2008-02-25) - `f50c440730` (fsck: disallow demoting grave fsck errors to warnings, 2015-06-22) - `efaba7cc77` (fsck: optionally ignore specific fsck issues completely, 2015-06-22) - `f27d05b170` (fsck: allow upgrading fsck warnings to errors, 2015-06-22) The reason these were defined in two different places is because we use FSCK_{IGNORE,INFO,FATAL} only in fsck.c, but FSCK_{ERROR,WARN} are used by external callbacks. Untangling that would take some more work, since we expose the new "enum fsck_msg_type" to both. Similar to "enum object_type" it's not worth structuring the API in such a way that only those who need FSCK_{ERROR,WARN} pass around a different type. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	a1aad71601	fsck.h: use "enum object_type" instead of "int" Change the fsck_walk_func to use an "enum object_type" instead of an "int" type. The types are compatible, and ever since this was added in `355885d531` (add generic, type aware object chain walker, 2008-02-25) we've used entries from object_type (OBJ_BLOB etc.). So this doesn't really change anything as far as the generated code is concerned, it just gives the compiler more information and makes this easier to read. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-28 19:03:10 -07:00
Ævar Arnfjörð Bjarmason	fb79f5bff7	fsck.c: refactor and rename common config callback Refactor code I recently changed in `1f3299fda9` (fsck: make fsck_config() re-usable, 2021-01-05) so that I could use fsck's config callback in mktag in `1f3299fda9` (fsck: make fsck_config() re-usable, 2021-01-05). I don't know what I was thinking in structuring the code this way, but it clearly makes no sense to have an fsck_config_internal() at all just so it can get a fsck_options when git_config() already supports passing along some void* data. Let's just make use of that instead, which gets us rid of the two wrapper functions, and brings fsck's common config callback in line with other such reusable config callbacks. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-03-17 14:02:43 -07:00
Ævar Arnfjörð Bjarmason	1f3299fda9	fsck: make fsck_config() re-usable Move the fsck_config() function from builtin/fsck.c to fsck.[ch]. This allows for re-using it in other tools that expose fsck logic and want to support its configuration variables. A logical continuation of this change would be to use a common function for all of {fetch,receive}.fsck.* and fsck.. See `5d477a334a` (fsck (receive-pack): allow demoting errors to warnings, 2015-06-22) and my own `1362df0d41` (fetch: implement fetch.fsck., 2018-07-27) for the relevant code. However, those routines want to not parse the fsck.skipList into OIDs, but rather pass them along with the --strict option to another process. It would be possible to refactor that whole thing so we support e.g. a "fetch." prefix, then just keep track of the skiplist as a filename instead of parsing it, and learn to spew that all out from our internal structures into something we can append to the --strict option. But instead I'm planning to re-use this in "mktag", which'll just re-use these "fsck.*" variables as-is. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-05 14:58:29 -08:00
Jonathan Tan	9eb86f41de	fsck: do not lazy fetch known non-promisor object There is a call to has_object_file(), which lazily fetches missing objects in a partial clone, when the object is known to not be a promisor object. Change that call to has_object(), which does not do any lazy fetching. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-06 13:01:03 -07:00
Junio C Hamano	645f63111b	Merge branch 'es/get-worktrees-unsort' API cleanup for get_worktrees() * es/get-worktrees-unsort: worktree: drop get_worktrees() unused 'flags' argument worktree: drop get_worktrees() special-purpose sorting option	2020-07-06 22:09:15 -07:00
Eric Sunshine	03f2465bb1	worktree: drop get_worktrees() unused 'flags' argument get_worktrees() accepts a 'flags' argument, however, there are no existing flags (the lone flag GWT_SORT_LINKED was recently retired) and no behavior which can be tweaked. Therefore, drop the 'flags' argument. Signed-off-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-22 10:31:15 -07:00
Abhishek Kumar	6da43d937c	object: drop parsed_object_pool->commit_count `14ba97f8` (alloc: allow arbitrary repositories for alloc functions, 2018-05-15) introduced parsed_object_pool->commit_count to keep count of commits per repository and was used to assign commit->index. However, commit-slab code requires commit->index values to be unique and a global count would be correct, rather than a per-repo count. Let's introduce a static counter variable, `parsed_commits_count` to keep track of parsed commits so far. As commit_count has no use anymore, let's also drop it from the struct. Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-06-17 14:37:14 -07:00
Junio C Hamano	d55a4ae71d	Merge branch 'ds/multi-pack-verify' Fix for a copy-and-paste error introduced during 2.20 era. * ds/multi-pack-verify: fsck: use ERROR_MULTI_PACK_INDEX	2020-05-24 19:39:39 -07:00
Derrick Stolee	e68a5272b1	fsck: use ERROR_MULTI_PACK_INDEX The multi-pack-index was added to the data verified by git-fsck in ea5ae6c3 "fsck: verify multi-pack-index". This implementation was based on the implementation for verifying the commit-graph, and a copy-paste error kept the ERROR_COMMIT_GRAPH flag as the bit set when an error appears in the multi-pack-index. Add a new flag, ERROR_MULTI_PACK_INDEX, and use that instead. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-19 16:13:22 -07:00
Jeff King	5afc4b1dc6	fsck: only provide oid/type in fsck_error callback None of the callbacks actually care about having a "struct object"; they're happy with just the oid and type information. So let's give ourselves more flexibility to avoid having a "struct object" by just passing the broken-down fields. Note that the callback already takes a "type" field for the fsck message type. We'll rename that to "msg_type" (and use "object_type" for the object type) to make the distinction explicit. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	82ef89b318	fsck: don't require object structs for display functions Our printable_type() and describe_object() functions take whole object structs, but they really only care about the oid and type. Let's take those individually in order to give our callers more flexibility. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	733902905d	fsck: use oids rather than objects for object_name API We don't actually care about having object structs; we only need to look up decorations by oid. Let's accept this more limited form, which will give our callers more flexibility. Note that the decoration API we rely on uses object structs itself (even though it only looks at their oids). We can solve this by switching to a kh_oid_map (we could also use the hashmap oidmap, but it's more awkward for the simple case of just storing a void pointer). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:18 +09:00
Jeff King	a59cfb3230	fsck: unify object-name code Commit `90cf590f53` (fsck: optionally show more helpful info for broken links, 2016-07-17) added a system for decorating objects with names. The code is split across builtin/fsck.c (which gives the initial names) and fsck.c (which adds to the names as it traverses the object graph). This leads to some duplication, where both sites have near-identical describe_object() functions (the difference being that the one in builtin/fsck.c uses a circular array of buffers to allow multiple calls in a single printf). Let's provide a unified object_name API for fsck. That lets us drop the duplication, as well as making the interface boundaries more clear (which will let us refactor the implementation more in a future patch). We'll leave describe_object() in builtin/fsck.c as a thin wrapper around the new API, as it relies on a static global to make its many callers a bit shorter. We'll also convert the bare add_decoration() calls in builtin/fsck.c to put_object_name(). This fixes two minor bugs: 1. We leak many small strings. add_decoration() has a last-one-wins approach: it updates the decoration to the new string and returns the old one. But we ignore the return value, leaking the old string. This is quite common to trigger, since we look at reflogs: the tip of any ref will be described both by looking at the actual ref, as well as the latest reflog entry. So we'd always end up leaking one of those strings. 2. The last-one-wins approach gives us lousy names. For instance, we first look at all of the refs, and then all of the reflogs. So rather than seeing "refs/heads/master", we're likely to overwrite it with "HEAD@{12345678}". We're generally better off using the first name we find. And indeed, the test in t1450 expects this ugly HEAD@{} name. After this patch, we've switched to using fsck_put_object_name()'s first-one-wins semantics, and we output the more human-friendly "refs/tags/julius" (and the test is updated accordingly). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-28 14:05:17 +09:00
Jeff King	d0229abd93	object: convert lookup_object() to use object_id There are no callers left of lookup_object() that aren't just passing us the "hash" member of a "struct object_id". Let's take the whole struct, which gets us closer to removing all raw sha1 variables. It also matches the existing conversions of lookup_blob(), etc. The conversions of callers were done by hand, but they're all mechanical one-liners. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-20 10:18:09 -07:00

1 2 3 4 5

213 Commits