mirror/git - git - git.dotya.ml

mirror/ git

mirror of https://github.com/git/git.git synced 2024-06-18 21:19:13 +02:00

Author	SHA1	Message	Date
Han Xin	97a9db6ffb	object-file.c: refactor write_loose_object() to several steps When writing a large blob using "write_loose_object()", we have to pass a buffer with the whole content of the blob, and this behavior will consume lots of memory and may cause OOM. We will introduce a stream version function ("stream_loose_object()") in later commit to resolve this issue. Before introducing that streaming function, do some refactoring on "write_loose_object()" to reuse code for both versions. Rewrite "write_loose_object()" as follows: 1. Figure out a path for the (temp) object file. This step is only used in "write_loose_object()". 2. Move common steps for starting to write loose objects into a new function "start_loose_object_common()". 3. Compress data. 4. Move common steps for ending zlib stream into a new function "end_loose_object_common()". 5. Close fd and finalize the object file. Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Helped-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Han Xin <chiyutianyi@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-06-13 10:22:35 -07:00
Junio C Hamano	a50036da1a	Merge branch 'tb/cruft-packs' A mechanism to pack unreachable objects into a "cruft pack", instead of ejecting them into loose form to be reclaimed later, has been introduced. * tb/cruft-packs: sha1-file.c: don't freshen cruft packs builtin/gc.c: conditionally avoid pruning objects via loose builtin/repack.c: add cruft packs to MIDX during geometric repack builtin/repack.c: use named flags for existing_packs builtin/repack.c: allow configuring cruft pack generation builtin/repack.c: support generating a cruft pack builtin/pack-objects.c: --cruft with expiration reachable: report precise timestamps from objects in cruft packs reachable: add options to add_unseen_recent_objects_to_traversal builtin/pack-objects.c: --cruft without expiration builtin/pack-objects.c: return from create_object_entry() t/helper: add 'pack-mtimes' test-tool pack-mtimes: support writing pack .mtimes files chunk-format.h: extract oid_version() pack-write: pass 'struct packing_data' to 'stage_tmp_packfiles' pack-mtimes: support reading .mtimes files Documentation/technical: add cruft-packs.txt	2022-06-03 14:30:37 -07:00
Junio C Hamano	d8c8dccbaa	Merge branch 'ds/object-file-unpack-loose-header-fix' Coding style fix. * ds/object-file-unpack-loose-header-fix: object-file: convert 'switch' back to 'if'	2022-06-03 14:30:35 -07:00
Junio C Hamano	83937e9592	Merge branch 'ns/batch-fsync' Introduce a filesystem-dependent mechanism to optimize the way the bits for many loose object files are ensured to hit the disk platter. * ns/batch-fsync: core.fsyncmethod: performance tests for batch mode t/perf: add iteration setup mechanism to perf-lib core.fsyncmethod: tests for batch mode test-lib-functions: add parsing helpers for ls-files and ls-tree core.fsync: use batch mode and sync loose objects by default on Windows unpack-objects: use the bulk-checkin infrastructure update-index: use the bulk-checkin infrastructure builtin/add: add ODB transaction around add_files_to_cache cache-tree: use ODB transaction around writing a tree core.fsyncmethod: batched disk flushes for loose-objects bulk-checkin: rebrand plug/unplug APIs as 'odb transactions' bulk-checkin: rename 'state' variable and separate 'plugged' boolean	2022-06-03 14:30:34 -07:00
Taylor Blau	a613164257	sha1-file.c: don't freshen cruft packs We don't bother to freshen objects stored in a cruft pack individually by updating the `.mtimes` file. This is because we can't portably `mmap` and write into the middle of a file (i.e., to update the mtime of just one object). Instead, we would have to rewrite the entire `.mtimes` file which may incur some wasted effort especially if there a lot of cruft objects and they are freshened infrequently. Instead, force the freshening code to avoid an optimizing write by writing out the object loose and letting it pick up a current mtime. This works because we prefer the mtime of the loose copy of an object when both a loose and packed one exist (whether or not the packed copy comes from a cruft pack or not). This could certainly do with a test and/or be included earlier in this series/PR, but I want to wait until after I have a chance to clean up the overly-repetitive nature of the cruft pack tests in general. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Taylor Blau	b757353676	builtin/pack-objects.c: --cruft without expiration Teach `pack-objects` how to generate a cruft pack when no objects are dropped (i.e., `--cruft-expiration=never`). Later patches will teach `pack-objects` how to generate a cruft pack that prunes objects. When generating a cruft pack which does not prune objects, we want to collect all unreachable objects into a single pack (noting and updating their mtimes as we accumulate them). Ordinary use will pass the result of a `git repack -A` as a kept pack, so when this patch says "kept pack", readers should think "reachable objects". Generating a non-expiring cruft packs works as follows: - Callers provide a list of every pack they know about, and indicate which packs are about to be removed. - All packs which are going to be removed (we'll call these the redundant ones) are marked as kept in-core. Any packs the caller did not mention (but are known to the `pack-objects` process) are also marked as kept in-core. Packs not mentioned by the caller are assumed to be unknown to them, i.e., they entered the repository after the caller decided which packs should be kept and which should be discarded. Since we do not want to include objects in these "unknown" packs (because we don't know which of their objects are or aren't reachable), these are also marked as kept in-core. - Then, we enumerate all objects in the repository, and add them to our packing list if they do not appear in an in-core kept pack. This results in a new cruft pack which contains all known objects that aren't included in the kept packs. When the kept pack is the result of `git repack -A`, the resulting pack contains all unreachable objects. Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-26 15:48:26 -07:00
Junio C Hamano	1b8138fb08	Merge branch 'ab/valgrind-fixes' A bit of test framework fixes with a few fixes to issues found by valgrind. * ab/valgrind-fixes: commit-graph.c: don't assume that stat() succeeds object-file: fix a unpack_loose_header() regression in `3b6a8db3b0` log test: skip a failing mkstemp() test under valgrind tests: using custom GIT_EXEC_PATH breaks --valgrind tests	2022-05-23 14:39:54 -07:00
Junio C Hamano	538dc459a0	Merge branch 'ep/maint-equals-null-cocci' Introduce and apply coccinelle rule to discourage an explicit comparison between a pointer and NULL, and applies the clean-up to the maintenance track. * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-20 15:26:59 -07:00
Derrick Stolee	8a50571a0e	object-file: convert 'switch' back to 'if' This switch statement was recently added to make it clear that unpack_loose_header() returns an enum value, not an int. This adds complications for future developers if that enum gains new values, since that developer would need to add a case statement to this switch for little real value. Instead, we can revert back to an 'if' statement, but make the enum explicit by using "!= ULHR_OK" instead of assuming it has the numerical value zero. Co-authored-by: Junio C Hamano <gitster@pobox.com> Signed-off-by: Derrick Stolee <derrickstolee@github.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-16 17:28:02 -07:00
Ævar Arnfjörð Bjarmason	4627c67fa6	object-file: fix a unpack_loose_header() regression in `3b6a8db3b0` Fix a regression in my `3b6a8db3b0` (object-file.c: use "enum" return type for unpack_loose_header(), 2021-10-01) revealed both by running the test suite with --valgrind, and with the amended "git fsck" test. In practice this regression in v2.34.0 caused us to claim that we couldn't parse the header, as opposed to not being able to unpack it. Before the change in the C code the test_cmp added here would emit: -error: unable to unpack header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 +error: unable to parse header of ./objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 I.e. we'd proceed to call parse_loose_header() on the uninitialized "hdr" value, and it would have been very unlikely for that uninitialized memory to be a valid git object. The other callers of unpack_loose_header() were already checking the enum values exhaustively. See `3b6a8db3b0` and `5848fb11ac` (object-file.c: return ULHR_TOO_LONG on "header too long", 2021-10-01). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-12 15:42:26 -07:00
Junio C Hamano	2b0a58d164	Merge branch 'ep/maint-equals-null-cocci' for maint-2.35 * ep/maint-equals-null-cocci: tree-wide: apply equals-null.cocci contrib/coccinnelle: add equals-null.cocci	2022-05-02 10:06:04 -07:00
Junio C Hamano	afe8a9070b	tree-wide: apply equals-null.cocci Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-05-02 09:50:37 -07:00
Neeraj Singh	c0f4752ed2	core.fsyncmethod: batched disk flushes for loose-objects When adding many objects to a repo with `core.fsync=loose-object`, the cost of fsync'ing each object file can become prohibitive. One major source of the cost of fsync is the implied flush of the hardware writeback cache within the disk drive. This commit introduces a new `core.fsyncMethod=batch` option that batches up hardware flushes. It hooks into the bulk-checkin odb-transaction functionality, takes advantage of tmp-objdir, and uses the writeout-only support code. When the new mode is enabled, we do the following for each new object: 1a. Create the object in a tmp-objdir. 2a. Issue a pagecache writeback request and wait for it to complete. At the end of the entire transaction when unplugging bulk checkin: 1b. Issue an fsync against a dummy file to flush the log and hardware writeback cache, which should by now have seen the tmp-objdir writes. 2b. Rename all of the tmp-objdir files to their final names. 3b. When updating the index and/or refs, we assume that Git will issue another fsync internal to that operation. This is not the default today, but the user now has the option of syncing the index and there is a separate patch series to implement syncing of refs. On a filesystem with a singular journal that is updated during name operations (e.g. create, link, rename, etc), such as NTFS, HFS+, or XFS we would expect the fsync to trigger a journal writeout so that this sequence is enough to ensure that the user's data is durable by the time the git command returns. This sequence also ensures that no object files appear in the main object store unless they are fsync-durable. Batch mode is only enabled if core.fsync includes loose-objects. If the legacy core.fsyncObjectFiles setting is enabled, but core.fsync does not include loose-objects, we will use file-by-file fsyncing. In step (1a) of the sequence, the tmp-objdir is created lazily to avoid work if no loose objects are ever added to the ODB. We use a tmp-objdir to maintain the invariant that no loose-objects are visible in the main ODB unless they are properly fsync-durable. This is important since future ODB operations that try to create an object with specific contents will silently drop the new data if an object with the target hash exists without checking that the loose-object contents match the hash. Only a full git-fsck would restore the ODB to a functional state where dataloss doesn't occur. In step (1b) of the sequence, we issue a fsync against a dummy file created specifically for the purpose. This method has a little higher cost than using one of the input object files, but makes adding new callers of this mechanism easier, since we don't need to figure out which object file is "last" or risk sharing violations by caching the fd of the last object file. _Performance numbers_: Linux - Hyper-V VM running Kernel 5.11 (Ubuntu 20.04) on a fast SSD. Mac - macOS 11.5.1 running on a Mac mini on a 1TB Apple SSD. Windows - Same host as Linux, a preview version of Windows 11. Adding 500 files to the repo with 'git add' Times reported in seconds. object file syncing \| Linux \| Mac \| Windows --------------------\|-------\|-------\|-------- disabled \| 0.06 \| 0.35 \| 0.61 fsync \| 1.88 \| 11.18 \| 2.47 batch \| 0.15 \| 0.41 \| 1.53 Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-04-06 13:13:01 -07:00
Neeraj Singh	c4e707f858	object-file: pass filename to fsync_or_die If we die while trying to fsync a loose object file, pass the actual filename we're trying to sync. This is likely to be more helpful for a user trying to diagnose the cause of the failure than the former 'loose object file' string. It also sidesteps any concerns about translating the die message differently for loose objects versus something else that has a real path. Reported-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-30 14:46:47 -07:00
Junio C Hamano	3d8046a820	Merge branch 'ab/refs-various-fixes' Code clean-up. * ab/refs-various-fixes: refs debug: add a wrapper for "read_symbolic_ref" packed-backend: remove stub BUG(...) functions misc *.c: use designated initializers for struct assignments refs: use designated initializers for "struct ref_iterator_vtable" refs: use designated initializers for "struct ref_storage_be"	2022-03-29 12:22:02 -07:00
Junio C Hamano	eb804cd405	Merge branch 'ns/core-fsyncmethod' Replace core.fsyncObjectFiles with two new configuration variables, core.fsync and core.fsyncMethod. * ns/core-fsyncmethod: core.fsync: documentation and user-friendly aggregate options core.fsync: new option to harden the index core.fsync: add configuration parsing core.fsync: introduce granular fsync control infrastructure core.fsyncmethod: add writeout-only mode wrapper: make inclusion of Windows csprng header tightly scoped	2022-03-25 16:38:24 -07:00
Ævar Arnfjörð Bjarmason	501036492b	misc *.c: use designated initializers for struct assignments Change a few miscellaneous non-designated initializer assignments to use designated initializers. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-17 10:36:42 -07:00
Junio C Hamano	430883a70c	Merge branch 'ab/object-file-api-updates' Object-file API shuffling. * ab/object-file-api-updates: object-file API: pass an enum to read_object_with_reference() object-file.c: add a literal version of write_object_file_prepare() object-file API: have hash_object_file() take "enum object_type" object API: rename hash_object_file_literally() to write_() object-file API: split up and simplify check_object_signature() object API users + docs: check <0, not !0 with check_object_signature() object API docs: move check_object_signature() docs to cache.h object API: correct "buf" v.s. "map" mismatch in .c and *.h object-file API: have write_object_file() take "enum object_type" object-file API: add a format_object_header() function object-file API: return "void", not "int" from hash_object_file() object-file.c: split up declaration of unrelated variables	2022-03-16 17:53:08 -07:00
Neeraj Singh	020406eaa5	core.fsync: introduce granular fsync control infrastructure This commit introduces the infrastructure for the core.fsync configuration knob. The repository components we want to sync are identified by flags so that we can turn on or off syncing for specific components. If core.fsyncObjectFiles is set and the core.fsync configuration also includes FSYNC_COMPONENT_LOOSE_OBJECT, we will fsync any loose objects. This picks the strictest data integrity behavior if core.fsync and core.fsyncObjectFiles are set to conflicting values. This change introduces the currently unused fsync_component helper, which will be used by a later patch that adds fsyncing to the refs backend. Actual configuration and documentation of the fsync components list are in other patches in the series to separate review of the underlying mechanism from the policy of how it's configured. Helped-by: Patrick Steinhardt <ps@pks.im> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-03-10 15:10:22 -08:00
Ævar Arnfjörð Bjarmason	6aea6baeb3	object-file API: pass an enum to read_object_with_reference() Change the read_object_with_reference() function to take an "enum object_type". It was not prepared to handle an arbitrary "const char *type", as it was itself calling type_from_string(). Let's change the only caller that passes in user data to use type_from_string(), and convert the rest to use e.g. "OBJ_TREE" instead of "tree_type". The "cat-file" caller is not on the codepath that handles"--allow-unknown", so the type_from_string() there is safe. Its use of type_from_string() doesn't functionally differ from that of the pre-image. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	2bbb28a3ee	object-file.c: add a literal version of write_object_file_prepare() Split off a *_literally() variant of the write_object_file_prepare() function. To do this create a new "hash_object_body()" static helper. We now defer the type_name() call until the very last moment in format_object_header() for those callers that aren't "hash-object --literally". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	44439c1c58	object-file API: have hash_object_file() take "enum object_type" Change the hash_object_file() function to take an "enum object_type". Since a preceding commit all of its callers are passing either "{commit,tree,blob,tag}_type", or the result of a call to type_name(), the parse_object() caller that would pass NULL is now using stream_object_signature(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	0ff7b4f976	object API: rename hash_object_file_literally() to write_() Before `0c3db67cc8` (hash-object --literally: fix buffer overrun with extra-long object type, 2015-05-04) the hash-object code being changed here called write_sha1_file() to both hash and write a loose object. Before that we'd use hash_sha1_file() to if "-w" wasn't provided, and otherwise call write_sha1_file(). Now we'll always call the same function for both writing. Let's rename it from hash__literally() to write__literally(). Even though the write_() might not actually write if HASH_WRITE_OBJECT isn't in "flags", having it be more similar to write_object_file_flags() than hash_object_file(), but carrying a name that would suggest that it's a variant of the latter is confusing. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:32 -08:00
Ævar Arnfjörð Bjarmason	0f156dbb04	object-file API: split up and simplify check_object_signature() Split up the check_object_signature() function into that non-streaming version (it accepts an already filled "buf"), and a new stream_object_signature() which will retrieve the object from storage, and hash it on-the-fly. All of the callers of check_object_signature() were effectively calling two different functions, if we go by cyclomatic complexity. I.e. they'd either take the early "if (map)" branch and return early, or not. This has been the case since the "if (map)" condition was added in `090ea12671` (parse_object: avoid putting whole blob in core, 2012-03-07). We can then further simplify the resulting check_object_signature() function since only one caller wanted to pass a non-NULL "buf" and a non-NULL "real_oidp". That "read_loose_object()" codepath used by "git fsck" can instead use hash_object_file() followed by oideq(). Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	ee213de22d	object API users + docs: check <0, not !0 with check_object_signature() Change those users of the object API that misused check_object_signature() by assuming it returned any non-zero when the OID didn't match the expected value to check <0 instead. In practice all of this code worked before, but it wasn't consistent with rest of the users of the API. Let's also clarify what the <0 return value means in API docs. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	cdcaaec9a6	object API docs: move check_object_signature() docs to cache.h Move the API documentation for check_object_signature() to cache.h, where its prototype is declared. This is in preparation for adding a companion function. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	73182b2d84	object API: correct "buf" v.s. "map" mismatch in .c and .h Change the name of the second argument to check_object_signature() to be "buf" in object-file.c, making it consistent with the prototype in cache.h This fixes an inconsistency that's been with us since `2ade934026` (Add "check_sha1_signature()" helper function, 2005-04-08), and makes a subsequent commit's diff smaller, as we'll move these API docs to cache.h. While we're at it fix a small grammar error in the documentation, dropping an "an" before "in-core object-data". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	c80d226a04	object-file API: have write_object_file() take "enum object_type" Change the write_object_file() function to take an "enum object_type" instead of a "const char type". Its callers either passed {commit,tree,blob,tag}_type and can pass the corresponding OBJ_ type instead, or were hardcoding strings like "blob". This avoids the back & forth fragility where the callers of write_object_file() would have the enum type, and convert it themselves via type_name(). We do have to now do that conversion ourselves before calling write_object_file_prepare(), but those codepaths will be similarly adjusted in subsequent commits. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	b04cdea46c	object-file API: add a format_object_header() function Add a convenience function to wrap the xsnprintf() command that generates loose object headers. This code was copy/pasted in various parts of the codebase, let's define it in one place and re-use it from there. All except one caller of it had a valid "enum object_type" for us, it's only write_object_file_prepare() which might need to deal with "git hash-object --literally" and a potential garbage type. Let's have the primary API use an "enum object_type", and define a _literally() function that can take an arbitrary "const char " for the type. See [1] for the discussion that prompted this patch, i.e. new code in object-file.c that wanted to copy/paste the xsnprintf() invocation. In the case of fast-import.c the callers unfortunately need to cast back & forth between "unsigned char " and "char ", since format_object_header() ad encode_in_pack_object_header() take different signedness. 1. https://lore.kernel.org/git/211213.86bl1l9bfz.gmgdl@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Jiang Xin <zhiyou.jx@alibaba-inc.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	63e05f9056	object-file API: return "void", not "int" from hash_object_file() The hash_object_file() function added in `abdc3fc842` (Add hash_sha1_file(), 2006-10-14) did not have a meaningful return value, and it never has. One was seemingly added to avoid adding braces to the "ret = " assignments being modified here. Let's instead assign "0" to the "ret" variables at the beginning of the relevant functions, and have them return "void". Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	bbea0ddeb9	object-file.c: split up declaration of unrelated variables Split up the declaration of the "ret" and "re_allocated" variables. It's not our usual style to group variable declarations simply because they share a type, we'd only prefer to do so when the two are closely related (e.g. "int i, j"). This change makes a subsequent and meaningful change's diff smaller. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-25 17:16:31 -08:00
Ævar Arnfjörð Bjarmason	0cb9872eab	object-file: use designated initializers for "struct git_hash_algo" As with the preceding commit, change another file-level struct assignment to use designated initializers. Retain the ".name = NULL" etc. in the case of the first element of "unknown hash algorithm", to make it explicit that we're intentionally not setting those, it's not just that we forgot. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2022-02-24 15:59:14 -08:00
Junio C Hamano	0dc90d954d	Merge branch 'ns/tmp-objdir' New interface into the tmp-objdir API to help in-core use of the quarantine feature. * ns/tmp-objdir: tmp-objdir: disable ref updates when replacing the primary odb tmp-objdir: new API for creating temporary writable databases	2022-01-03 16:24:15 -08:00
Junio C Hamano	832ec72c3e	Merge branch 'ab/run-command' API clean-up. * ab/run-command: run-command API: remove "env" member, always use "env_array" difftool: use "env_array" to simplify memory management run-command API: remove "argv" member, always use "args" run-command API users: use strvec_push(), not argv construction run-command API users: use strvec_pushl(), not argv construction run-command tests: use strvec_pushv(), not argv assignment run-command API users: use strvec_pushv(), not argv assignment upload-archive: use regular "struct child_process" pattern worktree: stop being overly intimate with run_command() internals	2021-12-15 09:39:47 -08:00
Junio C Hamano	a4bbd13be3	Merge branch 'hn/reftable' The "reftable" backend for the refs API, without integrating into the refs subsystem, has been added. * hn/reftable: Add "test-tool dump-reftable" command. reftable: add dump utility reftable: implement stack, a mutable database of reftable files. reftable: implement refname validation reftable: add merged table view reftable: add a heap-based priority queue for reftable records reftable: reftable file level tests reftable: read reftable files reftable: generic interface to tables reftable: write reftable files reftable: a generic binary tree implementation reftable: reading/writing blocks Provide zlib's uncompress2 from compat/zlib-compat.c reftable: (de)serialization for the polymorphic record type. reftable: add blocksource, an abstraction for random access reads reftable: utility functions reftable: add error related functionality reftable: add LICENSE hash.h: provide constants for the hash IDs	2021-12-15 09:39:45 -08:00
Junio C Hamano	cb136bd852	Merge branch 'po/size-t-for-vs' On platforms where ulong is shorter than size_t, code paths that shifted 1 or 1U to the left lacked the necessary cast to size_t, which have been corrected. * po/size-t-for-vs: object-file.c: LLP64 compatibility, upcast unity for left shift diffcore-delta.c: LLP64 compatibility, upcast unity for left shift repack.c: LLP64 compatibility, upcast unity for left shift	2021-12-10 14:35:10 -08:00
Neeraj Singh	ecd81dfc79	tmp-objdir: disable ref updates when replacing the primary odb When creating a subprocess with a temporary ODB, we set the GIT_QUARANTINE_ENVIRONMENT env var to tell child Git processes not to update refs, since the tmp-objdir may go away. Introduce a similar mechanism for in-process temporary ODBs when we call tmp_objdir_replace_primary_odb. Now both mechanisms set the disable_ref_updates flag on the odb, which is queried by the ref_transaction_prepare function. Peff's test case [1] was invoking ref updates via the cachetextconv setting. That particular code silently does nothing when a ref update is forbidden. See the call to notes_cache_put in fill_textconv where errors are ignored. [1] https://lore.kernel.org/git/YVOn3hDsb5pnxR53@coredump.intra.peff.net/ Reported-by: Jeff King <peff@peff.net> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-08 14:06:46 -08:00
Neeraj Singh	b3cecf49ea	tmp-objdir: new API for creating temporary writable databases The tmp_objdir API provides the ability to create temporary object directories, but was designed with the goal of having subprocesses access these object stores, followed by the main process migrating objects from it to the main object store or just deleting it. The subprocesses would view it as their primary datastore and write to it. Here we add the tmp_objdir_replace_primary_odb function that replaces the current process's writable "main" object directory with the specified one. The previous main object directory is restored in either tmp_objdir_migrate or tmp_objdir_destroy. For the --remerge-diff usecase, add a new `will_destroy` flag in `struct object_database` to mark ephemeral object databases that do not require fsync durability. Add 'git prune' support for removing temporary object databases, and make sure that they have a name starting with tmp_ and containing an operation-specific name. Based-on-patch-by: Elijah Newren <newren@gmail.com> Signed-off-by: Neeraj Singh <neerajsi@microsoft.com> Reviewed-by: Elijah Newren <newren@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-08 14:06:36 -08:00
Philip Oakley	26de1fc0c9	object-file.c: LLP64 compatibility, upcast unity for left shift Visual Studio reports C4334 "was 64-bit shift intended" warning because of size miss-match. Promote unity to the matching type to fit with the assignment. Signed-off-by: Philip Oakley <philipoakley@iee.email> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-12-01 14:48:10 -08:00
Junio C Hamano	f9ba6acaa9	Merge branch 'mc/clean-smudge-with-llp64' The clean/smudge conversion code path has been prepared to better work on platforms where ulong is narrower than size_t. * mc/clean-smudge-with-llp64: clean/smudge: allow clean filters to process extremely large files odb: guard against data loss checking out a huge file git-compat-util: introduce more size_t helpers odb: teach read_blob_entry to use size_t t1051: introduce a smudge filter test for extremely large files test-lib: add prerequisite for 64-bit platforms test-tool genzeros: generate large amounts of data more efficiently test-genzeros: allow more than 2G zeros in Windows	2021-11-29 15:41:51 -08:00
Ævar Arnfjörð Bjarmason	c7c4bdeccf	run-command API: remove "env" member, always use "env_array" Remove the "env" member from "struct child_process" in favor of always using the "env_array". As with the preceding removal of "argv" in favor of "args" this gets rid of current and future oddities around memory management at the API boundary (see the amended API docs). For some of the conversions we can replace patterns like: child.env = env->v; With: strvec_pushv(&child.env_array, env->v); But for others we need to guard the strvec_pushv() with a NULL check, since we're not passing in the "v" member of a "struct strvec", e.g. in the case of tmp_objdir_env()'s return value. Ideally we'd rename the "env_array" member to simply "env" as a follow-up, since it and "args" are now inconsistent in not having an "_array" suffix, and seemingly without any good reason, unless we look at the history of how they came to be. But as we've currently got 122 in-tree hits for a "git grep env_array" let's leave that for now (and possibly forever). Doing that rename would be too disruptive. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-25 22:15:08 -08:00
Junio C Hamano	2c0fa66bc8	Merge branch 'ab/fsck-unexpected-type' Regression fix. * ab/fsck-unexpected-type: object-file: free(*contents) only in read_loose_object() caller object-file: fix SEGV on free() regression in v2.34.0-rc2	2021-11-12 15:29:25 -08:00
Ævar Arnfjörð Bjarmason	16235e3b14	object-file: free(*contents) only in read_loose_object() caller In the preceding commit a free() of uninitialized memory regression in `96e41f58fe` (fsck: report invalid object type-path combinations, 2021-10-01) was fixed, but we'd still have an issue with leaking memory from fsck_loose(). Let's fix that issue too. That issue was introduced in my `31deb28f5e` (fsck: don't hard die on invalid object types, 2021-10-01). It can be reproduced under SANITIZE=leak with the test I added in `093fffdfbe` (fsck tests: add test for fsck-ing an unknown type, 2021-10-01): ./t1450-fsck.sh --run=84 -vixd In some sense it's not a problem, we lost the same amount of memory in terms of things malloc'd and not free'd. It just moved from the "still reachable" to "definitely lost" column in valgrind(1) nomenclature[1], since we'd have die()'d before. But now that we don't hard die() anymore in the library let's properly free() it. Doing so makes this code much easier to follow, since we'll now have one function owning the freeing of the "contents" variable, not two. For context on that memory management pattern the read_loose_object() function was added in `f6371f9210` (sha1_file: add read_loose_object() function, 2017-01-13) and subsequently used in `c68b489e56` (fsck: parse loose object paths directly, 2017-01-13). The pattern of it being the task of both sides to free() the memory has been there in this form since its inception. 1. https://valgrind.org/docs/manual/mc-manual.html#mc-manual.leaks Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-11 13:40:43 -08:00
Ævar Arnfjörð Bjarmason	168a937bbc	object-file: fix SEGV on free() regression in v2.34.0-rc2 Fix a regression introduced in my `96e41f58fe` (fsck: report invalid object type-path combinations, 2021-10-01). When fsck-ing blobs larger than core.bigFileThreshold, we'd free() a pointer to uninitialized memory. This issue would have been caught by SANITIZE=address, but since it involves core.bigFileThreshold, none of the existing tests in our test suite covered it. Running them with the "big_file_threshold" in "environment.c" changed to say "6" would have shown this failure, but let's add a dedicated test for this scenario based on Han Xin's report[1]. The bug was introduced between v9 and v10[2] of the fsck series merged in `061a21d36d` (Merge branch 'ab/fsck-unexpected-type', 2021-10-25). 1. https://lore.kernel.org/git/20211111030302.75694-1-hanxin.hx@alibaba-inc.com/ 2. https://lore.kernel.org/git/cover-v10-00.17-00000000000-20211001T091051Z-avarab@gmail.com/ Reported-by: Han Xin <chiyutianyi@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-11 10:41:54 -08:00
Matt Cooper	d6a09e795d	odb: guard against data loss checking out a huge file This introduces an additional guard for platforms where `unsigned long` and `size_t` are not of the same size. If the size of an object in the database would overflow `unsigned long`, instead we now exit with an error. A complete fix will have to update _many_ other functions throughout the codebase to use `size_t` instead of `unsigned long`. It will have to be implemented at some stage. This commit puts in a stop-gap for the time being. Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Matt Cooper <vtbassmatt@gmail.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-11-03 11:22:27 -07:00
Junio C Hamano	2c428e4205	Merge branch 'ab/fix-commit-error-message-upon-unwritable-object-store' "git commit" gave duplicated error message when the object store was unwritable, which has been corrected. * ab/fix-commit-error-message-upon-unwritable-object-store: commit: fix duplication regression in permission error output unwritable tests: assert exact error output	2021-10-25 16:06:57 -07:00
Junio C Hamano	162a13b855	Merge branch 'jt/no-abuse-alternate-odb-for-submodules' Follow through the work to use the repo interface to access submodule objects in-process, instead of abusing the alternate object database interface. * jt/no-abuse-alternate-odb-for-submodules: submodule: trace adding submodule ODB as alternate submodule: pass repo to check_has_commit() object-file: only register submodule ODB if needed merge-{ort,recursive}: remove add_submodule_odb() refs: peeling non-the_repository iterators is BUG refs: teach arbitrary repo support to iterators refs: plumb repo into ref stores	2021-10-25 16:06:56 -07:00
Junio C Hamano	061a21d36d	Merge branch 'ab/fsck-unexpected-type' "git fsck" has been taught to report mismatch between expected and actual types of an object better. * ab/fsck-unexpected-type: fsck: report invalid object type-path combinations fsck: don't hard die on invalid object types object-file.c: stop dying in parse_loose_header() object-file.c: return ULHR_TOO_LONG on "header too long" object-file.c: use "enum" return type for unpack_loose_header() object-file.c: simplify unpack_loose_short_header() object-file.c: make parse_loose_header_extended() public object-file.c: return -1, not "status" from unpack_loose_header() object-file.c: don't set "typep" when returning non-zero cat-file tests: test for current --allow-unknown-type behavior cat-file tests: add corrupt loose object test cat-file tests: test for missing/bogus object with -t, -s and -p cat-file tests: move bogus_* variable declarations earlier fsck tests: test for garbage appended to a loose object fsck tests: test current hash/type mismatch behavior fsck tests: refactor one test to use a sub-repo fsck tests: add test for fsck-ing an unknown type	2021-10-25 16:06:56 -07:00
Ævar Arnfjörð Bjarmason	4ef91a2d79	commit: fix duplication regression in permission error output Fix a regression in the error output emitted when .git/objects can't be written to. Before `9c4d6c0297` (cache-tree: Write updated cache-tree after commit, 2014-07-13) we'd emit only one "insufficient permission" error, now we'll do so again. The cause is rather straightforward, we've got WRITE_TREE_SILENT for the use-case of wanting to prepare an index silently, quieting any permission etc. error output. Then when we attempt to update to that (possibly broken) index we'll run into the same errors again. But with `9c4d6c0297` the gap between the cache-tree API and the object store wasn't closed in terms of asking write_object_file() to be silent. I.e. post-9c4d6c0297b the first call is to prepare_index(), and after that we'll call prepare_to_commit(). We only want verbose error output from the latter. So let's add and use that facility with a corresponding HASH_SILENT flag, its only user is cache-tree.c's update_one(), which will set it if its "WRITE_TREE_SILENT" flag is set. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-12 11:16:59 -07:00
Jonathan Tan	eef71904ff	object-file: only register submodule ODB if needed In `a35e03dee0` ("submodule: lazily add submodule ODBs as alternates", 2021-09-08), Git was taught to add all known submodule ODBs as alternates when attempting to read an object that doesn't exist, as a fallback for when a submodule object is read as if it were in the_repository. However, this behavior wasn't restricted to happen only when reading from the_repository. Fix this. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-10-08 15:06:06 -07:00

1 2

87 Commits