mirror/git - git - git.dotya.ml

mirror/ git

mirror of https://github.com/git/git.git synced 2024-05-11 06:36:10 +02:00

Author	SHA1	Message	Date
Martin Ågren	bc62692757	hash-lookup: rename from sha1-lookup Change all remnants of "sha1" in hash-lookup.c and .h and rename them to reflect that we're not just able to handle SHA-1 these days. Signed-off-by: Martin Ågren <martin.agren@gmail.com> Reviewed-by: Derrick Stolee <dstolee@microsoft.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2021-01-04 13:01:55 -08:00
Junio C Hamano	6bac6a1ef9	Merge branch 'tb/idx-midx-race-fix' Processes that access packdata while the .idx file gets removed (e.g. while repacking) did not fail or fall back gracefully as they could. * tb/idx-midx-race-fix: midx.c: protect against disappearing packs packfile.c: protect against disappearing indexes	2020-12-08 15:11:18 -08:00
Taylor Blau	c8a45eb66e	packfile.c: protect against disappearing indexes In `17c35c8969` (packfile: skip loading index if in multi-pack-index, 2018-07-12) we stopped loading the .idx file for packs that are contained within a multi-pack index. This saves us the effort of loading an .idx and doing some lightweight validity checks by way of 'packfile.c:load_idx()', but introduces a race between processes that need to load the index (e.g., to generate a reverse index) and processes that can delete the index. For example, running the following in your shell: $ git init repo && cd repo $ git commit --allow-empty -m 'base' $ git repack -ad && git multi-pack-index write followed by: $ rm -f .git/objects/pack/pack-*.idx $ git rev-parse HEAD \| git cat-file --batch-check='%(objectsize:disk)' will result in a segfault prior to this patch. What's happening here is that we notice that the pack is in the multi-pack index, and so don't check that it still has a .idx. When we then try and load that index to generate a reverse index, we don't have it, so the call to 'find_pack_revindex()' in 'packfile.c:packed_object_info()' returns NULL, and then dereferencing it causes a segfault. Of course, we don't ever expect someone to remove the index file by hand, or to be in a state where we never wrote it to begin with (yet find that pack in the multi-pack-index). But, this can happen in a timing race with 'git repack -ad', which removes all existing packs after writing a new pack containing all of their objects. Avoid this by reverting the hunk of `17c35c8969` which stops loading the index when the pack is contained in a MIDX. This makes the latter half of `17c35c8969` useless, since we'll always have a non-NULL 'p->index_data', in which case that if statement isn't guarding anything. These two together effectively revert `17c35c8969`, and avoid the race explained above. Co-authored-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-25 13:15:49 -08:00
Jeff King	81c4c5cf2e	packfile: detect overflow in .idx file size checks In load_idx(), we check that the .idx file is sized appropriately for the number of objects it claims to have. We recently fixed the case where the number of objects caused our expected size to overflow a 32-bit unsigned int, and we switched to size_t. On a 64-bit system, this is fine; our size_t covers any expected size. On a 32-bit system, though, it won't. The file may claim to have 2^31 objects, which will overflow even a size_t. This doesn't hurt us at all for a well-formed idx file. A 32-bit system would already have failed to mmap such a file, since it would be too big. But an .idx file which _claims_ to have 2^31 objects but is actually much smaller would fool our check. This is a broken file, and for the most part we don't care that much what happens. But: - it's a little friendlier to notice up front "woah, this file is broken" than it is to get nonsense results - later access of the data assumes that the loading function sanity-checked that we have at least enough bytes for the regular object-id table. A malformed .idx file could lead to an out-of-bounds read. So let's use our overflow-checking functions to make sure that we're not fooled by a malformed file. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-16 13:41:35 -08:00
Jeff King	a9bc372ef8	use size_t to store pack .idx byte offsets We sometimes store the offset into a pack .idx file as an "unsigned long", but the mmap'd size of a pack .idx file can exceed 4GB. This is sufficient on LP64 systems like Linux, but will be too small on LLP64 systems like Windows, where "unsigned long" is still only 32 bits. Let's use size_t, which is a better type for an offset into a memory buffer. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-16 13:41:35 -08:00
Jeff King	f86f769550	compute pack .idx byte offsets using size_t A pack and its matching .idx file are limited to 2^32 objects, because the pack format contains a 32-bit field to store the number of objects. Hence we use uint32_t in the code. But the byte count of even a .idx file can be much larger than that, because it stores at least a hash and an offset for each object. So using SHA-1, a v2 .idx file will cross the 4GB boundary at 153,391,650 objects. This confuses load_idx(), which computes the minimum size like this: unsigned long min_size = 8 + 4256 + nr(hashsz + 4 + 4) + hashsz + hashsz; Even though min_size will be big enough on most 64-bit platforms, the actual arithmetic is done as a uint32_t, resulting in a truncation. We actually exceed that min_size, but then we do: unsigned long max_size = min_size; if (nr) max_size += (nr - 1)8; to account for the variable-sized table. That computation doesn't overflow quite so low, but with the truncation for min_size, we end up with a max_size that is much smaller than our actual size. So we complain that the idx is invalid, and can't find any of its objects. We can fix this case by casting "nr" to a size_t, which will do the multiplication in 64-bits (assuming you're on a 64-bit platform; this will never work on a 32-bit system since we couldn't map the whole .idx anyway). Likewise, we don't have to worry about further additions, because adding a smaller number to a size_t will convert the other side to a size_t. A few notes: - obviously we could just declare "nr" as a size_t in the first place (and likewise, packed_git.num_objects). But it's conceptually a uint32_t because of the on-disk format, and we correctly treat it that way in other contexts that don't need to compute byte offsets (e.g., iterating over the set of objects should and generally does use a uint32_t). Switching to size_t would make all of those other cases look wrong. - it could be argued that the proper type is off_t to represent the file offset. But in practice the .idx file must fit within memory, because we mmap the whole thing. And the rest of the code (including the idx_size variable we're comparing against) uses size_t. - we'll add the same cast to the max_size arithmetic line. Even though we're adding to a larger type, which will convert our result, the multiplication is still done as a 32-bit value and can itself overflow. I didn't check this with my test case, since it would need an even larger pack (~530M objects), but looking at compiler output shows that it works this way. The standard should agree, but I couldn't find anything explicit in 6.3.1.8 ("usual arithmetic conversions"). The case in load_idx() was the most immediate one that I was able to trigger. After fixing it, looking up actual objects (including the very last one in sha1 order) works in a test repo with 153,725,110 objects. That's because bsearch_hash() works with uint32_t entry indices, and the actual byte access: int cmp = hashcmp(table + mi stride, sha1); is done with "stride" as a size_t, causing the uint32_t "mi" to be promoted to a size_t. This is the way most code will access the index data. However, I audited all of the other byte-wise accesses of packed_git.index_data, and many of the others are suspect (they are similar to the max_size one, where we are adding to a properly sized offset or directly to a pointer, but the multiplication in the sub-expression can overflow). I didn't trigger any of these in practice, but I believe they're potential problems, and certainly adding in the cast is not going to hurt anything here. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-11-16 13:41:35 -08:00
Junio C Hamano	26b42b4dd8	Merge branch 'mt/delta-base-cache-races' A race that leads to an access to a free'd data was corrected in the codepath that reads pack files. * mt/delta-base-cache-races: packfile: fix memory leak in add_delta_base_cache() packfile: fix race condition on unpack_entry()	2020-10-04 12:49:15 -07:00
Matheus Tavares	bda959c476	packfile: fix memory leak in add_delta_base_cache() When add_delta_base_cache() is called with a base that is already in the cache, no operation is performed. But the check is done after allocating space for a new entry, so we end up leaking memory on the early return. In addition, the caller never free()'s the base as it expects the function to take ownership of it. But the base is not released when we skip insertion, so it also gets leaked. To fix these problems, move the allocation of a new entry further down in add_delta_base_cache(), and free() the base on early return. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-28 17:41:53 -07:00
Matheus Tavares	74b052f8c2	packfile: fix race condition on unpack_entry() The third phase of unpack_entry() performs the following sequence in a loop, until all the deltas enumerated in phase one are applied and the entry is fully reconstructed: 1. Add the current base entry to the delta base cache 2. Unpack the next delta 3. Patch the unpacked delta on top of the base When the optional object reading lock is enabled, the above steps will be performed while holding the lock. However, step 2. momentarily releases it so that inflation can be performed in parallel for increased performance. Because the `base` buffer inserted in the cache at 1. is not duplicated, another thread can potentially free() it while the lock is released at 2. (e.g. when there is no space left in the cache to insert another entry). In this case, the later attempt to dereference `base` at 3. will cause a segmentation fault. This problem was observed during a multithreaded git-grep execution on a repository with large objects. To fix the race condition (and later segmentation fault), let's reorder the aforementioned steps so that `base` is only added to the cache at the end. This will prevent the buffer from being released by another thread while it is still in use. An alternative solution which would not require the reordering would be to duplicate `base` before inserting it in the cache. However, as Phil Hord mentioned, memcpy()'ing large bases can negatively affect performance: in his experiments, this alternative approach slowed git-grep down by 10% to 20%. Reported-by: Phil Hord <phil.hord@gmail.com> Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-28 17:41:52 -07:00
Junio C Hamano	221b755f3a	Merge branch 'jk/dont-count-existing-objects-twice' There is a logic to estimate how many objects are in the repository, which is mean to run once per process invocation, but it ran every time the estimated value was requested. * jk/dont-count-existing-objects-twice: packfile: actually set approximate_object_count_valid	2020-09-22 12:36:32 -07:00
Jeff King	67bb65de5d	packfile: actually set approximate_object_count_valid The approximate_object_count() function tries to compute the count only once per process. But ever since it was introduced in `8e3f52d778` (find_unique_abbrev: move logic out of get_short_sha1(), 2016-10-03), we failed to actually set the "valid" flag, meaning we'd compute it fresh on every call. This turns out not to be _too_ bad, because we're only iterating through the packed_git list, and not making any system calls. But since it may get called for every abbreviated hash we output, even this can add up if you have many packs. Here are before-and-after timings for a new perf test which just asks rev-list to abbreviate each commit hash (the test repo is linux.git, with commit-graphs): Test origin HEAD ---------------------------------------------------------------------------- 5303.3: rev-list (1) 28.91(28.46+0.44) 29.03(28.65+0.38) +0.4% 5303.4: abbrev-commit (1) 1.18(1.06+0.11) 1.17(1.02+0.14) -0.8% 5303.7: rev-list (50) 28.95(28.56+0.38) 29.50(29.17+0.32) +1.9% 5303.8: abbrev-commit (50) 3.67(3.56+0.10) 3.57(3.42+0.15) -2.7% 5303.11: rev-list (1000) 30.34(29.89+0.43) 30.82(30.35+0.46) +1.6% 5303.12: abbrev-commit (1000) 86.82(86.52+0.29) 77.82(77.59+0.22) -10.4% 5303.15: load 10,000 packs 0.08(0.02+0.05) 0.08(0.02+0.06) +0.0% It doesn't help at all when we have 1 pack (5303.4), but we get a 10% speedup when there are 1000 packs (5303.12). That's a modest speedup for a case that's already slow and we'd hope to avoid in general (note how slow it is even after, because we have to look in each of those packs for abbreviations). But it's a one-line change that clearly matches the original intent, so it seems worth doing. The included perf test may also be useful for keeping an eye on any regressions in the overall abbreviation code. Reported-by: Rasmus Villemoes <rv@rasmusvillemoes.dk> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-09-17 11:36:14 -07:00
Taylor Blau	59552fb3e2	midx: traverse the local MIDX first When a repository has an alternate object directory configured, callers can traverse through each alternate's MIDX by walking the '->next' pointer. But, when 'prepare_multi_pack_index_one()' loads multiple MIDXs, it places the new ones at the front of this pointer chain, not at the end. This can be confusing for callers such as 'git repack -ad', causing test failures like in t7700.6 with 'GIT_TEST_MULTI_PACK_INDEX=1'. The occurs when dropping a pack known to the local MIDX with alternates configured that have their own MIDX. Since the alternate's MIDX is returned via 'get_multi_pack_index()', 'midx_contains_pack()' returns true (which is correct, since it traverses through the '->next' pointer to find the MIDX in the chain that does contain the requested object). But, we call 'clear_midx_file()' on 'the_repository', which drops the MIDX at the path of the first MIDX in the chain, which (in the case of t7700.6 is the one in the alternate). This patch addresses that by: - placing the local MIDX first in the chain when calling 'prepare_multi_pack_index_one()', and - introducing a new 'get_local_multi_pack_index()', which explicitly returns the repository-local MIDX, if any. Don't impose an additional order on the MIDX's '->next' pointer beyond that the first item in the chain must be local if one exists so that we avoid a quadratic insertion. Likewise, use 'get_local_multi_pack_index()' in 'remove_redundant_pack()' to fix the formerly broken t7700.6 when run with 'GIT_TEST_MULTI_PACK_INDEX=1'. Finally, note that the MIDX ordering invariant is only preserved by the insertion order in 'prepare_packed_git()', which traverses through the ODB's '->next' pointer, meaning we visit the local object store first. This fragility makes this an undesirable long-term solution if more callers are added, but it is acceptable for now since this is the only caller. Helped-by: Jeff King <peff@peff.net> Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Taylor Blau <me@ttaylorr.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-08-28 14:07:09 -07:00
brian m. carlson	629dffc461	packfile: compute and use the index CRC offset Both v2 pack index files and the v3 format specified as part of the NewHash work have similar data starting at the CRC table. Much of the existing code wants to read either this table or the offset entries following it, and in doing so computes the offset each time. In order to share as much code between v2 and v3, compute the offset of the CRC table and store it when the pack is opened. Use this value to compute offsets to not only the CRC table, but to the offset entries beyond it. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-05-27 10:07:07 -07:00
Jeff King	2fecc48cad	packfile: drop nth_packed_object_sha1() Once upon a time, nth_packed_object_sha1() was the primary way to get the oid of a packfile's index position. But these days we have the more type-safe nth_packed_object_id() wrapper, and all callers have been converted. Let's drop the "sha1" version (turning the safer wrapper into a single function) so that nobody is tempted to introduce new callers. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:53 -08:00
Jeff King	6ac9760a30	packed_object_info(): use object_id internally for delta base The previous commit changed the public interface of packed_object_info() to return a struct object_id rather than a bare hash. That enables us to convert our internal helper, as well. We can use nth_packed_object_id() directly for OFS_DELTA, but we'll still have to use oidread() to pull the hash for a REF_DELTA out of the packfile. There should be no additional cost, since we're copying directly into the object_id the caller provided us (just as we did before; it's just happening now via nth_packed_object_id()). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:53 -08:00
Jeff King	b99b6bcc57	packed_object_info(): use object_id for returning delta base If a caller sets the object_info.delta_base_sha1 to a non-NULL pointer, we'll write the oid of the object's delta base to it. But we can increase our type safety by switching this to a real object_id struct. All of our callers are just pointing into the hash member of an object_id anyway, so there's no inconvenience. Note that we do still keep it as a pointer-to-struct, because the NULL sentinel value tells us whether the caller is even interested in the information. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:53 -08:00
Jeff King	0763671b8e	nth_packed_object_oid(): use customary integer return Our nth_packed_object_sha1() function returns NULL for error. So when we wrapped it with nth_packed_object_oid(), we kept the same semantics. But it's a bit funny, because the caller actually passes in an out parameter, and the pointer we return is just that same struct they passed to us (or NULL). It's not too terrible, but it does make the interface a little non-idiomatic. Let's switch to our usual "0 for success, negative for error" return value. Most callers either don't check it, or are trivially converted. The one that requires the biggest change is actually improved, as we can ditch an extra aliased pointer variable. Since we are changing the interface in a subtle way that the compiler wouldn't catch, let's also change the name to catch any topics in flight. We can drop the 'o' and make it nth_packed_object_id(). That's slightly shorter, but also less redundant since the 'o' stands for "object" already. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-02-24 12:55:42 -08:00
Junio C Hamano	56ceb64eb0	Merge branch 'mt/threaded-grep-in-object-store' Traditionally, we avoided threaded grep while searching in objects (as opposed to files in the working tree) as accesses to the object layer is not thread-safe. This limitation is getting lifted. * mt/threaded-grep-in-object-store: grep: use no. of cores as the default no. of threads grep: move driver pre-load out of critical section grep: re-enable threads in non-worktree case grep: protect packed_git [re-]initialization grep: allow submodule functions to run in parallel submodule-config: add skip_if_read option to repo_read_gitmodules() grep: replace grep_read_mutex by internal obj read lock object-store: allow threaded access to object reading replace-object: make replace operations thread-safe grep: fix racy calls in grep_objects() grep: fix race conditions at grep_submodule() grep: fix race conditions on userdiff calls	2020-02-14 12:54:20 -08:00
Junio C Hamano	a14aebeac3	Merge branch 'jk/packfile-reuse-cleanup' The way "git pack-objects" reuses objects stored in existing pack to generate its result has been improved. * jk/packfile-reuse-cleanup: pack-bitmap: don't rely on bitmap_git->reuse_objects pack-objects: add checks for duplicate objects pack-objects: improve partial packfile reuse builtin/pack-objects: introduce obj_is_packed() pack-objects: introduce pack.allowPackReuse csum-file: introduce hashfile_total() pack-bitmap: simplify bitmap_has_oid_in_uninteresting() pack-bitmap: uninteresting oid can be outside bitmapped packfile pack-bitmap: introduce bitmap_walk_contains() ewah/bitmap: introduce bitmap_word_alloc() packfile: expose get_delta_base() builtin/pack-objects: report reused packfile objects	2020-02-14 12:54:19 -08:00
Matheus Tavares	6c307626f1	grep: protect packed_git [re-]initialization Some fields in struct raw_object_store are lazy initialized by the thread-unsafe packfile.c:prepare_packed_git(). Although this function is present in the call stack of git-grep threads, all paths to it are currently protected by obj_read_lock() (and the main thread usually indirectly calls it before firing the worker threads, anyway). However, it's possible that future modifications add new unprotected paths to it, introducing a race condition. Because errors derived from it wouldn't happen often, it could be hard to detect. So to prevent future headaches, let's force eager initialization of packed_git when setting git-grep up. There'll be a small overhead in the cases where we didn't really need to prepare packed_git during execution but this shouldn't be very noticeable. Also, packed_git may be re-initialized by packfile.c:reprepare_packed_git(). Again, all paths to it in git-grep are already protected by obj_read_lock() but it may suffer from the same problem in the future. So let's also internally protect it with obj_read_lock() (which is a recursive mutex). Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Matheus Tavares	31877c9aec	object-store: allow threaded access to object reading Allow object reading to be performed by multiple threads protecting it with an internal lock, the obj_read_mutex. The lock usage can be toggled with enable_obj_read_lock() and disable_obj_read_lock(). Currently, the functions which can be safely called in parallel are: read_object_file_extended(), repo_read_object_file(), read_object_file(), read_object_with_reference(), read_object(), oid_object_info() and oid_object_info_extended(). It's also possible to use obj_read_lock() and obj_read_unlock() to protect other sections that cannot execute in parallel with object reading. Probably there are many spots in the functions listed above that could be executed unlocked (and thus, in parallel). But, for now, we are most interested in allowing parallel access to zlib inflation. This is one of the sections where object reading spends most of the time in (e.g. up to one-third of git-grep's execution time in the chromium repo corresponds to inflation) and it's already thread-safe. So, to take advantage of that, the obj_read_mutex is released when calling git_inflate() and re-acquired right after, for every calling spot in oid_object_info_extended()'s call chain. We may refine this lock to also exploit other possible parallel spots in the future, but for now, threaded zlib inflation should already give great speedups for threaded object reading callers. Note that add_delta_base_cache() was also modified to skip adding already present entries to the cache. This wasn't possible before, but it would be now, with the parallel inflation. Take for example the following situation, where two threads - A and B - are executing the code at unpack_entry(): 1. Thread A is performing the decompression of a base O (which is not yet in the cache) at PHASE II. Thread B is simultaneously trying to unpack O, but just starting at PHASE I. 2. Since O is not yet in the cache, B will go to PHASE II to also perform the decompression. 3. When they finish decompressing, one of them will get the object reading mutex and go to PHASE III while the other waits for the mutex. Let’s say A got the mutex first. 4. Thread A will add O to the cache, go throughout the rest of PHASE III and return. 5. Thread B gets the mutex, also add O to the cache (if the check wasn't there) and returns. Finally, it is also important to highlight that the object reading lock can only ensure thread-safety in the mentioned functions thanks to two complementary mechanisms: the use of 'struct raw_object_store's replace_mutex, which guards sections in the object reading machinery that would otherwise be thread-unsafe; and the 'struct pack_window's inuse_cnt, which protects window reading operations (such as the one performed during the inflation of a packed object), allowing them to execute without the acquisition of the obj_read_mutex. Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2020-01-17 13:52:14 -08:00
Junio C Hamano	556f0258df	Merge branch 'ew/packfile-syscall-optim' Code cleanup. * ew/packfile-syscall-optim: packfile: replace lseek+read with pread packfile: remove redundant fcntl F_GETFD/F_SETFD	2020-01-06 14:17:50 -08:00
Eric Wong	4e61b2214d	packfile: replace lseek+read with pread We already have pread emulation for portability, so there's there's no reason to make two syscalls where one suffices. Furthermore, readers of the packfile will be using mmap (or pread to emulate mmap), anyways, so the file description offset does not matter in this case. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-26 11:57:46 -08:00
Eric Wong	280738c36e	packfile: remove redundant fcntl F_GETFD/F_SETFD git_open sets close-on-exec since `cd66ada065` ("sha1_file: open window into packfiles with O_CLOEXEC"). There's no reason to keep using fcntl to set the close-on-exec flag, anymore. Signed-off-by: Eric Wong <e@80x24.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-26 10:09:35 -08:00
Colin Stolley	ec48540fe8	packfile.c: speed up loading lots of packfiles When loading packfiles on start-up, we traverse the internal packfile list once per file to avoid reloading packfiles that have already been loaded. This check runs in quadratic time, so for poorly maintained repos with a large number of packfiles, it can be pretty slow. Add a hashmap containing the packfile names as we load them so that the average runtime cost of checking for already-loaded packs becomes constant. Add a perf test to p5303 to show speed-up. The existing p5303 test runtimes are dominated by other factors and do not show an appreciable speed-up. The new test in p5303 clearly exposes a speed-up in bad cases. In this test we create 10,000 packfiles and measure the start-up time of git rev-parse, which does little else besides load in the packs. Here are the numbers for the new p5303 test: Test HEAD^ HEAD --------------------------------------------------------------------- 5303.12: load 10,000 packs 1.03(0.92+0.10) 0.12(0.02+0.09) -88.3% Signed-off-by: Colin Stolley <cstolley@runbox.com> Helped-by: Jeff King <peff@peff.net> [jc: squashed the change to call hashmap in install_packed_git() by peff] Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-12-03 07:59:45 -08:00
Junio C Hamano	5efabc7ed9	Merge branch 'ew/hashmap' Code clean-up of the hashmap API, both users and implementation. * ew/hashmap: hashmap_entry: remove first member requirement from docs hashmap: remove type arg from hashmap_{get,put,remove}_entry OFFSETOF_VAR macro to simplify hashmap iterators hashmap: introduce hashmap_free_entries hashmap: hashmap_{put,remove} return hashmap_entry * hashmap: use _entry APIs for iteration hashmap_cmp_fn takes hashmap_entry params hashmap_get{,_from_hash} return "struct hashmap_entry " hashmap: use _entry APIs to wrap container_of hashmap_get_next returns "struct hashmap_entry " introduce container_of macro hashmap_put takes "struct hashmap_entry " hashmap_remove takes "const struct hashmap_entry " hashmap_get takes "const struct hashmap_entry " hashmap_add takes "struct hashmap_entry " hashmap_get_next takes "const struct hashmap_entry " hashmap_entry_init takes "struct hashmap_entry " packfile: use hashmap_entry in delta_base_cache_entry coccicheck: detect hashmap_entry.hash assignment diff: use hashmap_entry_init on moved_entry.ent	2019-10-15 13:48:02 +09:00
Junio C Hamano	a4c5d9f66e	Merge branch 'rs/dedup-includes' Code cleanup. * rs/dedup-includes: treewide: remove duplicate #include directives	2019-10-11 14:24:48 +09:00
Junio C Hamano	676278f8ea	Merge branch 'bc/object-id-part17' Preparation for SHA-256 upgrade continues. * bc/object-id-part17: (26 commits) midx: switch to using the_hash_algo builtin/show-index: replace sha1_to_hex rerere: replace sha1_to_hex builtin/receive-pack: replace sha1_to_hex builtin/index-pack: replace sha1_to_hex packfile: replace sha1_to_hex wt-status: convert struct wt_status to object_id cache: remove null_sha1 builtin/worktree: switch null_sha1 to null_oid builtin/repack: write object IDs of the proper length pack-write: use hash_to_hex when writing checksums sequencer: convert to use the_hash_algo bisect: switch to using the_hash_algo sha1-lookup: switch hard-coded constants to the_hash_algo config: use the_hash_algo in abbrev comparison combine-diff: replace GIT_SHA1_HEXSZ with the_hash_algo bundle: switch to use the_hash_algo connected: switch GIT_SHA1_HEXSZ to the_hash_algo show-index: switch hard-coded constants to the_hash_algo blame: remove needless comparison with GIT_SHA1_HEXSZ ...	2019-10-11 14:24:46 +09:00
Eric Wong	939af16eac	hashmap_cmp_fn takes hashmap_entry params Another step in eliminating the requirement of hashmap_entry being the first member of a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	f23a465132	hashmap_get{,_from_hash} return "struct hashmap_entry *" Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash or container_of as appropriate. This is another step towards eliminating the requirement of hashmap_entry being the first field in a struct. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:11 +09:00
Eric Wong	28ee794128	hashmap_remove takes "const struct hashmap_entry " This is less error-prone than "const void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	b94e5c1df6	hashmap_add takes "struct hashmap_entry " This is less error-prone than "void " as the compiler now detects invalid types being passed. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:10 +09:00
Eric Wong	d22245a2e3	hashmap_entry_init takes "struct hashmap_entry " C compilers do type checking to make life easier for us. So rely on that and update all hashmap_entry_init callers to take "struct hashmap_entry " to avoid future bugs while improving safety and readability. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
Eric Wong	d0a48a0a1d	packfile: use hashmap_entry in delta_base_cache_entry This hashmap_entry_init function is intended to take a hashmap_entry struct pointer, not a hashmap struct pointer. This was not noticed because hashmap_entry_init takes a "void " arg instead of "struct hashmap_entry ", and the hashmap struct is larger and can be cast into a hashmap_entry struct without data corruption. This has the beneficial side effect of reducing the size of a delta_base_cache_entry from 104 bytes to 72 bytes on 64-bit systems. Signed-off-by: Eric Wong <e@80x24.org> Reviewed-by: Derrick Stolee <stolee@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-07 10:20:09 +09:00
René Scharfe	2fe44394c8	treewide: remove duplicate #include directives Found with "git grep '^#include ' '*.c' \| sort \| uniq -d". Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-10-04 08:16:00 +09:00
Junio C Hamano	cf861cd7a0	Merge branch 'rs/get-tagged-oid' Code cleanup. * rs/get-tagged-oid: use get_tagged_oid() tag: factor out get_tagged_oid()	2019-09-30 13:19:29 +09:00
Junio C Hamano	b9ac6c59b8	Merge branch 'cc/multi-promisor' Teach the lazy clone machinery that there can be more than one promisor remote and consult them in order when downloading missing objects on demand. * cc/multi-promisor: Move core_partial_clone_filter_default to promisor-remote.c Move repository_format_partial_clone to promisor-remote.c Remove fetch-object.{c,h} in favor of promisor-remote.{c,h} remote: add promisor and partial clone config to the doc partial-clone: add multiple remotes in the doc t0410: test fetching from many promisor remotes builtin/fetch: remove unique promisor remote limitation promisor-remote: parse remote.*.partialclonefilter Use promisor_remote_get_direct() and has_promisor_remote() promisor-remote: use repository_format_partial_clone promisor-remote: add promisor_remote_reinit() promisor-remote: implement promisor_remote_get_direct() Add initial support for many promisor remotes fetch-object: make functions return an error code t0410: remove pipes after git commands	2019-09-18 11:50:09 -07:00
Jeff King	56d9cbe68b	packfile: expose get_delta_base() In a following commit get_delta_base() will be used outside packfile.c, so let's make it non static and declare it in packfile.h. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-13 14:40:33 -07:00
René Scharfe	c77722b3ea	use get_tagged_oid() Avoid derefencing ->tagged without checking for NULL by using the convenience wrapper for getting the ID of the tagged object. It die()s when encountering a broken tag instead of segfaulting. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-09-05 14:11:34 -07:00
brian m. carlson	3a4d7aa5ae	packfile: replace sha1_to_hex Replace a use of sha1_to_hex with hash_to_hex so that this code works with a hash algorithm other than SHA-1. Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 15:04:59 -07:00
Jeff King	9827d4c185	packfile: drop release_pack_memory() Long ago, in `97bfeb34df` (Release pack windows before reporting out of memory., 2006-12-24), we taught xmalloc() and friends to try unmapping pack windows when malloc() failed. It's unlikely that his helps a lot in practice, and it has some downsides. First, the downsides: 1. It makes xmalloc() not thread-safe. We've worked around this in pack-objects.c, which installs its own locking version of the try_to_free_routine(). But other threaded code doesn't. 2. It makes the system as a whole harder to reason about. Functions which allocate heap memory under the hood may have farther-reaching effects than expected. That might be worth the tradeoff if there's a benefit. But in practice, it seems unlikely. We're generally dealing with mmap'd files, so the OS is going to do a much better job at responding to memory pressure by dropping individual pages (the exception is systems with NO_MMAP, but even there the OS can probably respond just as well with swapping). So the only thing we're really freeing is address space. On 64-bit systems, we have plenty of that to go around. On 32-bit systems, it could possibly help. But around the same time we made two other changes: `77ccc5bbd1` (Introduce new config option for mmap limit., 2006-12-23) and `60bb8b1453` (Fully activate the sliding window pack access., 2006-12-23). Together that means that a 32-bit system should have no more than 256MB total of packed-git mmaps at one time, split between a few 32MB windows. It's unlikely we have any address space problems since then, but we don't have any data since the features were all added at the same time. Likewise, xmmap() will try to free memory. At first glance, it seems like we'd need this (when we try to mmap a new window, we might need to close an old one to save address space on a 32-bit system). But we're saved again by core.packedGitLimit: if we're going to exceed our 256MB limit, we'll close an existing window before we even call mmap(). So it seems unlikely that this feature is actually doing anything useful. And while we don't have reports of it harming anything (probably because it rarely if ever kicks in), it would be nice to simplify the system overall. This patch drops the whole try_to_free system from xmalloc(), as well as the manual pack memory release in xmmap(). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-13 12:21:33 -07:00
Junio C Hamano	dea6737bb7	Merge branch 'ds/close-object-store' into maint The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing commit-graph: extract write_commit_graph_file() commit-graph: extract copy_oids_to_commits() commit-graph: extract count_distinct_commits() commit-graph: extract fill_oids_from_all_packs() commit-graph: extract fill_oids_from_commit_hex() commit-graph: extract fill_oids_from_packs() commit-graph: create write_commit_graph_context commit-graph: remove Future Work section commit-graph: collapse parameters into flags commit-graph: return with errors during write commit-graph: fix the_repository reference	2019-07-29 12:38:22 -07:00
Junio C Hamano	90334a8497	Merge branch 'rs/copy-array' into maint Code clean-up. * rs/copy-array: use COPY_ARRAY for copying arrays coccinelle: use COPY_ARRAY for copying arrays	2019-07-29 12:38:15 -07:00
Junio C Hamano	dae29547c9	Merge branch 'mh/import-transport-fd-fix' into maint The ownership rule for the file descriptor to fast-import remote backend was mixed up, leading to unrelated file descriptor getting closed, which has been fixed. * mh/import-transport-fd-fix: Use xmmap_gently instead of xmmap in use_pack dup() the input fd for fast-import used for remote helpers	2019-07-25 14:27:07 -07:00
Junio C Hamano	4308d81d45	Merge branch 'ds/midx-expire-repack' "git multi-pack-index" learned expire and repack subcommands. * ds/midx-expire-repack: t5319: use 'test-tool path-utils' instead of 'ls -l' t5319-multi-pack-index.sh: test batch size zero midx: add test that 'expire' respects .keep files multi-pack-index: test expire while adding packs midx: implement midx_repack() multi-pack-index: prepare 'repack' subcommand multi-pack-index: implement 'expire' subcommand midx: refactor permutation logic and pack sorting midx: simplify computation of pack name lengths multi-pack-index: prepare for 'expire' subcommand Docs: rearrange subcommands for multi-pack-index repack: refactor pack deletion for future use	2019-07-19 11:30:19 -07:00
Junio C Hamano	e8d2590641	Merge branch 'rs/copy-array' Code clean-up. * rs/copy-array: use COPY_ARRAY for copying arrays coccinelle: use COPY_ARRAY for copying arrays	2019-07-09 15:25:38 -07:00
Junio C Hamano	5cb7c73589	Merge branch 'ds/close-object-store' The commit-graph file is now part of the "files that the runtime may keep open file descriptors on, all of which would need to be closed when done with the object store", and the file descriptor to an existing commit-graph file now is closed before "gc" finalizes a new instance to replace it. * ds/close-object-store: packfile: rename close_all_packs to close_object_store packfile: close commit-graph in close_all_packs commit-graph: use raw_object_store when closing	2019-07-09 15:25:37 -07:00
Christian Couder	b14ed5adaf	Use promisor_remote_get_direct() and has_promisor_remote() Instead of using the repository_format_partial_clone global and fetch_objects() directly, let's use has_promisor_remote() and promisor_remote_get_direct(). This way all the configured promisor remotes will be taken into account, not only the one specified by extensions.partialClone. Also when cloning or fetching using a partial clone filter, remote.origin.promisor will be set to "true" instead of setting extensions.partialClone to "origin". This makes it possible to use many promisor remote just by fetching from them. Signed-off-by: Christian Couder <chriscool@tuxfamily.org> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-25 14:05:37 -07:00
René Scharfe	921d49be86	use COPY_ARRAY for copying arrays Convert calls of memcpy(3) to use COPY_ARRAY, which shortens and simplifies the code a bit. Patch generated by Coccinelle and contrib/coccinelle/array.cocci. Signed-off-by: Rene Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-17 18:15:04 -07:00
Junio C Hamano	2a983b227d	Merge branch 'mh/import-transport-fd-fix' The ownership rule for the file descriptor to fast-import remote backend was mixed up, leading to unrelated file descriptor getting closed, which has been fixed. * mh/import-transport-fd-fix: Use xmmap_gently instead of xmmap in use_pack dup() the input fd for fast-import used for remote helpers	2019-06-13 13:19:43 -07:00

1 2 3 4

189 Commits