mirror of
https://github.com/git/git.git
synced 2024-11-18 02:53:55 +01:00
7dbabbbebe
Since 898b14c (pack-objects: rework check_delta_limit usage, 2007-04-16), we check the delta depth limit only when figuring out whether we should make a new delta. We don't consider it at all when reusing deltas, which means that packing once with --depth=250, and then again with --depth=50, the second pack may still contain chains larger than 50.

This is generally considered a feature, as the results of earlier high-depth repacks are carried forward, used for serving fetches, etc. However, since we started using cross-pack deltas in c9af708b1 (pack-objects: use mru list when iterating over packs, 2016-08-11), we are no longer bounded by the length of an existing delta chain in a single pack.

Here's one particular pathological case: a sequence of N packs, each with 2 objects, the base of which is stored as a delta in a previous pack. If we chain all the deltas together, we have a cycle of length N. We break the cycle, but the tip delta is still at depth N-1.

This is less unlikely than it might sound. See the included test for a reconstruction based on real-world actions. I ran into such a case in the wild, where a client was rapidly sending packs, and we had accumulated 10,000 before doing a server-side repack. The pack that "git repack" tried to generate had a very deep chain, which caused pack-objects to run out of stack space in the recursive write_one().

This patch bounds the length of delta chains in the output pack based on --depth, regardless of whether they are caused by cross-pack deltas or existed in the input packs. This fixes the problem, but does have two possible downsides:

  1. High-depth aggressive repacks followed by "normal" repacks will throw away the high-depth chains.

     In the long run this is probably OK; investigation showed that high-depth repacks aren't actually beneficial, and we dropped the aggressive depth default to match the normal case in 07e7dbf0d (gc: default aggressive depth to 50, 2016-08-11).

  2. If you really do want to store high-depth deltas on disk, they may be discarded and new deltas computed when serving a fetch, unless you set pack.depth to match your high-depth size.

The implementation uses the existing search for delta cycles. That lets us compute the depth of any node based on the depth of its base, because we know the base is DFS_DONE by the time we look at it (modulo any cycles in the graph, but we know there cannot be any because we break them as we see them).

There is some subtlety worth mentioning, though. We record the depth of each object as we compute it. It might seem like we could save the per-object storage space by just keeping track of the depth of our traversal (i.e., have break_delta_chains() report how deep it went). But we may visit an object through multiple delta paths, and on subsequent paths we want to know its depth immediately, without having to walk back down to its final base (doing so would make our graph walk quadratic rather than linear).

Likewise, one could try to record the depth not from the base, but from our starting point (i.e., start recursion_depth at 0, and pass "recursion_depth + 1" to each invocation of break_delta_chains()). And then when recursion_depth gets too big, we know that we must cut the delta chain. But that technique is wrong if we do not visit the nodes in topological order. In a chain A->B->C, if we visit "C", then "B", then "A", we will never recurse deeper than 1 link (because we see at each node that we have already visited it).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
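The depth bookkeeping described in the message can be sketched in isolation. This is a minimal illustration under stated assumptions, not the code in pack-objects.c: `struct node`, `cut_delta()`, `compute_depth()`, `tip_depth_reverse_visit()` and the `max_depth` parameter are hypothetical names, and "dropping a reused delta" is modeled simply by clearing the `base` pointer. It records each node's depth relative to its final base, so the A->B->C chain is measured correctly even when visited in the order C, B, A.

```c
#include <assert.h>
#include <stddef.h>

enum dfs_state { DFS_NONE = 0, DFS_ACTIVE, DFS_DONE };

/* Hypothetical stand-in for struct object_entry; only what the walk needs. */
struct node {
	struct node *base;	/* delta base, or NULL for a full object */
	enum dfs_state state;
	int depth;		/* delta links between this node and its final base */
};

/* Assumption: cutting a delta means storing the object whole (depth 0). */
static void cut_delta(struct node *n)
{
	n->base = NULL;
	n->depth = 0;
}

/*
 * Finish `n` and record its depth. Depth is derived from the base's
 * recorded depth, so a node reached a second time costs O(1).
 */
static void compute_depth(struct node *n, int max_depth)
{
	assert(max_depth >= 0);

	if (!n->base) {
		/* A full object cannot be part of a chain or a cycle. */
		n->depth = 0;
		n->state = DFS_DONE;
		return;
	}

	switch (n->state) {
	case DFS_DONE:
		/* Already measured on an earlier path. */
		return;
	case DFS_ACTIVE:
		/* We walked back into ourselves: break the cycle here. */
		cut_delta(n);
		n->state = DFS_DONE;
		return;
	case DFS_NONE:
		n->state = DFS_ACTIVE;
		compute_depth(n->base, max_depth);
		/* Our own link may have been cut while recursing. */
		if (n->base) {
			n->depth = n->base->depth + 1;
			if (n->depth > max_depth)
				cut_delta(n);
		}
		n->state = DFS_DONE;
		return;
	}
}

/* Build A -> B -> C, visit in the order C, B, A, and return A's depth. */
static int tip_depth_reverse_visit(int max_depth)
{
	struct node c = { NULL, DFS_NONE, 0 };
	struct node b = { &c, DFS_NONE, 0 };
	struct node a = { &b, DFS_NONE, 0 };

	compute_depth(&c, max_depth);
	compute_depth(&b, max_depth);
	compute_depth(&a, max_depth);
	return a.depth;
}
```

In this sketch, `tip_depth_reverse_visit(50)` returns 2 even though no call recursed more than one link, which is the case a traversal-depth counter would miss. With a limit of 1, the tip delta is cut and the function returns 0.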
82 lines
2.2 KiB
C
#ifndef PACK_OBJECTS_H
#define PACK_OBJECTS_H

struct object_entry {
	struct pack_idx_entry idx;
	unsigned long size;	/* uncompressed size */
	struct packed_git *in_pack;	/* already in pack */
	off_t in_pack_offset;
	struct object_entry *delta;	/* delta base object */
	struct object_entry *delta_child; /* deltified objects who bases me */
	struct object_entry *delta_sibling; /* other deltified objects who
					     * uses the same base as me
					     */
	void *delta_data;	/* cached delta (uncompressed) */
	unsigned long delta_size;	/* delta data size (uncompressed) */
	unsigned long z_delta_size;	/* delta data size (compressed) */
	enum object_type type;
	enum object_type in_pack_type;	/* could be delta */
	uint32_t hash;			/* name hint hash */
	unsigned int in_pack_pos;
	unsigned char in_pack_header_size;
	unsigned preferred_base:1; /*
				    * we do not pack this, but is available
				    * to be used as the base object to delta
				    * objects against.
				    */
	unsigned no_try_delta:1;
	unsigned tagged:1; /* near the very tip of refs */
	unsigned filled:1; /* assigned write-order */

	/*
	 * State flags for depth-first search used for analyzing delta cycles.
	 *
	 * The depth is measured in delta-links to the base (so if A is a delta
	 * against B, then A has a depth of 1, and B a depth of 0).
	 */
	enum {
		DFS_NONE = 0,
		DFS_ACTIVE,
		DFS_DONE
	} dfs_state;
	int depth;
};

struct packing_data {
	struct object_entry *objects;
	uint32_t nr_objects, nr_alloc;

	int32_t *index;
	uint32_t index_size;
};

struct object_entry *packlist_alloc(struct packing_data *pdata,
				    const unsigned char *sha1,
				    uint32_t index_pos);

struct object_entry *packlist_find(struct packing_data *pdata,
				   const unsigned char *sha1,
				   uint32_t *index_pos);

static inline uint32_t pack_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	/*
	 * This effectively just creates a sortable number from the
	 * last sixteen non-whitespace characters. Last characters
	 * count "most", so things that end in ".c" sort together.
	 */
	while ((c = *name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

#endif
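The behavior of pack_name_hash() is easier to see with concrete inputs. The sketch below copies the function from the header and adds one hypothetical helper, `hash_gap()`, which is not part of Git. It assumes plain ASCII names. Each new character lands in the top byte while everything before it is shifted right by two bits, so whitespace is skipped, trailing characters dominate, and any two names ending in ".c" end up less than 2^28 apart.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Copied from the header above. */
static inline uint32_t pack_name_hash(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = *name++) != 0) {
		if (isspace(c))
			continue;
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}

/* Hypothetical helper: absolute distance between two name hashes. */
static uint32_t hash_gap(const char *a, const char *b)
{
	uint32_t ha = pack_name_hash(a);
	uint32_t hb = pack_name_hash(b);

	return ha > hb ? ha - hb : hb - ha;
}
```

For example, `pack_name_hash("x")` is `'x' << 24` (0x78000000), `pack_name_hash("a b")` equals `pack_name_hash("ab")`, and `hash_gap("builtin/pack-objects.c", "diff.c")` is below `1u << 28`, because everything before the final ".c" has been shifted right by four bits.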