mirror of https://github.com/git/git.git synced 2024-03-28 22:40:01 +01:00

Merge branch 'ds/chunked-file-api' into next

The common code to deal with the "chunked file format" that is shared
by the multi-pack-index and commit-graph files has been factored
out, to help codepaths for both file types become more robust.

* ds/chunked-file-api:
  chunk-format: add technical docs
  chunk-format: restore duplicate chunk checks
  midx: use 64-bit multiplication for chunk sizes
  midx: use chunk-format read API
  commit-graph: use chunk-format read API
  chunk-format: create read chunk API
  midx: use chunk-format API in write_midx_internal()
  midx: drop chunk progress during write
  midx: return success/failure in chunk write methods
  midx: add num_large_offsets to write_midx_context
  midx: add pack_perm to write_midx_context
  midx: add entries to write_midx_context
  midx: use context in write_midx_pack_names()
  midx: rename pack_info to write_midx_context
  commit-graph: use chunk-format write API
  chunk-format: create chunk format write API
  commit-graph: anonymize data in chunk_write_fn
Junio C Hamano committed 2021-02-22 17:35:58 -08:00, commit 5f0e28cfb2
10 changed files with 647 additions and 459 deletions

Documentation/technical/chunk-format.txt (new file)

@ -0,0 +1,116 @@
Chunk-based file formats
========================
Some file formats in Git use a common concept of "chunks" to describe
sections of the file. This allows structured access to a large file by
scanning a small "table of contents" for the remaining data. This common
format is used by the `commit-graph` and `multi-pack-index` files. See
link:technical/pack-format.html[the `multi-pack-index` format] and
link:technical/commit-graph-format.html[the `commit-graph` format] for
how they use the chunks to describe structured data.
A chunk-based file format begins with some header information custom to
that format. That header should include enough information to identify
the file type, format version, and number of chunks in the file. From this
information, a reader can determine the start of the chunk-based region.
The chunk-based region starts with a table of contents describing where
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
where C is the number of chunks. Consider the following table:
| Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
|--------------------|------------------------|
| ID[0]              | OFFSET[0]              |
| ...                | ...                    |
| ID[C-1]            | OFFSET[C-1]            |
| 0x0000             | OFFSET[C]              |
Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
Each integer is stored in network-byte order.
The chunk identifier `ID[i]` is a label for the data stored within this
file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
and `OFFSET[i]`. This requires that the chunk data appears contiguously
in the same order as the table of contents.
The final entry in the table of contents must be four zero bytes. This
confirms that the table of contents is ending and provides the offset for
the end of the chunk-based data.
Note: The chunk-based format expects that the file contains _at least_ a
trailing hash after `OFFSET[C]`.
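As a concrete illustration (the chunk IDs and sizes here are hypothetical),
consider a format with an 8-byte header followed by a 256-byte "AAAA" chunk
and a 100-byte "BBBB" chunk. With C = 2, the table of contents occupies
(2+1) * 12 = 36 bytes, so the chunk data begins at offset 8 + 36 = 44:

  | ID     | OFFSET |
  |--------|--------|
  | "AAAA" |     44 |
  | "BBBB" |    300 |
  | 0x0000 |    400 |

The size of the "AAAA" chunk is 300 - 44 = 256 bytes, the "BBBB" chunk ends
at offset 400, and the trailing hash begins there.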
Functions for working with chunk-based file formats are declared in
`chunk-format.h`. Using these methods provides extra checks that assist
developers when creating new file formats.
Writing chunk-based file formats
--------------------------------
To write a chunk-based file format, create a `struct chunkfile` by
calling `init_chunkfile()` and pass a `struct hashfile` pointer. The
caller is responsible for opening the `hashfile` and writing header
information so the file format is identifiable before the chunk-based
format begins.
Then, call `add_chunk()` for each chunk you intend to write. This
populates the `chunkfile` with information about the order and expected
size of each chunk. Provide a `chunk_write_fn` function pointer that
writes the chunk's data when requested.
Call `write_chunkfile()` to write the table of contents to the `hashfile`
followed by each of the chunks. This will verify that each chunk wrote
the expected amount of data so the table of contents is correct.
Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
caller is responsible for finalizing the `hashfile` by writing the trailing
hash and closing the file.
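The following is a minimal sketch of the write path. The "EXAM" chunk ID,
the `example_ctx` struct, and the 8-byte header are hypothetical; only the
`chunk-format.h` and `csum-file.h` calls are from the real API:

struct example_ctx {
	uint32_t value;
};

static int write_example_chunk(struct hashfile *f, void *data)
{
	struct example_ctx *ctx = data;

	/* Write exactly the size promised to add_chunk(). */
	hashwrite_be32(f, ctx->value);
	return 0;
}

static void write_example_file(int fd, struct example_ctx *ctx,
			       unsigned char *hash)
{
	struct hashfile *f = hashfd(fd, "example");
	struct chunkfile *cf;

	/* Hypothetical 8-byte header: signature, then chunk count. */
	hashwrite_be32(f, 0x45584d50);	/* "EXMP" */
	hashwrite_be32(f, 1);

	cf = init_chunkfile(f);
	add_chunk(cf, 0x4558414d /* "EXAM" */, sizeof(uint32_t),
		  write_example_chunk);
	write_chunkfile(cf, ctx);	/* table of contents, then chunks */
	free_chunkfile(cf);

	finalize_hashfile(f, hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
}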
Reading chunk-based file formats
--------------------------------
To read a chunk-based file format, the file must be opened as a
memory-mapped region. The chunk-format API expects that the entire file
is mapped as a contiguous memory region.
Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.
After reading the header information from the beginning of the file,
including the chunk count, call `read_table_of_contents()` to populate
the `struct chunkfile` with the list of chunks, their offsets, and their
sizes.
Extract the data for each chunk using `pair_chunk()` or
`read_chunk()`:
* `pair_chunk()` points a given pointer at the location inside the
memory-mapped file corresponding to that chunk's offset. If the chunk
does not exist, then the pointer is not modified.
* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
with the appropriate initial pointer and size information. The function
is not called if the chunk does not exist. Use this method to read chunks
if you need to perform immediate parsing or if you need to execute logic
based on the size of the chunk.
After calling these methods, call `free_chunkfile()` to clear the
`struct chunkfile` data. This will not close the memory-mapped region.
Callers are expected to keep that region mapped for as long as the
pointers into it are needed.
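A matching sketch of the read path, with the same hypothetical "EXAM"
chunk and 8-byte header as in the writing example above (the optional
"OPT1" chunk is also made up for illustration):

static int read_example_chunk(const unsigned char *chunk_start,
			      size_t chunk_size, void *data)
{
	struct example_ctx *ctx = data;

	if (chunk_size != sizeof(uint32_t))
		return error(_("EXAM chunk is the wrong size"));
	ctx->value = get_be32(chunk_start);
	return 0;
}

static int parse_example_file(const unsigned char *mfile, size_t mfile_size,
			      struct example_ctx *ctx)
{
	struct chunkfile *cf = init_chunkfile(NULL);
	const unsigned char *opt = NULL;
	int ret = 0;
	uint32_t num_chunks = get_be32(mfile + 4);

	if (read_table_of_contents(cf, mfile, mfile_size,
				   8 /* header size */, num_chunks)) {
		ret = -1;
		goto done;
	}

	/* Parse immediately and validate the chunk's size. */
	if (read_chunk(cf, 0x4558414d /* "EXAM" */,
		       read_example_chunk, ctx) == CHUNK_NOT_FOUND)
		ret = error(_("missing required EXAM chunk"));

	/* Or just remember where an optional chunk begins. */
	pair_chunk(cf, 0x4f505431 /* "OPT1" */, &opt);

done:
	free_chunkfile(cf);	/* the mmap itself stays valid */
	return ret;
}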
Examples
--------
These file formats use the chunk-format API, and can be used as examples
for future formats:
* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
in `commit-graph.c` for how the chunk-format API is used to write and
parse the commit-graph file format documented in
link:technical/commit-graph-format.html[the commit-graph file format].
* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
in `midx.c` for how the chunk-format API is used to write and
parse the multi-pack-index file format documented in
link:technical/pack-format.html[the multi-pack-index file format].

Documentation/technical/commit-graph-format.txt

@ -61,6 +61,9 @@ CHUNK LOOKUP:
the length using the next chunk position if necessary.) Each chunk
ID appears at most once.
The CHUNK LOOKUP matches the table of contents from
link:technical/chunk-format.html[the chunk-based file format].
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.

Documentation/technical/pack-format.txt

@ -336,6 +336,9 @@ CHUNK LOOKUP:
(Chunks are provided in file-order, so you can infer the length
using the next chunk position if necessary.)
The CHUNK LOOKUP matches the table of contents from
link:technical/chunk-format.html[the chunk-based file format].
The remaining data in the body is described one chunk at a time, and
these chunks may be given in any order. Chunks are required unless
otherwise specified.

Makefile

@ -834,6 +834,7 @@ LIB_OBJS += bundle.o
LIB_OBJS += cache-tree.o
LIB_OBJS += chdir-notify.o
LIB_OBJS += checkout.o
LIB_OBJS += chunk-format.o
LIB_OBJS += color.o
LIB_OBJS += column.o
LIB_OBJS += combine-diff.o

chunk-format.c (new file, 179 lines)

@ -0,0 +1,179 @@
#include "cache.h"
#include "chunk-format.h"
#include "csum-file.h"
/*
* When writing a chunk-based file format, collect the chunks in
* an array of chunk_info structs. The size stores the _expected_
* amount of data that will be written by write_fn.
*/
struct chunk_info {
uint32_t id;
uint64_t size;
chunk_write_fn write_fn;
const void *start;
};
struct chunkfile {
struct hashfile *f;
struct chunk_info *chunks;
size_t chunks_nr;
size_t chunks_alloc;
};
struct chunkfile *init_chunkfile(struct hashfile *f)
{
struct chunkfile *cf = xcalloc(1, sizeof(*cf));
cf->f = f;
return cf;
}
void free_chunkfile(struct chunkfile *cf)
{
if (!cf)
return;
free(cf->chunks);
free(cf);
}
int get_num_chunks(struct chunkfile *cf)
{
return cf->chunks_nr;
}
void add_chunk(struct chunkfile *cf,
uint32_t id,
size_t size,
chunk_write_fn fn)
{
ALLOC_GROW(cf->chunks, cf->chunks_nr + 1, cf->chunks_alloc);
cf->chunks[cf->chunks_nr].id = id;
cf->chunks[cf->chunks_nr].write_fn = fn;
cf->chunks[cf->chunks_nr].size = size;
cf->chunks_nr++;
}
int write_chunkfile(struct chunkfile *cf, void *data)
{
int i;
uint64_t cur_offset = hashfile_total(cf->f);
/* Add the table of contents to the current offset */
cur_offset += (cf->chunks_nr + 1) * CHUNK_TOC_ENTRY_SIZE;
for (i = 0; i < cf->chunks_nr; i++) {
hashwrite_be32(cf->f, cf->chunks[i].id);
hashwrite_be64(cf->f, cur_offset);
cur_offset += cf->chunks[i].size;
}
/* Trailing entry marks the end of the chunks */
hashwrite_be32(cf->f, 0);
hashwrite_be64(cf->f, cur_offset);
for (i = 0; i < cf->chunks_nr; i++) {
off_t start_offset = hashfile_total(cf->f);
int result = cf->chunks[i].write_fn(cf->f, data);
if (result)
return result;
if (hashfile_total(cf->f) - start_offset != cf->chunks[i].size)
BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
cf->chunks[i].size, cf->chunks[i].id,
hashfile_total(cf->f) - start_offset);
}
return 0;
}
int read_table_of_contents(struct chunkfile *cf,
const unsigned char *mfile,
size_t mfile_size,
uint64_t toc_offset,
int toc_length)
{
int i;
uint32_t chunk_id;
const unsigned char *table_of_contents = mfile + toc_offset;
ALLOC_GROW(cf->chunks, toc_length, cf->chunks_alloc);
while (toc_length--) {
uint64_t chunk_offset, next_chunk_offset;
chunk_id = get_be32(table_of_contents);
chunk_offset = get_be64(table_of_contents + 4);
if (!chunk_id) {
error(_("terminating chunk id appears earlier than expected"));
return 1;
}
table_of_contents += CHUNK_TOC_ENTRY_SIZE;
next_chunk_offset = get_be64(table_of_contents + 4);
if (next_chunk_offset < chunk_offset ||
next_chunk_offset > mfile_size - the_hash_algo->rawsz) {
error(_("improper chunk offset(s) %"PRIx64" and %"PRIx64""),
chunk_offset, next_chunk_offset);
return -1;
}
for (i = 0; i < cf->chunks_nr; i++) {
if (cf->chunks[i].id == chunk_id) {
error(_("duplicate chunk ID %"PRIx32" found"),
chunk_id);
return -1;
}
}
cf->chunks[cf->chunks_nr].id = chunk_id;
cf->chunks[cf->chunks_nr].start = mfile + chunk_offset;
cf->chunks[cf->chunks_nr].size = next_chunk_offset - chunk_offset;
cf->chunks_nr++;
}
chunk_id = get_be32(table_of_contents);
if (chunk_id) {
error(_("final chunk has non-zero id %"PRIx32""), chunk_id);
return -1;
}
return 0;
}
static int pair_chunk_fn(const unsigned char *chunk_start,
size_t chunk_size,
void *data)
{
const unsigned char **p = data;
*p = chunk_start;
return 0;
}
int pair_chunk(struct chunkfile *cf,
uint32_t chunk_id,
const unsigned char **p)
{
return read_chunk(cf, chunk_id, pair_chunk_fn, p);
}
int read_chunk(struct chunkfile *cf,
uint32_t chunk_id,
chunk_read_fn fn,
void *data)
{
int i;
for (i = 0; i < cf->chunks_nr; i++) {
if (cf->chunks[i].id == chunk_id)
return fn(cf->chunks[i].start, cf->chunks[i].size, data);
}
return CHUNK_NOT_FOUND;
}

chunk-format.h (new file, 68 lines)

@ -0,0 +1,68 @@
#ifndef CHUNK_FORMAT_H
#define CHUNK_FORMAT_H

#include "git-compat-util.h"

struct hashfile;
struct chunkfile;

#define CHUNK_TOC_ENTRY_SIZE (sizeof(uint32_t) + sizeof(uint64_t))

/*
 * Initialize a 'struct chunkfile' for writing _or_ reading a file
 * with the chunk format.
 *
 * If writing a file, supply a non-NULL 'struct hashfile *' that will
 * be used to write.
 *
 * If reading a file, use a NULL 'struct hashfile *' and then call
 * read_table_of_contents(). Supply the memory-mapped data to the
 * pair_chunk() or read_chunk() methods, as appropriate.
 *
 * DO NOT MIX THESE MODES. Use different 'struct chunkfile' instances
 * for reading and writing.
 */
struct chunkfile *init_chunkfile(struct hashfile *f);
void free_chunkfile(struct chunkfile *cf);
int get_num_chunks(struct chunkfile *cf);
typedef int (*chunk_write_fn)(struct hashfile *f, void *data);
void add_chunk(struct chunkfile *cf,
	       uint32_t id,
	       size_t size,
	       chunk_write_fn fn);
int write_chunkfile(struct chunkfile *cf, void *data);

int read_table_of_contents(struct chunkfile *cf,
			   const unsigned char *mfile,
			   size_t mfile_size,
			   uint64_t toc_offset,
			   int toc_length);

#define CHUNK_NOT_FOUND (-2)

/*
 * Find 'chunk_id' in the given chunkfile and assign the
 * given pointer to the position in the mmap'd file where
 * that chunk begins.
 *
 * Returns CHUNK_NOT_FOUND if the chunk does not exist.
 */
int pair_chunk(struct chunkfile *cf,
	       uint32_t chunk_id,
	       const unsigned char **p);

typedef int (*chunk_read_fn)(const unsigned char *chunk_start,
			     size_t chunk_size, void *data);
/*
 * Find 'chunk_id' in the given chunkfile and call the
 * given chunk_read_fn method with the information for
 * that chunk.
 *
 * Returns CHUNK_NOT_FOUND if the chunk does not exist.
 */
int read_chunk(struct chunkfile *cf,
	       uint32_t chunk_id,
	       chunk_read_fn fn,
	       void *data);

#endif

commit-graph.c

@ -19,6 +19,7 @@
#include "shallow.h"
#include "json-writer.h"
#include "trace2.h"
#include "chunk-format.h"
void git_test_write_commit_graph_or_die(void)
{
@ -44,7 +45,6 @@ void git_test_write_commit_graph_or_die(void)
#define GRAPH_CHUNKID_BLOOMINDEXES 0x42494458 /* "BIDX" */
#define GRAPH_CHUNKID_BLOOMDATA 0x42444154 /* "BDAT" */
#define GRAPH_CHUNKID_BASE 0x42415345 /* "BASE" */
#define MAX_NUM_CHUNKS 9
#define GRAPH_DATA_WIDTH (the_hash_algo->rawsz + 16)
@ -59,8 +59,7 @@ void git_test_write_commit_graph_or_die(void)
#define GRAPH_HEADER_SIZE 8
#define GRAPH_FANOUT_SIZE (4 * 256)
#define GRAPH_CHUNKLOOKUP_WIDTH 12
#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * GRAPH_CHUNKLOOKUP_WIDTH \
#define GRAPH_MIN_SIZE (GRAPH_HEADER_SIZE + 4 * CHUNK_TOC_ENTRY_SIZE \
+ GRAPH_FANOUT_SIZE + the_hash_algo->rawsz)
#define CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW (1ULL << 31)
@ -306,15 +305,43 @@ static int verify_commit_graph_lite(struct commit_graph *g)
return 0;
}
static int graph_read_oid_lookup(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
struct commit_graph *g = data;
g->chunk_oid_lookup = chunk_start;
g->num_commits = chunk_size / g->hash_len;
return 0;
}
static int graph_read_bloom_data(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
struct commit_graph *g = data;
uint32_t hash_version;
g->chunk_bloom_data = chunk_start;
hash_version = get_be32(chunk_start);
if (hash_version != 1)
return 0;
g->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
g->bloom_filter_settings->hash_version = hash_version;
g->bloom_filter_settings->num_hashes = get_be32(chunk_start + 4);
g->bloom_filter_settings->bits_per_entry = get_be32(chunk_start + 8);
g->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
return 0;
}
struct commit_graph *parse_commit_graph(struct repository *r,
void *graph_map, size_t graph_size)
{
const unsigned char *data, *chunk_lookup;
uint32_t i;
const unsigned char *data;
struct commit_graph *graph;
uint64_t next_chunk_offset;
uint32_t graph_signature;
unsigned char graph_version, hash_version;
struct chunkfile *cf = NULL;
if (!graph_map)
return NULL;
@ -355,7 +382,7 @@ struct commit_graph *parse_commit_graph(struct repository *r,
graph->data_len = graph_size;
if (graph_size < GRAPH_HEADER_SIZE +
(graph->num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH +
(graph->num_chunks + 1) * CHUNK_TOC_ENTRY_SIZE +
GRAPH_FANOUT_SIZE + the_hash_algo->rawsz) {
error(_("commit-graph file is too small to hold %u chunks"),
graph->num_chunks);
@ -363,108 +390,28 @@ struct commit_graph *parse_commit_graph(struct repository *r,
return NULL;
}
chunk_lookup = data + 8;
next_chunk_offset = get_be64(chunk_lookup + 4);
for (i = 0; i < graph->num_chunks; i++) {
uint32_t chunk_id;
uint64_t chunk_offset = next_chunk_offset;
int chunk_repeated = 0;
cf = init_chunkfile(NULL);
chunk_id = get_be32(chunk_lookup + 0);
if (read_table_of_contents(cf, graph->data, graph_size,
GRAPH_HEADER_SIZE, graph->num_chunks))
goto free_and_return;
chunk_lookup += GRAPH_CHUNKLOOKUP_WIDTH;
next_chunk_offset = get_be64(chunk_lookup + 4);
pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT,
(const unsigned char **)&graph->chunk_oid_fanout);
read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
pair_chunk(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data);
pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges);
pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
&graph->chunk_generation_data);
pair_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
&graph->chunk_generation_data_overflow);
if (chunk_offset > graph_size - the_hash_algo->rawsz) {
error(_("commit-graph improper chunk offset %08x%08x"), (uint32_t)(chunk_offset >> 32),
(uint32_t)chunk_offset);
goto free_and_return;
}
switch (chunk_id) {
case GRAPH_CHUNKID_OIDFANOUT:
if (graph->chunk_oid_fanout)
chunk_repeated = 1;
else
graph->chunk_oid_fanout = (uint32_t*)(data + chunk_offset);
break;
case GRAPH_CHUNKID_OIDLOOKUP:
if (graph->chunk_oid_lookup)
chunk_repeated = 1;
else {
graph->chunk_oid_lookup = data + chunk_offset;
graph->num_commits = (next_chunk_offset - chunk_offset)
/ graph->hash_len;
}
break;
case GRAPH_CHUNKID_DATA:
if (graph->chunk_commit_data)
chunk_repeated = 1;
else
graph->chunk_commit_data = data + chunk_offset;
break;
case GRAPH_CHUNKID_GENERATION_DATA:
if (graph->chunk_generation_data)
chunk_repeated = 1;
else
graph->chunk_generation_data = data + chunk_offset;
break;
case GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW:
if (graph->chunk_generation_data_overflow)
chunk_repeated = 1;
else
graph->chunk_generation_data_overflow = data + chunk_offset;
break;
case GRAPH_CHUNKID_EXTRAEDGES:
if (graph->chunk_extra_edges)
chunk_repeated = 1;
else
graph->chunk_extra_edges = data + chunk_offset;
break;
case GRAPH_CHUNKID_BASE:
if (graph->chunk_base_graphs)
chunk_repeated = 1;
else
graph->chunk_base_graphs = data + chunk_offset;
break;
case GRAPH_CHUNKID_BLOOMINDEXES:
if (graph->chunk_bloom_indexes)
chunk_repeated = 1;
else if (r->settings.commit_graph_read_changed_paths)
graph->chunk_bloom_indexes = data + chunk_offset;
break;
case GRAPH_CHUNKID_BLOOMDATA:
if (graph->chunk_bloom_data)
chunk_repeated = 1;
else if (r->settings.commit_graph_read_changed_paths) {
uint32_t hash_version;
graph->chunk_bloom_data = data + chunk_offset;
hash_version = get_be32(data + chunk_offset);
if (hash_version != 1)
break;
graph->bloom_filter_settings = xmalloc(sizeof(struct bloom_filter_settings));
graph->bloom_filter_settings->hash_version = hash_version;
graph->bloom_filter_settings->num_hashes = get_be32(data + chunk_offset + 4);
graph->bloom_filter_settings->bits_per_entry = get_be32(data + chunk_offset + 8);
graph->bloom_filter_settings->max_changed_paths = DEFAULT_BLOOM_MAX_CHANGES;
}
break;
}
if (chunk_repeated) {
error(_("commit-graph chunk id %08x appears multiple times"), chunk_id);
goto free_and_return;
}
if (r->settings.commit_graph_read_changed_paths) {
pair_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
&graph->chunk_bloom_indexes);
read_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
graph_read_bloom_data, graph);
}
if (graph->chunk_bloom_indexes && graph->chunk_bloom_data) {
@ -481,9 +428,11 @@ struct commit_graph *parse_commit_graph(struct repository *r,
if (verify_commit_graph_lite(graph))
goto free_and_return;
free_chunkfile(cf);
return graph;
free_and_return:
free_chunkfile(cf);
free(graph->bloom_filter_settings);
free(graph);
return NULL;
@ -1059,8 +1008,9 @@ struct write_commit_graph_context {
};
static int write_graph_chunk_fanout(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
int i, count = 0;
struct commit **list = ctx->commits.list;
@ -1085,8 +1035,9 @@ static int write_graph_chunk_fanout(struct hashfile *f,
}
static int write_graph_chunk_oids(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
struct commit **list = ctx->commits.list;
int count;
for (count = 0; count < ctx->commits.nr; count++, list++) {
@ -1104,8 +1055,9 @@ static const struct object_id *commit_to_oid(size_t index, const void *table)
}
static int write_graph_chunk_data(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
struct commit **list = ctx->commits.list;
struct commit **last = ctx->commits.list + ctx->commits.nr;
uint32_t num_extra_edges = 0;
@ -1206,8 +1158,9 @@ static int write_graph_chunk_data(struct hashfile *f,
}
static int write_graph_chunk_generation_data(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
int i, num_generation_data_overflows = 0;
for (i = 0; i < ctx->commits.nr; i++) {
@ -1229,8 +1182,9 @@ static int write_graph_chunk_generation_data(struct hashfile *f,
}
static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
int i;
for (i = 0; i < ctx->commits.nr; i++) {
struct commit *c = ctx->commits.list[i];
@ -1247,8 +1201,9 @@ static int write_graph_chunk_generation_data_overflow(struct hashfile *f,
}
static int write_graph_chunk_extra_edges(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
struct commit **list = ctx->commits.list;
struct commit **last = ctx->commits.list + ctx->commits.nr;
struct commit_list *parent;
@ -1301,8 +1256,9 @@ static int write_graph_chunk_extra_edges(struct hashfile *f,
}
static int write_graph_chunk_bloom_indexes(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
struct commit **list = ctx->commits.list;
struct commit **last = ctx->commits.list + ctx->commits.nr;
uint32_t cur_pos = 0;
@ -1336,8 +1292,9 @@ static void trace2_bloom_filter_settings(struct write_commit_graph_context *ctx)
}
static int write_graph_chunk_bloom_data(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
struct commit **list = ctx->commits.list;
struct commit **last = ctx->commits.list + ctx->commits.nr;
@ -1813,8 +1770,9 @@ static int write_graph_chunk_base_1(struct hashfile *f,
}
static int write_graph_chunk_base(struct hashfile *f,
struct write_commit_graph_context *ctx)
void *data)
{
struct write_commit_graph_context *ctx = data;
int num = write_graph_chunk_base_1(f, ctx->new_base_graph);
if (num != ctx->num_commit_graphs_after - 1) {
@ -1825,27 +1783,17 @@ static int write_graph_chunk_base(struct hashfile *f,
return 0;
}
typedef int (*chunk_write_fn)(struct hashfile *f,
struct write_commit_graph_context *ctx);
struct chunk_info {
uint32_t id;
uint64_t size;
chunk_write_fn write_fn;
};
static int write_commit_graph_file(struct write_commit_graph_context *ctx)
{
uint32_t i;
int fd;
struct hashfile *f;
struct lock_file lk = LOCK_INIT;
struct chunk_info chunks[MAX_NUM_CHUNKS + 1];
const unsigned hashsz = the_hash_algo->rawsz;
struct strbuf progress_title = STRBUF_INIT;
int num_chunks = 3;
uint64_t chunk_offset;
struct object_id file_hash;
struct chunkfile *cf;
if (ctx->split) {
struct strbuf tmp_file = STRBUF_INIT;
@ -1891,76 +1839,50 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
f = hashfd(fd, get_lock_file_path(&lk));
}
chunks[0].id = GRAPH_CHUNKID_OIDFANOUT;
chunks[0].size = GRAPH_FANOUT_SIZE;
chunks[0].write_fn = write_graph_chunk_fanout;
chunks[1].id = GRAPH_CHUNKID_OIDLOOKUP;
chunks[1].size = hashsz * ctx->commits.nr;
chunks[1].write_fn = write_graph_chunk_oids;
chunks[2].id = GRAPH_CHUNKID_DATA;
chunks[2].size = (hashsz + 16) * ctx->commits.nr;
chunks[2].write_fn = write_graph_chunk_data;
cf = init_chunkfile(f);
add_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, GRAPH_FANOUT_SIZE,
write_graph_chunk_fanout);
add_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, hashsz * ctx->commits.nr,
write_graph_chunk_oids);
add_chunk(cf, GRAPH_CHUNKID_DATA, (hashsz + 16) * ctx->commits.nr,
write_graph_chunk_data);
if (git_env_bool(GIT_TEST_COMMIT_GRAPH_NO_GDAT, 0))
ctx->write_generation_data = 0;
if (ctx->write_generation_data) {
chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA;
chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
chunks[num_chunks].write_fn = write_graph_chunk_generation_data;
num_chunks++;
}
if (ctx->num_generation_data_overflows) {
chunks[num_chunks].id = GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW;
chunks[num_chunks].size = sizeof(timestamp_t) * ctx->num_generation_data_overflows;
chunks[num_chunks].write_fn = write_graph_chunk_generation_data_overflow;
num_chunks++;
}
if (ctx->num_extra_edges) {
chunks[num_chunks].id = GRAPH_CHUNKID_EXTRAEDGES;
chunks[num_chunks].size = 4 * ctx->num_extra_edges;
chunks[num_chunks].write_fn = write_graph_chunk_extra_edges;
num_chunks++;
}
if (ctx->write_generation_data)
add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA,
sizeof(uint32_t) * ctx->commits.nr,
write_graph_chunk_generation_data);
if (ctx->num_generation_data_overflows)
add_chunk(cf, GRAPH_CHUNKID_GENERATION_DATA_OVERFLOW,
sizeof(timestamp_t) * ctx->num_generation_data_overflows,
write_graph_chunk_generation_data_overflow);
if (ctx->num_extra_edges)
add_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES,
4 * ctx->num_extra_edges,
write_graph_chunk_extra_edges);
if (ctx->changed_paths) {
chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMINDEXES;
chunks[num_chunks].size = sizeof(uint32_t) * ctx->commits.nr;
chunks[num_chunks].write_fn = write_graph_chunk_bloom_indexes;
num_chunks++;
chunks[num_chunks].id = GRAPH_CHUNKID_BLOOMDATA;
chunks[num_chunks].size = sizeof(uint32_t) * 3
+ ctx->total_bloom_filter_data_size;
chunks[num_chunks].write_fn = write_graph_chunk_bloom_data;
num_chunks++;
add_chunk(cf, GRAPH_CHUNKID_BLOOMINDEXES,
sizeof(uint32_t) * ctx->commits.nr,
write_graph_chunk_bloom_indexes);
add_chunk(cf, GRAPH_CHUNKID_BLOOMDATA,
sizeof(uint32_t) * 3
+ ctx->total_bloom_filter_data_size,
write_graph_chunk_bloom_data);
}
if (ctx->num_commit_graphs_after > 1) {
chunks[num_chunks].id = GRAPH_CHUNKID_BASE;
chunks[num_chunks].size = hashsz * (ctx->num_commit_graphs_after - 1);
chunks[num_chunks].write_fn = write_graph_chunk_base;
num_chunks++;
}
chunks[num_chunks].id = 0;
chunks[num_chunks].size = 0;
if (ctx->num_commit_graphs_after > 1)
add_chunk(cf, GRAPH_CHUNKID_BASE,
hashsz * (ctx->num_commit_graphs_after - 1),
write_graph_chunk_base);
hashwrite_be32(f, GRAPH_SIGNATURE);
hashwrite_u8(f, GRAPH_VERSION);
hashwrite_u8(f, oid_version());
hashwrite_u8(f, num_chunks);
hashwrite_u8(f, get_num_chunks(cf));
hashwrite_u8(f, ctx->num_commit_graphs_after - 1);
chunk_offset = 8 + (num_chunks + 1) * GRAPH_CHUNKLOOKUP_WIDTH;
for (i = 0; i <= num_chunks; i++) {
uint32_t chunk_write[3];
chunk_write[0] = htonl(chunks[i].id);
chunk_write[1] = htonl(chunk_offset >> 32);
chunk_write[2] = htonl(chunk_offset & 0xffffffff);
hashwrite(f, chunk_write, 12);
chunk_offset += chunks[i].size;
}
if (ctx->report_progress) {
strbuf_addf(&progress_title,
Q_("Writing out commit graph in %d pass",
@ -1972,17 +1894,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
num_chunks * ctx->commits.nr);
}
for (i = 0; i < num_chunks; i++) {
uint64_t start_offset = f->total + f->offset;
if (chunks[i].write_fn(f, ctx))
return -1;
if (f->total + f->offset != start_offset + chunks[i].size)
BUG("expected to write %"PRId64" bytes to chunk %"PRIx32", but wrote %"PRId64" instead",
chunks[i].size, chunks[i].id,
f->total + f->offset - start_offset);
}
write_chunkfile(cf, ctx);
stop_progress(&ctx->progress);
strbuf_release(&progress_title);
@ -1999,6 +1911,7 @@ static int write_commit_graph_file(struct write_commit_graph_context *ctx)
close_commit_graph(ctx->r->objects);
finalize_hashfile(f, file_hash.hash, CSUM_HASH_IN_STREAM | CSUM_FSYNC);
free_chunkfile(cf);
if (ctx->split) {
FILE *chainf = fdopen_lock_file(&lk, "w");

midx.c

@ -11,6 +11,7 @@
#include "trace2.h"
#include "run-command.h"
#include "repository.h"
#include "chunk-format.h"
#define MIDX_SIGNATURE 0x4d494458 /* "MIDX" */
#define MIDX_VERSION 1
@ -21,14 +22,12 @@
#define MIDX_HEADER_SIZE 12
#define MIDX_MIN_SIZE (MIDX_HEADER_SIZE + the_hash_algo->rawsz)
#define MIDX_MAX_CHUNKS 5
#define MIDX_CHUNK_ALIGNMENT 4
#define MIDX_CHUNKID_PACKNAMES 0x504e414d /* "PNAM" */
#define MIDX_CHUNKID_OIDFANOUT 0x4f494446 /* "OIDF" */
#define MIDX_CHUNKID_OIDLOOKUP 0x4f49444c /* "OIDL" */
#define MIDX_CHUNKID_OBJECTOFFSETS 0x4f4f4646 /* "OOFF" */
#define MIDX_CHUNKID_LARGEOFFSETS 0x4c4f4646 /* "LOFF" */
#define MIDX_CHUNKLOOKUP_WIDTH (sizeof(uint32_t) + sizeof(uint64_t))
#define MIDX_CHUNK_FANOUT_SIZE (sizeof(uint32_t) * 256)
#define MIDX_CHUNK_OFFSET_WIDTH (2 * sizeof(uint32_t))
#define MIDX_CHUNK_LARGE_OFFSET_WIDTH (sizeof(uint64_t))
@ -53,6 +52,19 @@ static char *get_midx_filename(const char *object_dir)
return xstrfmt("%s/pack/multi-pack-index", object_dir);
}
static int midx_read_oid_fanout(const unsigned char *chunk_start,
size_t chunk_size, void *data)
{
struct multi_pack_index *m = data;
m->chunk_oid_fanout = (uint32_t *)chunk_start;
if (chunk_size != 4 * 256) {
error(_("multi-pack-index OID fanout is of the wrong size"));
return 1;
}
return 0;
}
struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local)
{
struct multi_pack_index *m = NULL;
@ -64,6 +76,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
char *midx_name = get_midx_filename(object_dir);
uint32_t i;
const char *cur_pack_name;
struct chunkfile *cf = NULL;
fd = git_open(midx_name);
@ -113,58 +126,23 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
m->num_packs = get_be32(m->data + MIDX_BYTE_NUM_PACKS);
for (i = 0; i < m->num_chunks; i++) {
uint32_t chunk_id = get_be32(m->data + MIDX_HEADER_SIZE +
MIDX_CHUNKLOOKUP_WIDTH * i);
uint64_t chunk_offset = get_be64(m->data + MIDX_HEADER_SIZE + 4 +
MIDX_CHUNKLOOKUP_WIDTH * i);
cf = init_chunkfile(NULL);
if (chunk_offset >= m->data_len)
die(_("invalid chunk offset (too large)"));
if (read_table_of_contents(cf, m->data, midx_size,
MIDX_HEADER_SIZE, m->num_chunks))
goto cleanup_fail;
switch (chunk_id) {
case MIDX_CHUNKID_PACKNAMES:
m->chunk_pack_names = m->data + chunk_offset;
break;
case MIDX_CHUNKID_OIDFANOUT:
m->chunk_oid_fanout = (uint32_t *)(m->data + chunk_offset);
break;
case MIDX_CHUNKID_OIDLOOKUP:
m->chunk_oid_lookup = m->data + chunk_offset;
break;
case MIDX_CHUNKID_OBJECTOFFSETS:
m->chunk_object_offsets = m->data + chunk_offset;
break;
case MIDX_CHUNKID_LARGEOFFSETS:
m->chunk_large_offsets = m->data + chunk_offset;
break;
case 0:
die(_("terminating multi-pack-index chunk id appears earlier than expected"));
break;
default:
/*
* Do nothing on unrecognized chunks, allowing future
* extensions to add optional chunks.
*/
break;
}
}
if (!m->chunk_pack_names)
if (pair_chunk(cf, MIDX_CHUNKID_PACKNAMES, &m->chunk_pack_names) == CHUNK_NOT_FOUND)
die(_("multi-pack-index missing required pack-name chunk"));
if (!m->chunk_oid_fanout)
if (read_chunk(cf, MIDX_CHUNKID_OIDFANOUT, midx_read_oid_fanout, m) == CHUNK_NOT_FOUND)
die(_("multi-pack-index missing required OID fanout chunk"));
if (!m->chunk_oid_lookup)
if (pair_chunk(cf, MIDX_CHUNKID_OIDLOOKUP, &m->chunk_oid_lookup) == CHUNK_NOT_FOUND)
die(_("multi-pack-index missing required OID lookup chunk"));
if (!m->chunk_object_offsets)
if (pair_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS, &m->chunk_object_offsets) == CHUNK_NOT_FOUND)
die(_("multi-pack-index missing required object offsets chunk"));
pair_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS, &m->chunk_large_offsets);
m->num_objects = ntohl(m->chunk_oid_fanout[255]);
m->pack_names = xcalloc(m->num_packs, sizeof(*m->pack_names));
@ -190,6 +168,7 @@ struct multi_pack_index *load_multi_pack_index(const char *object_dir, int local
cleanup_fail:
free(m);
free(midx_name);
free(cf);
if (midx_map)
munmap(midx_map, midx_size);
if (0 <= fd)
@ -265,7 +244,7 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
const unsigned char *offset_data;
uint32_t offset32;
offset_data = m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH;
offset_data = m->chunk_object_offsets + (off_t)pos * MIDX_CHUNK_OFFSET_WIDTH;
offset32 = get_be32(offset_data + sizeof(uint32_t));
if (m->chunk_large_offsets && offset32 & MIDX_LARGE_OFFSET_NEEDED) {
@ -281,7 +260,8 @@ static off_t nth_midxed_offset(struct multi_pack_index *m, uint32_t pos)
static uint32_t nth_midxed_pack_int_id(struct multi_pack_index *m, uint32_t pos)
{
return get_be32(m->chunk_object_offsets + pos * MIDX_CHUNK_OFFSET_WIDTH);
return get_be32(m->chunk_object_offsets +
(off_t)pos * MIDX_CHUNK_OFFSET_WIDTH);
}
static int nth_midxed_pack_entry(struct repository *r,
@ -451,49 +431,56 @@ static int pack_info_compare(const void *_a, const void *_b)
return strcmp(a->pack_name, b->pack_name);
}
struct pack_list {
struct write_midx_context {
struct pack_info *info;
uint32_t nr;
uint32_t alloc;
struct multi_pack_index *m;
struct progress *progress;
unsigned pack_paths_checked;
struct pack_midx_entry *entries;
uint32_t entries_nr;
uint32_t *pack_perm;
unsigned large_offsets_needed:1;
uint32_t num_large_offsets;
};
static void add_pack_to_midx(const char *full_path, size_t full_path_len,
const char *file_name, void *data)
{
struct pack_list *packs = (struct pack_list *)data;
struct write_midx_context *ctx = data;
if (ends_with(file_name, ".idx")) {
display_progress(packs->progress, ++packs->pack_paths_checked);
if (packs->m && midx_contains_pack(packs->m, file_name))
display_progress(ctx->progress, ++ctx->pack_paths_checked);
if (ctx->m && midx_contains_pack(ctx->m, file_name))
return;
ALLOC_GROW(packs->info, packs->nr + 1, packs->alloc);
ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc);
packs->info[packs->nr].p = add_packed_git(full_path,
full_path_len,
0);
ctx->info[ctx->nr].p = add_packed_git(full_path,
full_path_len,
0);
if (!packs->info[packs->nr].p) {
if (!ctx->info[ctx->nr].p) {
warning(_("failed to add packfile '%s'"),
full_path);
return;
}
if (open_pack_index(packs->info[packs->nr].p)) {
if (open_pack_index(ctx->info[ctx->nr].p)) {
warning(_("failed to open pack-index '%s'"),
full_path);
close_pack(packs->info[packs->nr].p);
FREE_AND_NULL(packs->info[packs->nr].p);
close_pack(ctx->info[ctx->nr].p);
FREE_AND_NULL(ctx->info[ctx->nr].p);
return;
}
packs->info[packs->nr].pack_name = xstrdup(file_name);
packs->info[packs->nr].orig_pack_int_id = packs->nr;
packs->info[packs->nr].expired = 0;
packs->nr++;
ctx->info[ctx->nr].pack_name = xstrdup(file_name);
ctx->info[ctx->nr].orig_pack_int_id = ctx->nr;
ctx->info[ctx->nr].expired = 0;
ctx->nr++;
}
}
@ -643,27 +630,26 @@ static struct pack_midx_entry *get_sorted_entries(struct multi_pack_index *m,
return deduplicated_entries;
}
static size_t write_midx_pack_names(struct hashfile *f,
struct pack_info *info,
uint32_t num_packs)
static int write_midx_pack_names(struct hashfile *f, void *data)
{
struct write_midx_context *ctx = data;
uint32_t i;
unsigned char padding[MIDX_CHUNK_ALIGNMENT];
size_t written = 0;
for (i = 0; i < num_packs; i++) {
for (i = 0; i < ctx->nr; i++) {
size_t writelen;
if (info[i].expired)
if (ctx->info[i].expired)
continue;
if (i && strcmp(info[i].pack_name, info[i - 1].pack_name) <= 0)
if (i && strcmp(ctx->info[i].pack_name, ctx->info[i - 1].pack_name) <= 0)
BUG("incorrect pack-file order: %s before %s",
info[i - 1].pack_name,
info[i].pack_name);
ctx->info[i - 1].pack_name,
ctx->info[i].pack_name);
writelen = strlen(info[i].pack_name) + 1;
hashwrite(f, info[i].pack_name, writelen);
writelen = strlen(ctx->info[i].pack_name) + 1;
hashwrite(f, ctx->info[i].pack_name, writelen);
written += writelen;
}
@ -672,18 +658,17 @@ static size_t write_midx_pack_names(struct hashfile *f,
if (i < MIDX_CHUNK_ALIGNMENT) {
memset(padding, 0, sizeof(padding));
hashwrite(f, padding, i);
written += i;
}
return written;
return 0;
}
static size_t write_midx_oid_fanout(struct hashfile *f,
struct pack_midx_entry *objects,
uint32_t nr_objects)
static int write_midx_oid_fanout(struct hashfile *f,
void *data)
{
struct pack_midx_entry *list = objects;
struct pack_midx_entry *last = objects + nr_objects;
struct write_midx_context *ctx = data;
struct pack_midx_entry *list = ctx->entries;
struct pack_midx_entry *last = ctx->entries + ctx->entries_nr;
uint32_t count = 0;
uint32_t i;
@ -704,21 +689,21 @@ static size_t write_midx_oid_fanout(struct hashfile *f,
list = next;
}
return MIDX_CHUNK_FANOUT_SIZE;
return 0;
}
static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
struct pack_midx_entry *objects,
uint32_t nr_objects)
static int write_midx_oid_lookup(struct hashfile *f,
void *data)
{
struct pack_midx_entry *list = objects;
struct write_midx_context *ctx = data;
unsigned char hash_len = the_hash_algo->rawsz;
struct pack_midx_entry *list = ctx->entries;
uint32_t i;
size_t written = 0;
for (i = 0; i < nr_objects; i++) {
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *obj = list++;
if (i < nr_objects - 1) {
if (i < ctx->entries_nr - 1) {
struct pack_midx_entry *next = list;
if (oidcmp(&obj->oid, &next->oid) >= 0)
BUG("OIDs not in order: %s >= %s",
@ -727,50 +712,48 @@ static size_t write_midx_oid_lookup(struct hashfile *f, unsigned char hash_len,
}
hashwrite(f, obj->oid.hash, (int)hash_len);
written += hash_len;
}
return written;
return 0;
}
static size_t write_midx_object_offsets(struct hashfile *f, int large_offset_needed,
uint32_t *perm,
struct pack_midx_entry *objects, uint32_t nr_objects)
static int write_midx_object_offsets(struct hashfile *f,
void *data)
{
struct pack_midx_entry *list = objects;
struct write_midx_context *ctx = data;
struct pack_midx_entry *list = ctx->entries;
uint32_t i, nr_large_offset = 0;
size_t written = 0;
for (i = 0; i < nr_objects; i++) {
for (i = 0; i < ctx->entries_nr; i++) {
struct pack_midx_entry *obj = list++;
if (perm[obj->pack_int_id] == PACK_EXPIRED)
if (ctx->pack_perm[obj->pack_int_id] == PACK_EXPIRED)
BUG("object %s is in an expired pack with int-id %d",
oid_to_hex(&obj->oid),
obj->pack_int_id);
hashwrite_be32(f, perm[obj->pack_int_id]);
hashwrite_be32(f, ctx->pack_perm[obj->pack_int_id]);
if (large_offset_needed && obj->offset >> 31)
if (ctx->large_offsets_needed && obj->offset >> 31)
hashwrite_be32(f, MIDX_LARGE_OFFSET_NEEDED | nr_large_offset++);
else if (!large_offset_needed && obj->offset >> 32)
else if (!ctx->large_offsets_needed && obj->offset >> 32)
BUG("object %s requires a large offset (%"PRIx64") but the MIDX is not writing large offsets!",
oid_to_hex(&obj->oid),
obj->offset);
else
hashwrite_be32(f, (uint32_t)obj->offset);
written += MIDX_CHUNK_OFFSET_WIDTH;
}
return written;
return 0;
}
static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_offset,
struct pack_midx_entry *objects, uint32_t nr_objects)
static int write_midx_large_offsets(struct hashfile *f,
void *data)
{
struct pack_midx_entry *list = objects, *end = objects + nr_objects;
size_t written = 0;
struct write_midx_context *ctx = data;
struct pack_midx_entry *list = ctx->entries;
struct pack_midx_entry *end = ctx->entries + ctx->entries_nr;
uint32_t nr_large_offset = ctx->num_large_offsets;
while (nr_large_offset) {
struct pack_midx_entry *obj;
@ -785,34 +768,26 @@ static size_t write_midx_large_offsets(struct hashfile *f, uint32_t nr_large_off
if (!(offset >> 31))
continue;
written += hashwrite_be64(f, offset);
hashwrite_be64(f, offset);
nr_large_offset--;
}
return written;
return 0;
}
static int write_midx_internal(const char *object_dir, struct multi_pack_index *m,
struct string_list *packs_to_drop, unsigned flags)
{
unsigned char cur_chunk, num_chunks = 0;
char *midx_name;
uint32_t i;
struct hashfile *f = NULL;
struct lock_file lk;
struct pack_list packs;
uint32_t *pack_perm = NULL;
uint64_t written = 0;
uint32_t chunk_ids[MIDX_MAX_CHUNKS + 1];
uint64_t chunk_offsets[MIDX_MAX_CHUNKS + 1];
uint32_t nr_entries, num_large_offsets = 0;
struct pack_midx_entry *entries = NULL;
struct progress *progress = NULL;
int large_offsets_needed = 0;
struct write_midx_context ctx = { 0 };
int pack_name_concat_len = 0;
int dropped_packs = 0;
int result = 0;
struct chunkfile *cf;
midx_name = get_midx_filename(object_dir);
if (safe_create_leading_directories(midx_name))
@ -820,61 +795,62 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
midx_name);
if (m)
packs.m = m;
ctx.m = m;
else
packs.m = load_multi_pack_index(object_dir, 1);
ctx.m = load_multi_pack_index(object_dir, 1);
packs.nr = 0;
packs.alloc = packs.m ? packs.m->num_packs : 16;
packs.info = NULL;
ALLOC_ARRAY(packs.info, packs.alloc);
ctx.nr = 0;
ctx.alloc = ctx.m ? ctx.m->num_packs : 16;
ctx.info = NULL;
ALLOC_ARRAY(ctx.info, ctx.alloc);
if (packs.m) {
for (i = 0; i < packs.m->num_packs; i++) {
ALLOC_GROW(packs.info, packs.nr + 1, packs.alloc);
if (ctx.m) {
for (i = 0; i < ctx.m->num_packs; i++) {
ALLOC_GROW(ctx.info, ctx.nr + 1, ctx.alloc);
packs.info[packs.nr].orig_pack_int_id = i;
packs.info[packs.nr].pack_name = xstrdup(packs.m->pack_names[i]);
packs.info[packs.nr].p = NULL;
packs.info[packs.nr].expired = 0;
packs.nr++;
ctx.info[ctx.nr].orig_pack_int_id = i;
ctx.info[ctx.nr].pack_name = xstrdup(ctx.m->pack_names[i]);
ctx.info[ctx.nr].p = NULL;
ctx.info[ctx.nr].expired = 0;
ctx.nr++;
}
}
packs.pack_paths_checked = 0;
ctx.pack_paths_checked = 0;
if (flags & MIDX_PROGRESS)
packs.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
ctx.progress = start_delayed_progress(_("Adding packfiles to multi-pack-index"), 0);
else
packs.progress = NULL;
ctx.progress = NULL;
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &packs);
stop_progress(&packs.progress);
for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx);
stop_progress(&ctx.progress);
if (packs.m && packs.nr == packs.m->num_packs && !packs_to_drop)
if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop)
goto cleanup;
entries = get_sorted_entries(packs.m, packs.info, packs.nr, &nr_entries);
ctx.entries = get_sorted_entries(ctx.m, ctx.info, ctx.nr, &ctx.entries_nr);
for (i = 0; i < nr_entries; i++) {
if (entries[i].offset > 0x7fffffff)
num_large_offsets++;
if (entries[i].offset > 0xffffffff)
large_offsets_needed = 1;
ctx.large_offsets_needed = 0;
for (i = 0; i < ctx.entries_nr; i++) {
if (ctx.entries[i].offset > 0x7fffffff)
ctx.num_large_offsets++;
if (ctx.entries[i].offset > 0xffffffff)
ctx.large_offsets_needed = 1;
}
QSORT(packs.info, packs.nr, pack_info_compare);
QSORT(ctx.info, ctx.nr, pack_info_compare);
if (packs_to_drop && packs_to_drop->nr) {
int drop_index = 0;
int missing_drops = 0;
for (i = 0; i < packs.nr && drop_index < packs_to_drop->nr; i++) {
int cmp = strcmp(packs.info[i].pack_name,
for (i = 0; i < ctx.nr && drop_index < packs_to_drop->nr; i++) {
int cmp = strcmp(ctx.info[i].pack_name,
packs_to_drop->items[drop_index].string);
if (!cmp) {
drop_index++;
packs.info[i].expired = 1;
ctx.info[i].expired = 1;
} else if (cmp > 0) {
error(_("did not see pack-file %s to drop"),
packs_to_drop->items[drop_index].string);
@ -882,7 +858,7 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
missing_drops++;
i--;
} else {
packs.info[i].expired = 0;
ctx.info[i].expired = 0;
}
}
@ -898,19 +874,19 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
*
* pack_perm[old_id] = new_id
*/
ALLOC_ARRAY(pack_perm, packs.nr);
for (i = 0; i < packs.nr; i++) {
if (packs.info[i].expired) {
ALLOC_ARRAY(ctx.pack_perm, ctx.nr);
for (i = 0; i < ctx.nr; i++) {
if (ctx.info[i].expired) {
dropped_packs++;
pack_perm[packs.info[i].orig_pack_int_id] = PACK_EXPIRED;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = PACK_EXPIRED;
} else {
pack_perm[packs.info[i].orig_pack_int_id] = i - dropped_packs;
ctx.pack_perm[ctx.info[i].orig_pack_int_id] = i - dropped_packs;
}
}
for (i = 0; i < packs.nr; i++) {
if (!packs.info[i].expired)
pack_name_concat_len += strlen(packs.info[i].pack_name) + 1;
for (i = 0; i < ctx.nr; i++) {
if (!ctx.info[i].expired)
pack_name_concat_len += strlen(ctx.info[i].pack_name) + 1;
}
if (pack_name_concat_len % MIDX_CHUNK_ALIGNMENT)
@ -921,123 +897,52 @@ static int write_midx_internal(const char *object_dir, struct multi_pack_index *
f = hashfd(get_lock_file_fd(&lk), get_lock_file_path(&lk));
FREE_AND_NULL(midx_name);
if (packs.m)
close_midx(packs.m);
if (ctx.m)
close_midx(ctx.m);
cur_chunk = 0;
num_chunks = large_offsets_needed ? 5 : 4;
if (packs.nr - dropped_packs == 0) {
if (ctx.nr - dropped_packs == 0) {
error(_("no pack files to index."));
result = 1;
goto cleanup;
}
written = write_midx_header(f, num_chunks, packs.nr - dropped_packs);
cf = init_chunkfile(f);
chunk_ids[cur_chunk] = MIDX_CHUNKID_PACKNAMES;
chunk_offsets[cur_chunk] = written + (num_chunks + 1) * MIDX_CHUNKLOOKUP_WIDTH;
add_chunk(cf, MIDX_CHUNKID_PACKNAMES, pack_name_concat_len,
write_midx_pack_names);
add_chunk(cf, MIDX_CHUNKID_OIDFANOUT, MIDX_CHUNK_FANOUT_SIZE,
write_midx_oid_fanout);
add_chunk(cf, MIDX_CHUNKID_OIDLOOKUP,
(size_t)ctx.entries_nr * the_hash_algo->rawsz,
write_midx_oid_lookup);
add_chunk(cf, MIDX_CHUNKID_OBJECTOFFSETS,
(size_t)ctx.entries_nr * MIDX_CHUNK_OFFSET_WIDTH,
write_midx_object_offsets);
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDFANOUT;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + pack_name_concat_len;
if (ctx.large_offsets_needed)
add_chunk(cf, MIDX_CHUNKID_LARGEOFFSETS,
(size_t)ctx.num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH,
write_midx_large_offsets);
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OIDLOOKUP;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + MIDX_CHUNK_FANOUT_SIZE;
cur_chunk++;
chunk_ids[cur_chunk] = MIDX_CHUNKID_OBJECTOFFSETS;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * the_hash_algo->rawsz;
cur_chunk++;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] + nr_entries * MIDX_CHUNK_OFFSET_WIDTH;
if (large_offsets_needed) {
chunk_ids[cur_chunk] = MIDX_CHUNKID_LARGEOFFSETS;
cur_chunk++;
chunk_offsets[cur_chunk] = chunk_offsets[cur_chunk - 1] +
num_large_offsets * MIDX_CHUNK_LARGE_OFFSET_WIDTH;
}
chunk_ids[cur_chunk] = 0;
for (i = 0; i <= num_chunks; i++) {
if (i && chunk_offsets[i] < chunk_offsets[i - 1])
BUG("incorrect chunk offsets: %"PRIu64" before %"PRIu64,
chunk_offsets[i - 1],
chunk_offsets[i]);
if (chunk_offsets[i] % MIDX_CHUNK_ALIGNMENT)
BUG("chunk offset %"PRIu64" is not properly aligned",
chunk_offsets[i]);
hashwrite_be32(f, chunk_ids[i]);
hashwrite_be64(f, chunk_offsets[i]);
written += MIDX_CHUNKLOOKUP_WIDTH;
}
if (flags & MIDX_PROGRESS)
progress = start_delayed_progress(_("Writing chunks to multi-pack-index"),
num_chunks);
for (i = 0; i < num_chunks; i++) {
if (written != chunk_offsets[i])
BUG("incorrect chunk offset (%"PRIu64" != %"PRIu64") for chunk id %"PRIx32,
chunk_offsets[i],
written,
chunk_ids[i]);
switch (chunk_ids[i]) {
case MIDX_CHUNKID_PACKNAMES:
written += write_midx_pack_names(f, packs.info, packs.nr);
break;
case MIDX_CHUNKID_OIDFANOUT:
written += write_midx_oid_fanout(f, entries, nr_entries);
break;
case MIDX_CHUNKID_OIDLOOKUP:
written += write_midx_oid_lookup(f, the_hash_algo->rawsz, entries, nr_entries);
break;
case MIDX_CHUNKID_OBJECTOFFSETS:
written += write_midx_object_offsets(f, large_offsets_needed, pack_perm, entries, nr_entries);
break;
case MIDX_CHUNKID_LARGEOFFSETS:
written += write_midx_large_offsets(f, num_large_offsets, entries, nr_entries);
break;
default:
BUG("trying to write unknown chunk id %"PRIx32,
chunk_ids[i]);
}
display_progress(progress, i + 1);
}
stop_progress(&progress);
if (written != chunk_offsets[num_chunks])
BUG("incorrect final offset %"PRIu64" != %"PRIu64,
written,
chunk_offsets[num_chunks]);
write_midx_header(f, get_num_chunks(cf), ctx.nr - dropped_packs);
write_chunkfile(cf, &ctx);
finalize_hashfile(f, NULL, CSUM_FSYNC | CSUM_HASH_IN_STREAM);
free_chunkfile(cf);
commit_lock_file(&lk);
cleanup:
for (i = 0; i < packs.nr; i++) {
if (packs.info[i].p) {
close_pack(packs.info[i].p);
free(packs.info[i].p);
for (i = 0; i < ctx.nr; i++) {
if (ctx.info[i].p) {
close_pack(ctx.info[i].p);
free(ctx.info[i].p);
}
free(packs.info[i].pack_name);
free(ctx.info[i].pack_name);
}
free(packs.info);
free(entries);
free(pack_perm);
free(ctx.info);
free(ctx.entries);
free(ctx.pack_perm);
free(midx_name);
return result;
}

t/t5318-commit-graph.sh

@ -585,7 +585,7 @@ test_expect_success 'detect bad hash version' '
test_expect_success 'detect low chunk count' '
corrupt_graph_and_verify $GRAPH_BYTE_CHUNK_COUNT "\01" \
"missing the .* chunk"
"final chunk has non-zero id"
'
test_expect_success 'detect missing OID fanout chunk' '

t/t5319-multi-pack-index.sh

@ -314,12 +314,12 @@ test_expect_success 'verify bad OID version' '
test_expect_success 'verify truncated chunk count' '
corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\01" $objdir \
"missing required"
"final chunk has non-zero id"
'
test_expect_success 'verify extended chunk count' '
corrupt_midx_and_verify $MIDX_BYTE_CHUNK_COUNT "\07" $objdir \
"terminating multi-pack-index chunk id appears earlier than expected"
"terminating chunk id appears earlier than expected"
'
test_expect_success 'verify missing required chunk' '
@ -329,7 +329,7 @@ test_expect_success 'verify missing required chunk' '
test_expect_success 'verify invalid chunk offset' '
corrupt_midx_and_verify $MIDX_BYTE_CHUNK_OFFSET "\01" $objdir \
"invalid chunk offset (too large)"
"improper chunk offset(s)"
'
test_expect_success 'verify packnames out of order' '