mirror of https://github.com/git/git.git synced 2024-11-18 19:13:58 +01:00

Documentation: describe pack idx v2

Lifted from the log message of c553ca25bd60dc9fd50b8bc7bd329601b81cee66
(pack-objects: learn about pack index version 2).

Acked-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
linux@horizon.com 2007-12-14 06:28:14 -05:00 committed by Junio C Hamano
parent 29ab27f4b5
commit 71362bd552

@@ -1,9 +1,9 @@
GIT pack format
===============
= pack-*.pack file has the following format:
= pack-*.pack files have the following format:
- The header appears at the beginning and consists of the following:
- A header appears at the beginning and consists of the following:
4-byte signature:
The signature is: {'P', 'A', 'C', 'K'}
@@ -34,18 +34,14 @@ GIT pack format
- The trailer records 20-byte SHA1 checksum of all of the above.
= pack-*.idx file has the following format:
= Original (version 1) pack-*.idx files have the following format:
- The header consists of 256 4-byte network byte order
integers. N-th entry of this table records the number of
objects in the corresponding pack, the first byte of whose
object name are smaller than N. This is called the
object name is less than or equal to N. This is called the
'first-level fan-out' table.
Observation: we would need to extend this to an array of
8-byte integers to go beyond 4G objects per pack, but it is
not strictly necessary.
- The header is followed by sorted 24-byte entries, one entry
per object in the pack. Each entry is:
@@ -55,10 +51,6 @@ GIT pack format
20-byte object name.
Observation: we would definitely need to extend this to
8-byte integer plus 20-byte object name to handle a packfile
that is larger than 4GB.
- The file is concluded with a trailer:
A copy of the 20-byte SHA1 checksum at the end of
@@ -68,31 +60,30 @@ GIT pack format
Pack Idx file:
idx
+--------------------------------+
| fanout[0] = 2 |-.
+--------------------------------+ |
-- +--------------------------------+
fanout | fanout[0] = 2 (for example) |-.
table +--------------------------------+ |
| fanout[1] | |
+--------------------------------+ |
| fanout[2] | |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| fanout[255] | |
+--------------------------------+ |
main | offset | |
index | object name 00XXXXXXXXXXXXXXXX | |
table +--------------------------------+ |
| offset | |
| object name 00XXXXXXXXXXXXXXXX | |
+--------------------------------+ |
.-| offset |<+
| | object name 01XXXXXXXXXXXXXXXX |
| +--------------------------------+
| | offset |
| | object name 01XXXXXXXXXXXXXXXX |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| | offset |
| | object name FFXXXXXXXXXXXXXXXX |
| +--------------------------------+
| fanout[255] = total objects |---.
-- +--------------------------------+ | |
main | offset | | |
index | object name 00XXXXXXXXXXXXXXXX | | |
table +--------------------------------+ | |
| offset | | |
| object name 00XXXXXXXXXXXXXXXX | | |
+--------------------------------+<+ |
.-| offset | |
| | object name 01XXXXXXXXXXXXXXXX | |
| +--------------------------------+ |
| | offset | |
| | object name 01XXXXXXXXXXXXXXXX | |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| | offset | |
| | object name FFXXXXXXXXXXXXXXXX | |
--| +--------------------------------+<--+
trailer | | packfile checksum |
| +--------------------------------+
| | idxfile checksum |
@@ -116,3 +107,40 @@ Pack file entry: <+
20-byte base object name SHA1 (the size above is the
size of the delta data that follows).
delta data, deflated.
= Version 2 pack-*.idx files support packs larger than 4 GiB, and
have some other reorganizations. They have the format:
- A 4-byte magic number '\377tOc' which is an unreasonable
fanout[0] value.
- A 4-byte version number (= 2)
- A 256-entry fan-out table just like v1.
- A table of sorted 20-byte SHA1 object names. These are
packed together without offset values to reduce the cache
footprint of the binary search for a specific object name.
- A table of 4-byte CRC32 values of the packed object data.
This is new in v2 so compressed data can be copied directly
from pack to pack during repacking without undetected
data corruption.
- A table of 4-byte offset values (in network byte order).
These are usually 31-bit pack file offsets, but large
offsets are encoded as an index into the next table with
the msbit set.
- A table of 8-byte offset entries (empty for pack files less
than 2 GiB). Pack files are organized with heavily used
objects toward the front, so most object references should
not need to refer to this table.
- The same trailer as a v1 pack-*.idx file:
A copy of the 20-byte SHA1 checksum at the end of
corresponding packfile.
20-byte SHA1-checksum of all of the above.
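
The version 2 layout described above can be exercised with a short
Python sketch. It is an illustration only, not git's implementation:
the names build_idx_v2 and lookup_offset are hypothetical, the writer
is a toy that exists just to produce test input, and the sketch
assumes every table is laid out back to back in the order listed.

```python
import hashlib
import struct

IDX_V2_MAGIC = b"\377tOc"   # octal escape \377 is the byte 0xff
MSB = 0x80000000            # flag bit marking an index into the 8-byte table


def build_idx_v2(entries, pack_checksum):
    """Toy writer (hypothetical): {20-byte name: (crc32, pack offset)} -> bytes."""
    names = sorted(entries)
    counts = [0] * 256
    for name in names:
        counts[name[0]] += 1
    fanout, total = [], 0
    for count in counts:
        total += count          # fanout[N] = objects whose first byte is <= N
        fanout.append(total)
    small, large = [], []
    for name in names:
        offset = entries[name][1]
        if offset < MSB:
            small.append(offset)             # fits in 31 bits
        else:
            small.append(MSB | len(large))   # msbit set: index into next table
            large.append(offset)
    body = IDX_V2_MAGIC + struct.pack(">I", 2)
    body += struct.pack(">256I", *fanout)
    body += b"".join(names)
    body += struct.pack(">%dI" % len(names), *(entries[n][0] for n in names))
    body += struct.pack(">%dI" % len(small), *small)
    body += struct.pack(">%dQ" % len(large), *large)
    body += pack_checksum                    # copy of the packfile checksum
    return body + hashlib.sha1(body).digest()


def lookup_offset(idx, name):
    """Return the pack offset recorded for a 20-byte name, or None if absent."""
    if idx[:4] != IDX_V2_MAGIC or struct.unpack(">I", idx[4:8])[0] != 2:
        raise ValueError("not a version 2 idx")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    total = fanout[255]
    names_at = 8 + 1024
    crc_at = names_at + 20 * total
    off32_at = crc_at + 4 * total
    off64_at = off32_at + 4 * total
    # The fan-out table narrows the binary search to one first-byte bucket.
    lo = fanout[name[0] - 1] if name[0] else 0
    hi = fanout[name[0]]
    while lo < hi:
        mid = (lo + hi) // 2
        cand = idx[names_at + 20 * mid:names_at + 20 * mid + 20]
        if cand < name:
            lo = mid + 1
        elif cand > name:
            hi = mid
        else:
            (off,) = struct.unpack(">I", idx[off32_at + 4 * mid:off32_at + 4 * mid + 4])
            if off & MSB:
                k = off & 0x7FFFFFFF
                (off,) = struct.unpack(">Q", idx[off64_at + 8 * k:off64_at + 8 * k + 8])
            return off
    return None
```

For example, an index built from three made-up object names, one of
which sits at a 5 GiB offset, returns that offset through the 8-byte
table, while the other two are served from the 4-byte table alone.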