git-fast-import(1)
==================

NAME
----
git-fast-import - Backend for fast Git data importers.

SYNOPSIS
--------
frontend | 'git-fast-import' [options]

DESCRIPTION
-----------
This program is usually not what the end user wants to run directly.
Most end users want to use one of the existing frontend programs,
which parses a specific type of foreign source and feeds the contents
stored there to git-fast-import (gfi).

gfi reads a mixed command/data stream from standard input and
writes one or more packfiles directly into the current repository.
When EOF is received on standard input, gfi writes out
updated branch and tag refs, fully updating the current repository
with the newly imported data.

The gfi backend itself can import into an empty repository (one that
has already been initialized by gitlink:git-init[1]) or incrementally
update an existing populated repository.  Whether or not incremental
imports are supported from a particular foreign source depends on
the frontend program in use.

OPTIONS
-------
--max-pack-size=<n>::
	Maximum size of each output packfile, expressed in MiB.
	The default is 4096 (4 GiB) as that is the maximum allowed
	packfile size (due to file format limitations).  Some
	importers may wish to lower this, such as to ensure the
	resulting packfiles fit on CDs.

--depth=<n>::
	Maximum delta depth, for blob and tree deltification.
	Default is 10.

--active-branches=<n>::
	Maximum number of branches to maintain active at once.
	See ``Memory Utilization'' below for details.  Default is 5.

--export-marks=<file>::
	Dumps the internal marks table to <file> when complete.
	Marks are written one per line as `:markid SHA-1`.
	Frontends can use this file to validate imports after they
	have been completed.

--branch-log=<file>::
	Records every tag and commit made to a log file.  (This file
	can be quite verbose on large imports.)  This option is
	primarily intended to facilitate debugging gfi and has
	limited usefulness in other contexts.
	It may be removed in future versions.

Performance
-----------
The design of gfi allows it to import large projects with minimal
memory usage and processing time.  Assuming the frontend is able to
keep up with gfi and feed it a constant stream of data, imports of
projects holding 10+ years of history and containing 100,000+
individual commits generally complete in just 1-2 hours on quite
modest (~$2,000 USD) hardware.

Most bottlenecks appear to be in foreign source data access (the
source just cannot extract revisions fast enough) or disk IO (gfi
writes as fast as the disk will take the data).  Imports will run
faster if the source data is stored on a different drive than the
destination Git repository (due to less IO contention).

Development Cost
----------------
A typical frontend for gfi tends to weigh in at approximately 200
lines of Perl/Python/Ruby code.  Most developers have been able to
create working importers in just a couple of hours, even though it
is their first exposure to gfi, and sometimes even to Git.  This is
an ideal situation, given that most conversion tools are throw-away
(use once, and never look back).

Parallel Operation
------------------
Like `git-push` or `git-fetch`, imports handled by gfi are safe to
run alongside parallel `git repack -a -d` or `git gc` invocations,
or any other Git operation (including `git prune`, as loose objects
are never used by gfi).

However, gfi does not lock the branch or tag refs it is actively
importing.  After EOF, during its ref update phase, gfi blindly
overwrites each imported branch or tag ref.  Consequently it is not
safe to modify refs that are currently being used by a running gfi
instance, as work could be lost when gfi overwrites the refs.

Technical Discussion
--------------------
gfi tracks a set of branches in memory.  Any branch can be created
or modified at any point during the import process by sending a
`commit` command on the input stream.
This design allows a frontend program to process an unlimited number
of branches simultaneously, generating commits in the order they are
available from the source data.  It also simplifies the frontend
programs considerably.

gfi does not use or alter the current working directory, or any
file within it.  (It does however update the current Git
repository, as referenced by `GIT_DIR`.)  Therefore an import
frontend may use the working directory for its own purposes, such as
extracting file revisions from the foreign source.  This ignorance
of the working directory also allows gfi to run very quickly, as
it does not need to perform any costly file update operations when
switching between branches.

Input Format
------------
With the exception of raw file data (which Git does not interpret)
the gfi input format is text (ASCII) based.  This text based
format simplifies development and debugging of frontend programs,
especially when a higher level language such as Perl, Python or
Ruby is being used.

gfi is very strict about its input.  Where we say SP below we mean
*exactly* one space.  Likewise LF means one (and only one) linefeed.
Supplying additional whitespace characters will cause unexpected
results, such as branch names or file names with leading or trailing
spaces, or early termination of gfi when it encounters unexpected
input.

Commands
~~~~~~~~
gfi accepts several commands to update the current repository
and control the current import process.  More detailed discussion
(with examples) of each command follows later.

`commit`::
	Creates a new branch or updates an existing branch by
	creating a new commit and updating the branch to point at
	the newly created commit.

`tag`::
	Creates an annotated tag object from an existing commit or
	branch.  Lightweight tags are not supported by this command,
	as they are not recommended for recording meaningful points
	in time.

`reset`::
	Resets an existing branch (or a new branch) to a specific
	revision.
	This command must be used to change a branch to a specific
	revision without making a commit on it.

`blob`::
	Converts raw file data into a blob, for future use in a
	`commit` command.  This command is optional and is not
	needed to perform an import.

`checkpoint`::
	Forces gfi to close the current packfile, generate its
	unique SHA-1 checksum and index, and start a new packfile.
	This command is optional and is not needed to perform
	an import.

`commit`
~~~~~~~~
Create or update a branch with a new commit, recording one logical
change to the project.

....
	'commit' SP LF
	mark?
	('author' SP SP LT GT SP