mirror of https://github.com/git/git.git (synced 2024-09-22 14:51:10 +02:00)
Support RFC 2822 date parsing in fast-import.

Since some frontends may be working with source material where the dates are only readily available as RFC 2822 strings, it is more friendly if fast-import exposes Git's parse_date() function to handle the conversion. This way the frontend doesn't need to perform the parsing itself.

The new --date-format option to fast-import can be used by a frontend to select which format it will supply date strings in. The default is the standard `raw` Git format, which fast-import has always supported. Format rfc2822 can be used to activate the parse_date() function instead.

Because fast-import could also be useful for creating new, current commits, the format `now` is also supported to generate the current system timestamp. The implementation of `now` is a trivial call to datestamp(), but is actually a whole whopping 3 lines so that fast-import can verify the frontend really meant `now`.

As part of this change I have added validation of the `raw` date format. Prior to this change fast-import would accept anything in a `committer` command, even if it was seriously malformed. Now fast-import requires the '> ' near the end of the string and verifies the timestamp is formatted properly.

Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
parent ef94edb53c
commit 63e0c8b364
@@ -32,6 +32,12 @@ the frontend program in use.
 
 OPTIONS
 -------
+--date-format=<fmt>::
+	Specify the type of dates the frontend will supply to
+	gfi within `author`, `committer` and `tagger` commands.
+	See ``Date Formats'' below for details about which formats
+	are supported, and their syntax.
+
 --max-pack-size=<n>::
 	Maximum size of each output packfile, expressed in MiB.
 	The default is 4096 (4 GiB) as that is the maximum allowed
@@ -53,7 +59,6 @@ OPTIONS
 	Frontends can use this file to validate imports after they
 	have been completed.
 
-
 Performance
 -----------
 The design of gfi allows it to import large projects in a minimum
@@ -127,6 +132,78 @@ results, such as branch names or file names with leading or trailing
 spaces in their name, or early termination of gfi when it encounters
 unexpected input.
 
+Date Formats
+~~~~~~~~~~~~
+The following date formats are supported. A frontend should select
+the format it will use for this import by passing the format name
+in the `--date-format=<fmt>` command line option.
+
+`raw`::
+	This is the Git native format and is `<time> SP <tz>`.
+	It is also gfi's default format, if `--date-format` was
+	not specified.
++
+The time of the event is specified by `<time>` as the number of
+seconds since the UNIX epoch (midnight, Jan 1, 1970, UTC) and is
+written as an ASCII decimal integer.
++
+The timezone is specified by `<tz>` as a positive or negative offset
+from UTC. For example EST (which is typically 5 hours behind GMT)
+would be expressed in `<tz>` by ``-0500'' while GMT is ``+0000''.
++
+If the timezone is not available in the source material, use
+``+0000'', or the most common local timezone. For example many
+organizations have a CVS repository which has only ever been accessed
+by users who are located in the same location and timezone. In this
+case the user's timezone can be easily assumed.
++
+Unlike the `rfc2822` format, this format is very strict. Any
+variation in formatting will cause gfi to reject the value.
+
+`rfc2822`::
+	This is the standard email format as described by RFC 2822.
++
+An example value is ``Tue Feb 6 11:22:18 2007 -0500''. The Git
+parser is accurate, but a little on the lenient side. Its the
+same parser used by gitlink:git-am[1] when applying patches
+received from email.
++
+Some malformed strings may be accepted as valid dates. In some of
+these cases Git will still be able to obtain the correct date from
+the malformed string. There are also some types of malformed
+strings which Git will parse wrong, and yet consider valid.
+Seriously malformed strings will be rejected.
++
+If the source material is formatted in RFC 2822 style dates,
+the frontend should let gfi handle the parsing and conversion
+(rather than attempting to do it itself) as the Git parser has
+been well tested in the wild.
++
+Frontends should prefer the `raw` format if the source material
+is already in UNIX-epoch format, or is easily convertible to
+that format, as there is no ambiguity in parsing.
+
+`now`::
+	Always use the current time and timezone. The literal
+	`now` must always be supplied for `<when>`.
++
+This is a toy format. The current time and timezone of this system
+is always copied into the identity string at the time it is being
+created by gfi. There is no way to specify a different time or
+timezone.
++
+This particular format is supplied as its short to implement and
+may be useful to a process that wants to create a new commit
+right now, without needing to use a working directory or
+gitlink:git-update-index[1].
++
+If separate `author` and `committer` commands are used in a `commit`
+the timestamps may not match, as the system clock will be polled
+twice (once for each command). The only way to ensure that both
+author and committer identity information has the same timestamp
+is to omit `author` (thus copying from `committer`) or to use a
+date format other than `now`.
+
 Commands
 ~~~~~~~~
 gfi accepts several commands to update the current repository
@@ -168,8 +245,8 @@ change to the project.
 ....
 	'commit' SP <ref> LF
 	mark?
-	('author' SP <name> SP LT <email> GT SP <time> SP <tz> LF)?
-	'committer' SP <name> SP LT <email> GT SP <time> SP <tz> LF
+	('author' SP <name> SP LT <email> GT SP <when> LF)?
+	'committer' SP <name> SP LT <email> GT SP <when> LF
 	data
 	('from' SP <committish> LF)?
 	('merge' SP <committish> LF)?
@@ -222,12 +299,10 @@ the email address from the other fields in the line. Note that
 `<name>` is free-form and may contain any sequence of bytes, except
 `LT` and `LF`. It is typically UTF-8 encoded.
 
-The time of the change is specified by `<time>` as the number of
-seconds since the UNIX epoc (midnight, Jan 1, 1970, UTC) and is
-written as an ASCII decimal integer. The committer's
-timezone is specified by `<tz>` as a positive or negative offset
-from UTC. For example EST (which is typically 5 hours behind GMT)
-would be expressed in `<tz>` by ``-0500'' while GMT is ``+0000''.
+The time of the change is specified by `<when>` using the date format
+that was selected by the `--date-format=<fmt>` command line option.
+See ``Date Formats'' above for the set of supported formats, and
+their syntax.
 
 `from`
 ^^^^^^
@@ -394,7 +469,7 @@ lightweight (non-annotated) tags see the `reset` command below.
 ....
 	'tag' SP <name> LF
 	'from' SP <committish> LF
-	'tagger' SP <name> SP LT <email> GT SP <time> SP <tz> LF
+	'tagger' SP <name> SP LT <email> GT SP <when> LF
 	data
 	LF
 ....
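The `raw` format documented above is strict, so a frontend has to emit exactly `<time> SP <tz>`. The following is a minimal sketch of how a frontend written in C might build that string. The helper name `format_raw_when` and its parameters are assumptions made for illustration; nothing like it is part of this commit or of Git.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical frontend helper (not part of Git): write "<time> SP <tz>"
 * into out. epoch is seconds since the UNIX epoch; tz_minutes is the
 * offset from UTC in minutes (for example -300 for five hours behind UTC).
 * Returns 0 on success, -1 if the result does not fit in outlen bytes.
 */
static int format_raw_when(long long epoch, int tz_minutes,
			   char *out, size_t outlen)
{
	int abs_min = abs(tz_minutes);
	int n;

	assert(out != NULL && outlen > 0);
	/* <tz> is a sign, two digits of hours, then two digits of minutes. */
	n = snprintf(out, outlen, "%lld %c%02d%02d", epoch,
		     tz_minutes < 0 ? '-' : '+',
		     abs_min / 60, abs_min % 60);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```

Called with the author time from the test case in this commit, `format_raw_when(1170778938LL, -300, buf, sizeof(buf))` fills `buf` with `1170778938 -0500`, which can be appended after the `> ` of an `author` or `committer` line.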
107
fast-import.c
@@ -17,8 +17,8 @@ Format of STDIN stream:
 
   new_commit ::= 'commit' sp ref_str lf
     mark?
-    ('author' sp name '<' email '>' ts tz lf)?
-    'committer' sp name '<' email '>' ts tz lf
+    ('author' sp name '<' email '>' when lf)?
+    'committer' sp name '<' email '>' when lf
     commit_msg
     ('from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)?
     ('merge' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf)*
@@ -34,7 +34,7 @@ Format of STDIN stream:
 
   new_tag ::= 'tag' sp tag_str lf
     'from' sp (ref_str | hexsha1 | sha1exp_str | idnum) lf
-    'tagger' sp name '<' email '>' ts tz lf
+    'tagger' sp name '<' email '>' when lf
     tag_msg;
   tag_msg ::= data;
 
@@ -88,6 +88,10 @@ Format of STDIN stream:
   bigint ::= # unsigned integer value, ascii base10 notation;
   binary_data ::= # file content, not interpreted;
 
+  when ::= raw_when | rfc2822_when;
+  raw_when ::= ts sp tz;
+  rfc2822_when ::= # Valid RFC 2822 date and time;
+
   sp ::= # ASCII space character;
   lf ::= # ASCII newline (LF) character;
 
@@ -234,6 +238,12 @@ struct hash_list
 	unsigned char sha1[20];
 };
 
+typedef enum {
+	WHENSPEC_RAW = 1,
+	WHENSPEC_RFC2822,
+	WHENSPEC_NOW,
+} whenspec_type;
+
 /* Configured limits on output */
 static unsigned long max_depth = 10;
 static unsigned long max_packsize = (1LL << 32) - 1;
@@ -294,6 +304,7 @@ static struct tag *first_tag;
 static struct tag *last_tag;
 
 /* Input stream parsing */
+static whenspec_type whenspec = WHENSPEC_RAW;
 static struct strbuf command_buf;
 static uintmax_t next_mark;
 static struct dbuf new_data;
@@ -1396,6 +1407,64 @@ static void *cmd_data (size_t *size)
 	return buffer;
 }
 
+static int validate_raw_date(const char *src, char *result, int maxlen)
+{
+	const char *orig_src = src;
+	char *endp, sign;
+
+	strtoul(src, &endp, 10);
+	if (endp == src || *endp != ' ')
+		return -1;
+
+	src = endp + 1;
+	if (*src != '-' && *src != '+')
+		return -1;
+	sign = *src;
+
+	strtoul(src + 1, &endp, 10);
+	if (endp == src || *endp || (endp - orig_src) >= maxlen)
+		return -1;
+
+	strcpy(result, orig_src);
+	return 0;
+}
+
+static char *parse_ident(const char *buf)
+{
+	const char *gt;
+	size_t name_len;
+	char *ident;
+
+	gt = strrchr(buf, '>');
+	if (!gt)
+		die("Missing > in ident string: %s", buf);
+	gt++;
+	if (*gt != ' ')
+		die("Missing space after > in ident string: %s", buf);
+	gt++;
+	name_len = gt - buf;
+	ident = xmalloc(name_len + 24);
+	strncpy(ident, buf, name_len);
+
+	switch (whenspec) {
+	case WHENSPEC_RAW:
+		if (validate_raw_date(gt, ident + name_len, 24) < 0)
+			die("Invalid raw date \"%s\" in ident: %s", gt, buf);
+		break;
+	case WHENSPEC_RFC2822:
+		if (parse_date(gt, ident + name_len, 24) < 0)
+			die("Invalid rfc2822 date \"%s\" in ident: %s", gt, buf);
+		break;
+	case WHENSPEC_NOW:
+		if (strcmp("now", gt))
+			die("Date in ident must be 'now': %s", buf);
+		datestamp(ident + name_len, 24);
+		break;
+	}
+
+	return ident;
+}
+
 static void cmd_new_blob(void)
 {
 	size_t l;
@@ -1655,11 +1724,11 @@ static void cmd_new_commit(void)
 	read_next_command();
 	cmd_mark();
 	if (!strncmp("author ", command_buf.buf, 7)) {
-		author = strdup(command_buf.buf);
+		author = parse_ident(command_buf.buf + 7);
 		read_next_command();
 	}
 	if (!strncmp("committer ", command_buf.buf, 10)) {
-		committer = strdup(command_buf.buf);
+		committer = parse_ident(command_buf.buf + 10);
 		read_next_command();
 	}
 	if (!committer)
@@ -1692,7 +1761,7 @@ static void cmd_new_commit(void)
 	store_tree(&b->branch_tree);
 	hashcpy(b->branch_tree.versions[0].sha1,
 		b->branch_tree.versions[1].sha1);
-	size_dbuf(&new_data, 97 + msglen
+	size_dbuf(&new_data, 114 + msglen
 		+ merge_count * 49
 		+ (author
 			? strlen(author) + strlen(committer)
@@ -1708,11 +1777,9 @@ static void cmd_new_commit(void)
 		free(merge_list);
 		merge_list = next;
 	}
-	if (author)
-		sp += sprintf(sp, "%s\n", author);
-	else
-		sp += sprintf(sp, "author %s\n", committer + 10);
-	sp += sprintf(sp, "%s\n\n", committer);
+	sp += sprintf(sp, "author %s\n", author ? author : committer);
+	sp += sprintf(sp, "committer %s\n", committer);
+	*sp++ = '\n';
 	memcpy(sp, msg, msglen);
 	sp += msglen;
 	free(author);
@@ -1780,7 +1847,7 @@ static void cmd_new_tag(void)
 	/* tagger ... */
 	if (strncmp("tagger ", command_buf.buf, 7))
 		die("Expected tagger command, got %s", command_buf.buf);
-	tagger = strdup(command_buf.buf);
+	tagger = parse_ident(command_buf.buf + 7);
 
 	/* tag payload/message */
 	read_next_command();
@@ -1792,7 +1859,8 @@ static void cmd_new_tag(void)
 	sp += sprintf(sp, "object %s\n", sha1_to_hex(sha1));
 	sp += sprintf(sp, "type %s\n", type_names[OBJ_COMMIT]);
 	sp += sprintf(sp, "tag %s\n", t->name);
-	sp += sprintf(sp, "%s\n\n", tagger);
+	sp += sprintf(sp, "tagger %s\n", tagger);
+	*sp++ = '\n';
 	memcpy(sp, msg, msglen);
 	sp += msglen;
 	free(tagger);
@@ -1835,7 +1903,7 @@ static void cmd_checkpoint(void)
 }
 
 static const char fast_import_usage[] =
-"git-fast-import [--depth=n] [--active-branches=n] [--export-marks=marks.file] [--branch-log=log]";
+"git-fast-import [--date-format=f] [--max-pack-size=n] [--depth=n] [--active-branches=n] [--export-marks=marks.file]";
 
 int main(int argc, const char **argv)
 {
@@ -1849,6 +1917,17 @@ int main(int argc, const char **argv)
 
 		if (*a != '-' || !strcmp(a, "--"))
 			break;
+		else if (!strncmp(a, "--date-format=", 14)) {
+			const char *fmt = a + 14;
+			if (!strcmp(fmt, "raw"))
+				whenspec = WHENSPEC_RAW;
+			else if (!strcmp(fmt, "rfc2822"))
+				whenspec = WHENSPEC_RFC2822;
+			else if (!strcmp(fmt, "now"))
+				whenspec = WHENSPEC_NOW;
+			else
+				die("unknown --date-format argument %s", fmt);
+		}
 		else if (!strncmp(a, "--max-pack-size=", 16))
 			max_packsize = strtoumax(a + 16, NULL, 0) * 1024 * 1024;
 		else if (!strncmp(a, "--depth=", 8))
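For readers who want to experiment with the `raw` syntax outside of fast-import, here is a standalone sketch of a `<time> SP <tz>` check. It is not the code from this commit: the name `check_raw_when` is hypothetical, and it scans digits by hand instead of calling `strtoul()`, so it is deliberately stricter (no leading whitespace, no embedded sign in `<time>`). It also insists on at least one digit after the sign; in the `validate_raw_date()` hunk as shown, the second test compares `endp` against `src` rather than `src + 1`, so a bare sign with no digits appears to pass that check.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Standalone sketch, not the code from this commit: return 0 if s looks
 * like "<time> SP <tz>", -1 otherwise.
 */
static int check_raw_when(const char *s)
{
	size_t digits = 0;

	assert(s != NULL);
	/* <time>: one or more decimal digits. */
	while (isdigit((unsigned char)*s)) {
		s++;
		digits++;
	}
	if (!digits || *s != ' ')
		return -1;
	s++;

	/* <tz>: a sign followed by one or more decimal digits. */
	if (*s != '+' && *s != '-')
		return -1;
	s++;
	digits = 0;
	while (isdigit((unsigned char)*s)) {
		s++;
		digits++;
	}
	/* The sign must be followed by digits, and nothing may trail them. */
	if (!digits || *s)
		return -1;
	return 0;
}
```

With this sketch, `check_raw_when("1170778938 -0500")` returns 0, while an RFC 2822 string such as the one used in the series E test returns -1, which mirrors the behaviour the first new test expects from `--date-format=raw`.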
@@ -240,4 +240,40 @@ test_expect_success \
 	'git-cat-file blob branch:newdir/exec.sh >actual &&
 	 diff -u expect actual'
 
+###
+### series E
+###
+
+cat >input <<INPUT_END
+commit refs/heads/branch
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> Tue Feb 6 11:22:18 2007 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> Tue Feb 6 12:35:02 2007 -0500
+data <<COMMIT
+RFC 2822 type date
+COMMIT
+
+from refs/heads/branch^0
+
+INPUT_END
+test_expect_failure \
+    'E: rfc2822 date, --date-format=raw' \
+    'git-fast-import --date-format=raw <input'
+test_expect_success \
+    'E: rfc2822 date, --date-format=rfc2822' \
+    'git-fast-import --date-format=rfc2822 <input'
+test_expect_success \
+    'E: verify pack' \
+    'for p in .git/objects/pack/*.pack;do git-verify-pack $p||exit;done'
+
+cat >expect <<EOF
+author $GIT_AUTHOR_NAME <$GIT_AUTHOR_EMAIL> 1170778938 -0500
+committer $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> 1170783302 -0500
+
+RFC 2822 type date
+EOF
+test_expect_success \
+    'E: verify commit' \
+    'git-cat-file commit branch | sed 1,2d >actual &&
+	 diff -u expect actual'
+
 test_done
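The expected values in the series E test can be checked by hand: `Tue Feb 6 11:22:18 2007 -0500` is 16:22:18 UTC on day 13550 after the epoch, and 13550 * 86400 + 58938 = 1170778938. The sketch below reproduces that arithmetic in C using a commonly used days-from-civil-date formula. Both function names are hypothetical and neither is part of Git; Git's own `parse_date()` does considerably more, including parsing the text.

```c
#include <assert.h>

/* Days from 1970-01-01 to y-m-d in the proleptic Gregorian calendar. */
static long long days_from_civil(int y, int m, int d)
{
	long long era, yoe, doy, doe;

	assert(m >= 1 && m <= 12 && d >= 1 && d <= 31);
	y -= m <= 2;	/* count Jan and Feb as the end of the previous year */
	era = (y >= 0 ? y : y - 399) / 400;
	yoe = y - era * 400;	/* year of era, 0..399 */
	doy = (153 * (m > 2 ? m - 3 : m + 9) + 2) / 5 + d - 1;	/* day of year, Mar 1 = 0 */
	doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;	/* day of era */
	return era * 146097 + doe - 719468;
}

/*
 * Hypothetical helper: convert local wall-clock time plus an offset
 * written the way <tz> is (for example -500 for "-0500") into seconds
 * since the UNIX epoch.
 */
static long long epoch_from_local(int y, int m, int d,
				  int hh, int mm, int ss, int tz)
{
	int sign = tz < 0 ? -1 : 1;
	int abs_tz = tz < 0 ? -tz : tz;
	long long offset = sign * ((abs_tz / 100) * 3600LL + (abs_tz % 100) * 60LL);
	long long local = days_from_civil(y, m, d) * 86400LL
			+ hh * 3600LL + mm * 60LL + ss;

	/* local = UTC + offset, so UTC = local - offset. */
	return local - offset;
}
```

`epoch_from_local(2007, 2, 6, 11, 22, 18, -500)` yields 1170778938 and `epoch_from_local(2007, 2, 6, 12, 35, 2, -500)` yields 1170783302, matching the `author` and `committer` lines in the test's `expect` file.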