I am working on a project that required parsing through regular
mboxes that didn't necessarily have patches embedded in them. I
started by creating my own modified copy of git-am and working
from there. Very quickly, I noticed git-mailinfo wasn't able to
handle a big chunk of my email.
After hacking up numerous solutions and running into more
limitations, I decided it was just easier to rewrite a big chunk
of it. The following patch has a bunch of fixes and features
that I needed in order for me do what I wanted.
Note: I'm didn't follow any email rfc papers but I don't think
any of the changes I did required much knowledge (besides the
boundary stuff).
List of major changes/fixes:
- can't create empty patch files fix
- empty patch files don't fail, this failure will come inside git-am
- multipart boundaries are now handled
- only output inbody headers if a patch exists otherwise assume those
headers are part of the reply and instead output the original headers
- decode and filter base64 patches correctly
- various other accidental fixes
I believe I didn't break any existing functionality or
compatibility (other than what I describe above, which is really
only the empty patch file).
I tested this through various mailing list archives and
everything seemed to parse correctly (a couple thousand emails).
[jc: squashed in another patch from Don's five patch series to
fix the test case, as this patch exposes the bug in the test.]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
We shouldn't attempt to assign constant strings into char*, as the
string is not writable at runtime. Likewise we should always be
treating unsigned values as unsigned values, not as signed values.
Most of these are very straightforward. The only exception is the
(unnecessary) xstrdup/free in builtin-branch.c for the detached
head case. Since this is a user-level interactive type program
and that particular code path is executed no more than once, I feel
that the extra xstrdup call is well worth the easy elimination of
this warning.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
* maint:
git-apply: do not fix whitespaces on context lines.
diff --cc: integer overflow given a 2GB-or-larger file
mailinfo: do not get confused with logical lines that are too long.
It basically considers all the continuation lines to be lines of their
own, and if the total line is bigger than what we can fit in it, we just
truncate the result rather than stop in the middle and then get confused
when we try to parse the "next" line (which is just the remainder of the
first line).
[jc: added test, and tightened boundary a bit per list discussion.]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This mechanically converts strncmp() to use prefixcmp(), but only when
the parameters match specific patterns, so that they can be verified
easily. Leftover from this will be fixed in a separate step, including
idiotic conversions like
if (!strncmp("foo", arg, 3))
=>
if (!(-prefixcmp(arg, "foo")))
This was done by using this script in px.perl
#!/usr/bin/perl -i.bak -p
if (/strncmp\(([^,]+), "([^\\"]*)", (\d+)\)/ && (length($2) == $3)) {
s|strncmp\(([^,]+), "([^\\"]*)", (\d+)\)|prefixcmp($1, "$2")|;
}
if (/strncmp\("([^\\"]*)", ([^,]+), (\d+)\)/ && (length($1) == $3)) {
s|strncmp\("([^\\"]*)", ([^,]+), (\d+)\)|(-prefixcmp($2, "$1"))|;
}
and running:
$ git grep -l strncmp -- '*.c' | xargs perl px.perl
Signed-off-by: Junio C Hamano <junkio@cox.net>
It is plausible for somebody to want to view the commit log in a
different encoding from i18n.commitencoding -- the project's
policy may be UTF-8 and the user may be using a commit message
hook to run iconv to conform to that policy (and either not have
i18n.commitencoding to default to UTF-8 or have it explicitly
set to UTF-8). Even then, Latin-1 may be more convenient for
the usual pager and the terminal the user uses.
The new variable i18n.logoutputencoding is used in preference to
i18n.commitencoding to decide what encoding to recode the log
output in when git-log and friends formats the commit log message.
Signed-off-by: Junio C Hamano <junkio@cox.net>
This is a mechanical clean-up of the way *.c files include
system header files.
(1) sources under compat/, platform sha-1 implementations, and
xdelta code are exempt from the following rules;
(2) the first #include must be "git-compat-util.h" or one of
our own header file that includes it first (e.g. config.h,
builtin.h, pkt-line.h);
(3) system headers that are included in "git-compat-util.h"
need not be included in individual C source files.
(4) "git-compat-util.h" does not have to include subsystem
specific header files (e.g. expat.h).
Signed-off-by: Junio C Hamano <junkio@cox.net>
builtin-mailinfo.c has its own hexval implementaiton but it can
share the table-lookup one recently implemented in sha1_file.c
Signed-off-by: Junio C Hamano <junkio@cox.net>
[jc: I needed to hand merge the changes to the updated codebase,
so the result needs to be checked.]
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
This changes the calling convention of built-in commands and
passes the "prefix" (i.e. pathname of $PWD relative to the
project root level) down to them.
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
Mail I get sometimes has multiple From lines, like this:
From Majordomo@vger.kernel.org Thu Jul 27 16:39:36 2006
>From mtsirkin Thu Jul 27 16:39:36 2006
Received: from yok.mtl.com [10.0.8.11]
...
which confuses git-mailinfo since that does not recognize >From
as a valid header line.
This patch makes it recognize >From XXX as a valid header line.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Junio C Hamano <junkio@cox.net>
When the input mbox does not identify what encoding it is in,
and already have RFC2047 stripped away, we cannot tell what
encoding the header text is in. For body text, when the message
does not say what charset it is in, we fall back to assume
latin-1 input when converting to utf8. This should be done
consistently to the header as well.
Signed-off-by: Junio C Hamano <junkio@cox.net>