1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-19 04:46:11 +02:00
Commit Graph

268 Commits

Author SHA1 Message Date
Johannes Schindelin 7c9fbda6e2 Sync with 2.18.2
* maint-2.18: (33 commits)
  Git 2.18.2
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  ...
2019-12-06 16:30:38 +01:00
Johannes Schindelin 14af7ed5a9 Sync with 2.17.3
* maint-2.17: (32 commits)
  Git 2.17.3
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  ...
2019-12-06 16:29:15 +01:00
Jonathan Nieder bb92255ebe fsck: reject submodule.update = !command in .gitmodules
This allows hosting providers to detect whether they are being used
to attack users using malicious 'update = !command' settings in
.gitmodules.

Since ac1fbbda20 (submodule: do not copy unknown update mode from
.gitmodules, 2013-12-02), in normal cases such settings have been
treated as 'update = none', so forbidding them should not produce any
collateral damage to legitimate uses.  A quick search does not reveal
any repositories making use of this construct, either.

Reported-by: Joern Schneeweisz <jschneeweisz@gitlab.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-06 16:27:38 +01:00
Johannes Schindelin bdfef0492c Sync with 2.16.6
* maint-2.16: (31 commits)
  Git 2.16.6
  test-drop-caches: use `has_dos_drive_prefix()`
  Git 2.15.4
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  ...
2019-12-06 16:27:36 +01:00
Johannes Schindelin d3ac8c3f27 Sync with 2.14.6
* maint-2.14: (28 commits)
  Git 2.14.6
  mingw: handle `subst`-ed "DOS drives"
  mingw: refuse to access paths with trailing spaces or periods
  mingw: refuse to access paths with illegal characters
  unpack-trees: let merged_entry() pass through do_add_entry()'s errors
  quote-stress-test: offer to test quoting arguments for MSYS2 sh
  t6130/t9350: prepare for stringent Win32 path validation
  quote-stress-test: allow skipping some trials
  quote-stress-test: accept arguments to test via the command-line
  tests: add a helper to stress test argument quoting
  mingw: fix quoting of arguments
  Disallow dubiously-nested submodule git directories
  protect_ntfs: turn on NTFS protection by default
  path: also guard `.gitmodules` against NTFS Alternate Data Streams
  is_ntfs_dotgit(): speed it up
  mingw: disallow backslash characters in tree objects' file names
  path: safeguard `.git` against NTFS Alternate Streams Accesses
  clone --recurse-submodules: prevent name squatting on Windows
  is_ntfs_dotgit(): only verify the leading segment
  test-path-utils: offer to run a protectNTFS/protectHFS benchmark
  ...
2019-12-06 16:26:55 +01:00
Johannes Schindelin 288a74bcd2 is_ntfs_dotgit(): only verify the leading segment
The config setting `core.protectNTFS` is specifically designed to work
not only on Windows, but anywhere, to allow for repositories hosted on,
say, Linux servers to be protected against NTFS-specific attack vectors.

As a consequence, `is_ntfs_dotgit()` manually splits backslash-separated
paths (but does not do the same for paths separated by forward slashes),
under the assumption that the backslash might not be a valid directory
separator on the _current_ Operating System.

However, the two callers, `verify_path()` and `fsck_tree()`, are
supposed to feed only individual path segments to the `is_ntfs_dotgit()`
function.

This causes a lot of duplicate scanning (and very inefficient scanning,
too, as the inner loop of `is_ntfs_dotgit()` was optimized for
readability rather than for speed.

Let's simplify the design of `is_ntfs_dotgit()` by putting the burden of
splitting the paths by backslashes as directory separators on the
callers of said function.

Consequently, the `verify_path()` function, which already splits the
path by directory separators, now treats backslashes as directory
separators _explicitly_ when `core.protectNTFS` is turned on, even on
platforms where the backslash is _not_ a directory separator.

Note that we have to repeat some code in `verify_path()`: if the
backslash is not a directory separator on the current Operating System,
we want to allow file names like `\`, but we _do_ want to disallow paths
that are clearly intended to cause harm when the repository is cloned on
Windows.

The `fsck_tree()` function (the other caller of `is_ntfs_dotgit()`) now
needs to look for backslashes in tree entries' names specifically when
`core.protectNTFS` is turned on. While it would be tempting to
completely disallow backslashes in that case (much like `fsck` reports
names containing forward slashes as "full paths"), this would be
overzealous: when `core.protectNTFS` is turned on in a non-Windows
setup, backslashes are perfectly valid characters in file names while we
_still_ want to disallow tree entries that are clearly designed to
exploit NTFS-specific behavior.

This simplification will make subsequent changes easier to implement,
such as turning `core.protectNTFS` on by default (not only on Windows)
or protecting against attack vectors involving NTFS Alternate Data
Streams.

Incidentally, this change allows for catching malicious repositories
that contain tree entries of the form `dir\.gitmodules` already on the
server side rather than only on the client side (and previously only on
Windows): in contrast to `is_ntfs_dotgit()`, the
`is_ntfs_dotgitmodules()` function already expects the caller to split
the paths by directory separators.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
2019-12-05 15:36:50 +01:00
Jeff King b2f2039c2b fsck: accept an oid instead of a "struct tree" for fsck_tree()
We don't actually look at the tree struct in fsck_tree() beyond its oid
and type (which is obviously OBJ_TREE). Just taking an oid gives our
callers more flexibility to avoid creating a struct, and makes it clear
that we are fscking just what is in the buffer, not any pre-parsed bits
from the struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King c5b4269b57 fsck: accept an oid instead of a "struct commit" for fsck_commit()
We don't actually look at the commit struct in fsck_commit() beyond its
oid and type (which is obviously OBJ_COMMIT). Just taking an oid gives
our callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 103fb6d43b fsck: accept an oid instead of a "struct tag" for fsck_tag()
We don't actually look at the tag struct in fsck_tag() beyond its oid
and type (which is obviously OBJ_TAG). Just taking an oid gives our
callers more flexibility to avoid creating or parsing a struct, and
makes it clear that we are fscking just what is in the buffer, not any
pre-parsed bits from the struct.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King f648ee7088 fsck: rename vague "oid" local variables
In fsck_commit() and fsck_tag(), we have local "oid" variables used for
parsing parent and tagged-object oids. Let's give these more specific
names in preparation for the functions taking an "oid" parameter for the
object we're checking.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King cc579000bf fsck: don't require an object struct in verify_headers()
We only need the oid and type to pass on to report(). Let's accept the
broken-out parameters to give our callers more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 7854399366 fsck: don't require an object struct for fsck_ident()
The only thing we do with the struct is pass its oid and type to
report(). We can just take those explicitly, which gives our callers
more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King b8b00f1693 fsck: drop blob struct from fsck_finish()
Since fsck_blob() no longer requires us to have a "struct blob", we
don't need to create one. Which also means we don't need to worry about
handling the case that lookup_blob() returns NULL (we'll still catch
wrongly-identified blobs when we read the actual object contents and
type from disk).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 6da40b22ca fsck: accept an oid instead of a "struct blob" for fsck_blob()
We don't actually need any information from the object struct except its
oid (and the type, of course, but that's implicitly OBJ_BLOB). This
gives our callers more flexibility to drop the object structs, too.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 38370253fd fsck: don't require an object struct for report()
The report() function really only cares about the oid and type of the
object, not the full object struct. Let's convert it to take those two
items separately, which gives our callers more flexibility.

This makes some already-long lines even longer. I've mostly left them,
as our eventual goal is to shrink these down as we continue refactoring
(e.g., "&item->object" becomes "&item->object.oid, item->object.type",
but will eventually shrink down to "oid, type").

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King f59793763d fsck: only require an oid for skiplist functions
The skiplist is inherently an oidset, so we don't need a full object
struct. Let's take just the oid to give our callers more flexibility.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 5afc4b1dc6 fsck: only provide oid/type in fsck_error callback
None of the callbacks actually care about having a "struct object";
they're happy with just the oid and type information. So let's give
ourselves more flexibility to avoid having a "struct object" by just
passing the broken-down fields.

Note that the callback already takes a "type" field for the fsck message
type. We'll rename that to "msg_type" (and use "object_type" for the
object type) to make the distinction explicit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King 733902905d fsck: use oids rather than objects for object_name API
We don't actually care about having object structs; we only need to look
up decorations by oid. Let's accept this more limited form, which will
give our callers more flexibility.

Note that the decoration API we rely on uses object structs itself (even
though it only looks at their oids). We can solve this by switching to
a kh_oid_map (we could also use the hashmap oidmap, but it's more
awkward for the simple case of just storing a void pointer).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:18 +09:00
Jeff King d40bbc109b fsck_describe_object(): build on our get_object_name() primitive
This isolates the implementation detail of using the decoration code to
our put/get functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Jeff King a59cfb3230 fsck: unify object-name code
Commit 90cf590f53 (fsck: optionally show more helpful info for broken
links, 2016-07-17) added a system for decorating objects with names. The
code is split across builtin/fsck.c (which gives the initial names) and
fsck.c (which adds to the names as it traverses the object graph). This
leads to some duplication, where both sites have near-identical
describe_object() functions (the difference being that the one in
builtin/fsck.c uses a circular array of buffers to allow multiple calls
in a single printf).

Let's provide a unified object_name API for fsck. That lets us drop the
duplication, as well as making the interface boundaries more clear
(which will let us refactor the implementation more in a future patch).

We'll leave describe_object() in builtin/fsck.c as a thin wrapper around
the new API, as it relies on a static global to make its many callers a
bit shorter.

We'll also convert the bare add_decoration() calls in builtin/fsck.c to
put_object_name(). This fixes two minor bugs:

  1. We leak many small strings. add_decoration() has a last-one-wins
     approach: it updates the decoration to the new string and returns
     the old one. But we ignore the return value, leaking the old
     string. This is quite common to trigger, since we look at reflogs:
     the tip of any ref will be described both by looking at the actual
     ref, as well as the latest reflog entry. So we'd always end up
     leaking one of those strings.

  2. The last-one-wins approach gives us lousy names. For instance, we
     first look at all of the refs, and then all of the reflogs. So
     rather than seeing "refs/heads/master", we're likely to overwrite
     it with "HEAD@{12345678}". We're generally better off using the
     first name we find.

     And indeed, the test in t1450 expects this ugly HEAD@{} name. After
     this patch, we've switched to using fsck_put_object_name()'s
     first-one-wins semantics, and we output the more human-friendly
     "refs/tags/julius" (and the test is updated accordingly).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Jeff King 23a173a761 fsck: require an actual buffer for non-blobs
The fsck_object() function takes in a buffer, but also a "struct
object". The rules for using these vary between types:

  - for a commit, we'll use the provided buffer; if it's NULL, we'll
    fall back to get_commit_buffer(), which loads from either an
    in-memory cache or from disk. If the latter fails, we'd die(), which
    is non-ideal for fsck.

  - for a tag, a NULL buffer will fall back to loading the object from
    disk (and failure would lead to an fsck error)

  - for a tree, we _never_ look at the provided buffer, and always use
    tree->buffer

  - for a blob, we usually don't look at the buffer at all, unless it
    has been marked as a .gitmodule file. In that case we check the
    buffer given to us, or assume a NULL buffer is a very large blob
    (and complain about it)

This is much more complex than it needs to be. It turns out that nobody
ever feeds a NULL buffer that isn't a blob:

  - git-fsck calls fsck_object() only from fsck_obj(). That in turn is
    called by one of:

      - fsck_obj_buffer(), which is a callback to verify_pack(), which
	unpacks everything except large blobs into a buffer (see
	pack-check.c, lines 131-141).

      - fsck_loose(), which hits a BUG() on non-blobs with a NULL buffer
	(builtin/fsck.c, lines 639-640)

    And in either case, we'll have just called parse_object_buffer()
    anyway, which would segfault on a NULL buffer for commits or tags
    (not for trees, but it would install a NULL tree->buffer which would
    later cause a segfault)

  - git-index-pack asserts that the buffer is non-NULL unless the object
    is a blob (see builtin/index-pack.c, line 832)

  - git-unpack-objects always writes a non-NULL buffer into its
    obj_buffer hash, which is then fed to fsck_object(). (There is
    actually a funny thing here where it does not store blob buffers at
    all, nor does it call fsck on them; it does check any needed blobs
    via fsck_finish() though).

Let's make the rules simpler, which reduces the amount of code and gives
us more flexibility in refactoring the fsck code. The new rules are:

  - only blobs are allowed to pass a NULL buffer

  - we always use the provided buffer, never pulling information from
    the object struct

We don't have to adjust any callers, because they were already adhering
to these. Note that we do drop a few fsck identifiers for missing tags,
but that was all dead code (because nobody passed a NULL tag buffer).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Jeff King 2175a0c601 fsck: stop checking tag->tagged
Way back in 92d4c85d24 (fsck-cache: fix SIGSEGV on bad tag object,
2005-05-03), we added an fsck check that the "tagged" field of a tag
struct isn't NULL. But that was mainly protecting the printing code for
"--tags", and that code wasn't moved along with the check as part of
ba002f3b28 (builtin-fsck: move common object checking code to fsck.c,
2008-02-25).

It could also serve to detect type mismatch problems (where a tag points
to object X as a commit, but really X is a blob), but it couldn't do so
reliably (we'd call lookup_commit(X), but it will only notice the
problem if we happen to have previously called lookup_blob(X) in the
same process). And as of a commit earlier in this series, we'd consider
that a parse error and complain about the object even before getting to
this point anyway.

So let's drop this "tag->tagged" check. It's not helping anything, and
getting rid of it makes the function conceptually cleaner, as it really
is just checking the buffer we feed it. In fact, we can get rid of our
one-line wrapper and just unify fsck_tag() and fsck_tag_buffer().

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Jeff King ec65231571 fsck: stop checking commit->parent counts
In 4516338243 (builtin-fsck: reports missing parent commits,
2008-02-25), we added code to check that fsck found the same number of
parents from parsing the commit itself as we see in the commit struct we
got from parse_commit_buffer(). Back then the rationale was that the
normal commit parser might skip some bad parents.

But earlier in this series, we started treating that reliably as a
parsing error, meaning that we'd complain about it before we even hit
the code in fsck.c.

Let's drop this code, which now makes fsck_commit_buffer() completely
independent of any parsed values in the commit struct (that's
conceptually cleaner, and also opens up more refactoring options).

Note that we can also drop the MISSING_PARENT and MISSING_GRAFT fsck
identifiers. This is no loss, as these would not trigger reliably
anyway.  We'd hit them only when lookup_commit() failed, which occurs
only if we happen to have seen the object with another type already in
the same process. In most cases, we'd actually run into the problem
during the connectivity walk, not here.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Jeff King 1de6007d85 fsck: stop checking commit->tree value
We check in fsck_commit_buffer() that commit->tree isn't NULL, which in
turn generally comes from a previous parse by parse_commit(). But this
isn't really accomplishing anything. The two things we might care about
are:

  - was there a syntactically valid "tree <oid>" line in the object? But
    we've just done our own parse in fsck_commit_buffer() to check this.

  - does it point to a valid tree object? But checking the "tree"
    pointer here doesn't actually accomplish that; it just shows that
    lookup_tree() didn't return NULL, which only means that we haven't
    yet seen that oid as a non-tree in this process.

    A real connectivity check would exhaustively walk all graph links,
    and we do that already in a separate function.

So this code isn't helping anything. And it makes the fsck code slightly
more confusing and rigid (e.g., it requires that any commit structs have
already been parsed). Let's drop it.

As a bit of history, the presence of this code looks like a leftover
from early fsck code (which did rely on parse_commit() to do most of the
parsing). The check comes from ff5ebe39b0 (Port fsck-cache to use
parsing functions, 2005-04-18), but we later added an explicit walk in
355885d531 (add generic, type aware object chain walker, 2008-02-25).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-28 14:05:17 +09:00
Junio C Hamano 209f075593 Merge branch 'br/blame-ignore'
"git blame" learned to "ignore" commits in the history, whose
effects (as well as their presence) get ignored.

* br/blame-ignore:
  t8014: remove unnecessary braces
  blame: drop some unused function parameters
  blame: add a test to cover blame_coalesce()
  blame: use the fingerprint heuristic to match ignored lines
  blame: add a fingerprint heuristic to match ignored lines
  blame: optionally track line fingerprints during fill_blame_origin()
  blame: add config options for the output of ignored or unblamable lines
  blame: add the ability to ignore commits and their changes
  blame: use a helper function in blame_chunk()
  Move oidset_parse_file() to oidset.c
  fsck: rename and touch up init_skiplist()
2019-07-19 11:30:20 -07:00
Jeff King 0ebbcf70e6 object: convert lookup_unknown_object() to use object_id
There are no callers left of lookup_unknown_object() that aren't just
passing us the "hash" member of a "struct object_id". Let's take the
whole struct, which gets us closer to removing all raw sha1 variables.
It also matches the existing conversions of lookup_blob(), etc.

The conversions of callers were done by hand, but they're all mechanical
one-liners.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:06:19 -07:00
Barret Rhoden f93895f8fc Move oidset_parse_file() to oidset.c
Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-16 11:36:23 +09:00
Barret Rhoden 24eb33ebc5 fsck: rename and touch up init_skiplist()
init_skiplist() took a file consisting of SHA-1s and comments and added
the objects to an oidset.  This functionality is useful for other
commands and will be moved to oidset.c in a future commit.

In preparation for that move, this commit renames it to
oidset_parse_file() to reflect its more generic usage and cleans up a
few of the names.

Signed-off-by: Barret Rhoden <brho@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-16 11:36:23 +09:00
Elijah Newren 5ec1e72823 Use 'unsigned short' for mode, like diff_filespec does
struct diff_filespec defines mode to be an 'unsigned short'.  Several
other places in the API which we'd like to interact with using a
diff_filespec used a plain unsigned (or unsigned int).  This caused
problems when taking addresses, so switch to unsigned short.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 16:02:07 +09:00
brian m. carlson ea82b2a085 tree-walk: store object_id in a separate member
When parsing a tree, we read the object ID directly out of the tree
buffer. This is normally fine, but such an object ID cannot be used with
oidcpy, which copies GIT_MAX_RAWSZ bytes, because if we are using SHA-1,
there may not be that many bytes to copy.

Instead, store the object ID in a separate struct member. Since we can
no longer efficiently compute the path length, store that information as
well in struct name_entry. Ensure we only copy the object ID into the
new buffer if the path length is nonzero, as some callers will pass us
an empty path with no object ID following it, and we will not want to
read past the end of the buffer.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-15 09:57:41 -08:00
Junio C Hamano da3e0752cd Merge branch 'jc/cocci-preincr'
Code cleanup.

* jc/cocci-preincr:
  fsck: s/++i > 1/i++/
  cocci: simplify "if (++u > 1)" to "if (u++)"
2018-10-30 15:43:48 +09:00
Junio C Hamano b84c783882 fsck: s/++i > 1/i++/
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-24 10:25:12 +09:00
Junio C Hamano 66ec2373fe Merge branch 'ab/fsck-skiplist'
Update fsck.skipList implementation and documentation.

* ab/fsck-skiplist:
  fsck: support comments & empty lines in skipList
  fsck: use oidset instead of oid_array for skipList
  fsck: use strbuf_getline() to read skiplist file
  fsck: add a performance test for skipList
  fsck: add a performance test
  fsck: document that skipList input must be unabbreviated
  fsck: document and test commented & empty line skipList input
  fsck: document and test sorted skipList input
  fsck tests: add a test for no skipList input
  fsck tests: setup of bogus commit object
2018-10-10 12:37:16 +09:00
Junio C Hamano 1958ad504b Sync with 2.18.1
* maint-2.18:
  Git 2.18.1
  Git 2.17.2
  fsck: detect submodule paths starting with dash
  fsck: detect submodule urls starting with dash
  Git 2.16.5
  Git 2.15.3
  Git 2.14.5
  submodule-config: ban submodule paths that start with a dash
  submodule-config: ban submodule urls that start with dash
  submodule--helper: use "--" to signal end of clone options
2018-09-27 11:50:45 -07:00
Junio C Hamano 44f87dac99 Sync with 2.17.2
* maint-2.17:
  Git 2.17.2
  fsck: detect submodule paths starting with dash
  fsck: detect submodule urls starting with dash
  Git 2.16.5
  Git 2.15.3
  Git 2.14.5
  submodule-config: ban submodule paths that start with a dash
  submodule-config: ban submodule urls that start with dash
  submodule--helper: use "--" to signal end of clone options
2018-09-27 11:45:01 -07:00
Jeff King 1a7fd1fb29 fsck: detect submodule paths starting with dash
As with urls, submodule paths with dashes are ignored by
git, but may end up confusing older versions. Detecting them
via fsck lets us prevent modern versions of git from being a
vector to spread broken .gitmodules to older versions.

Compared to blocking leading-dash urls, though, this
detection may be less of a good idea:

  1. While such paths provide confusing and broken results,
     they don't seem to actually work as option injections
     against anything except "cd". In particular, the
     submodule code seems to canonicalize to an absolute
     path before running "git clone" (so it passes
     /your/clone/-sub).

  2. It's more likely that we may one day make such names
     actually work correctly. Even after we revert this fsck
     check, it will continue to be a hassle until hosting
     servers are all updated.

On the other hand, it's not entirely clear that the behavior
in older versions is safe. And if we do want to eventually
allow this, we may end up doing so with a special syntax
anyway (e.g., writing "./-sub" in the .gitmodules file, and
teaching the submodule code to canonicalize it when
comparing).

So on balance, this is probably a good protection.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27 11:41:31 -07:00
Jeff King a124133e1e fsck: detect submodule urls starting with dash
Urls with leading dashes can cause mischief on older
versions of Git. We should detect them so that they can be
rejected by receive.fsckObjects, preventing modern versions
of git from being a vector by which attacks can spread.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-27 11:41:26 -07:00
Ævar Arnfjörð Bjarmason 371a655074 fsck: support comments & empty lines in skipList
It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).

Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.

There is no notable performance impact from this change, using the
test setup described a couple of commits back:

    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.69(7.27+0.42)   7.86(7.48+0.37) +2.2%
    1450.5: fsck with 1 skipped bad commits          7.69(7.30+0.38)   7.83(7.47+0.36) +1.8%
    1450.7: fsck with 10 skipped bad commits         7.76(7.38+0.38)   7.79(7.38+0.41) +0.4%
    1450.9: fsck with 100 skipped bad commits        7.76(7.38+0.38)   7.74(7.36+0.38) -0.3%
    1450.11: fsck with 1000 skipped bad commits      7.71(7.30+0.41)   7.72(7.34+0.38) +0.1%
    1450.13: fsck with 10000 skipped bad commits     7.74(7.34+0.40)   7.72(7.34+0.38) -0.3%
    1450.15: fsck with 100000 skipped bad commits    7.75(7.40+0.35)   7.70(7.29+0.40) -0.6%
    1450.17: fsck with 1000000 skipped bad commits   7.12(6.86+0.26)   7.13(6.87+0.26) +0.1%

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:17:46 -07:00
René Scharfe 3b41fb0cb2 fsck: use oidset instead of oid_array for skipList
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.

This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use. I.e. only once
per invocation, but note that for "clone --recurse-submodules" each
submodule will re-parse the list, in addition to the main project, and
it will be re-parsed when checking .gitmodules blobs, see
fb16287719 ("fsck: check skiplist for object in fsck_blob()",
2018-06-27).

Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to make
its ownership clear (no hidden sharing) and avoid unnecessary pointer
indirection.

The cumulative impact on performance of this & the preceding change,
using the test setup described in the previous commit:

    Test                                             HEAD~2            HEAD~                   HEAD
    ----------------------------------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.70(7.31+0.38)   7.72(7.33+0.38) +0.3%   7.70(7.30+0.40) +0.0%
    1450.5: fsck with 1 skipped bad commits          7.84(7.47+0.37)   7.69(7.32+0.36) -1.9%   7.71(7.29+0.41) -1.7%
    1450.7: fsck with 10 skipped bad commits         7.81(7.40+0.40)   7.94(7.57+0.36) +1.7%   7.92(7.55+0.37) +1.4%
    1450.9: fsck with 100 skipped bad commits        7.81(7.42+0.38)   7.95(7.53+0.41) +1.8%   7.83(7.42+0.41) +0.3%
    1450.11: fsck with 1000 skipped bad commits      7.99(7.62+0.36)   7.90(7.50+0.40) -1.1%   7.86(7.49+0.37) -1.6%
    1450.13: fsck with 10000 skipped bad commits     7.98(7.57+0.40)   7.94(7.53+0.40) -0.5%   7.90(7.45+0.44) -1.0%
    1450.15: fsck with 100000 skipped bad commits    7.97(7.57+0.39)   8.03(7.67+0.36) +0.8%   7.84(7.43+0.41) -1.6%
    1450.17: fsck with 1000000 skipped bad commits   7.72(7.22+0.50)   7.28(7.07+0.20) -5.7%   7.13(6.87+0.25) -7.6%

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:17:46 -07:00
René Scharfe fb8952077d fsck: use strbuf_getline() to read skiplist file
The buffer is unlikely to contain a NUL character, so printing its
contents using %s in a die() format is unsafe (detected with ASan).

Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.

This fixes a bug where emitting an error about an invalid line on say
line 1 would continue printing subsequent lines, and usually continue
into uninitialized memory.

The performance impact of this, on a CentOS 7 box with RedHat GCC
4.8.5-28:

    $ GIT_PERF_REPEAT_COUNT=5 GIT_PERF_MAKE_OPTS='-j56 CFLAGS="-O3"' ./run HEAD~ HEAD p1451-fsck-skip-list.sh
    Test                                             HEAD~             HEAD
    ----------------------------------------------------------------------------------------
    1450.3: fsck with 0 skipped bad commits          7.75(7.39+0.35)   7.68(7.29+0.39) -0.9%
    1450.5: fsck with 1 skipped bad commits          7.70(7.30+0.40)   7.80(7.42+0.37) +1.3%
    1450.7: fsck with 10 skipped bad commits         7.77(7.37+0.40)   7.87(7.47+0.40) +1.3%
    1450.9: fsck with 100 skipped bad commits        7.82(7.41+0.40)   7.88(7.43+0.44) +0.8%
    1450.11: fsck with 1000 skipped bad commits      7.88(7.49+0.39)   7.84(7.43+0.40) -0.5%
    1450.13: fsck with 10000 skipped bad commits     8.02(7.63+0.39)   8.07(7.67+0.39) +0.6%
    1450.15: fsck with 100000 skipped bad commits    8.01(7.60+0.41)   8.08(7.70+0.38) +0.9%
    1450.17: fsck with 1000000 skipped bad commits   7.60(7.10+0.50)   7.37(7.18+0.19) -3.0%

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:17:46 -07:00
Junio C Hamano 3a2a1dc170 Merge branch 'sb/object-store-lookup'
lookup_commit_reference() and friends have been updated to find
in-core object for a specific in-core repository instance.

* sb/object-store-lookup: (32 commits)
  commit.c: allow lookup_commit_reference to handle arbitrary repositories
  commit.c: allow lookup_commit_reference_gently to handle arbitrary repositories
  tag.c: allow deref_tag to handle arbitrary repositories
  object.c: allow parse_object to handle arbitrary repositories
  object.c: allow parse_object_buffer to handle arbitrary repositories
  commit.c: allow get_cached_commit_buffer to handle arbitrary repositories
  commit.c: allow set_commit_buffer to handle arbitrary repositories
  commit.c: migrate the commit buffer to the parsed object store
  commit-slabs: remove realloc counter outside of slab struct
  commit.c: allow parse_commit_buffer to handle arbitrary repositories
  tag: allow parse_tag_buffer to handle arbitrary repositories
  tag: allow lookup_tag to handle arbitrary repositories
  commit: allow lookup_commit to handle arbitrary repositories
  tree: allow lookup_tree to handle arbitrary repositories
  blob: allow lookup_blob to handle arbitrary repositories
  object: allow lookup_object to handle arbitrary repositories
  object: allow object_as_type to handle arbitrary repositories
  tag: add repository argument to deref_tag
  tag: add repository argument to parse_tag_buffer
  tag: add repository argument to lookup_tag
  ...
2018-08-02 15:30:42 -07:00
Junio C Hamano bd1a32d5c8 Merge branch 'jk/fsck-gitmodules-gently'
Recent "security fix" to pay attention to contents of ".gitmodules"
while accepting "git push" was a bit overly strict than necessary,
which has been adjusted.

* jk/fsck-gitmodules-gently:
  fsck: downgrade gitmodulesParse default to "info"
  fsck: split ".gitmodules too large" error from parse failure
  fsck: silence stderr when parsing .gitmodules
  config: add options parameter to git_config_from_mem
  config: add CONFIG_ERROR_SILENT handler
  config: turn die_on_error into caller-facing enum
2018-08-02 15:30:39 -07:00
Junio C Hamano a9e7fe96cc Merge branch 'rj/submodule-fsck-skip'
"fsck.skipList" did not prevent a blob object listed there from
being inspected for is contents (e.g. we recently started to
inspect the contents of ".gitmodules" for certain malicious
patterns), which has been corrected.

* rj/submodule-fsck-skip:
  fsck: check skiplist for object in fsck_blob()
2018-07-24 14:50:42 -07:00
Junio C Hamano 00624d608c Merge branch 'sb/object-store-grafts'
The conversion to pass "the_repository" and then "a_repository"
throughout the object access API continues.

* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-07-18 12:20:28 -07:00
Jeff King 64eb14d310 fsck: downgrade gitmodulesParse default to "info"
We added an fsck check in ed8b10f631 (fsck: check
.gitmodules content, 2018-05-02) as a defense against the
vulnerability from 0383bbb901 (submodule-config: verify
submodule names as paths, 2018-04-30). With the idea that
up-to-date hosting sites could protect downstream unpatched
clients that fetch from them.

As part of that defense, we reject any ".gitmodules" entry
that is not syntactically valid. The theory is that if we
cannot even parse the file, we cannot accurately check it
for vulnerabilities. And anybody with a broken .gitmodules
file would eventually want to know anyway.

But there are a few reasons this is a bad tradeoff in
practice:

 - for this particular vulnerability, the client has to be
   able to parse the file. So you cannot sneak an attack
   through using a broken file, assuming the config parsers
   for the process running fsck and the eventual victim are
   functionally equivalent.

 - a broken .gitmodules file is not necessarily a problem.
   Our fsck check detects .gitmodules in _any_ tree, not
   just at the root. And the presence of a .gitmodules file
   does not necessarily mean it will be used; you'd have to
   also have gitlinks in the tree. The cgit repository, for
   example, has a file named .gitmodules from a
   pre-submodule attempt at sharing code, but does not
   actually have any gitlinks.

 - when the fsck check is used to reject a push, it's often
   hard to work around. The pusher may not have full control
   over the destination repository (e.g., if it's on a
   hosting server, they may need to contact the hosting
   site's support). And the broken .gitmodules may be too
   far back in history for rewriting to be feasible (again,
   this is an issue for cgit).

So we're being unnecessarily restrictive without actually
improving the security in a meaningful way. It would be more
convenient to downgrade this check to "info", which means
we'd still comment on it, but not reject a push. Site admins
can already do this via config, but we should ship sensible
defaults.

There are a few counterpoints to consider in favor of
keeping the check as an error:

 - the first point above assumes that the config parsers for
   the victim and the fsck process are equivalent. This is
   pretty true now, but as time goes on will become less so.
   Hosting sites are likely to upgrade their version of Git,
   whereas vulnerable clients will be stagnant (if they did
   upgrade, they'd cease to be vulnerable!). So in theory we
   may see drift over time between what two config parsers
   will accept.

   In practice, this is probably OK. The config format is
   pretty established at this point and shouldn't change a
   lot. And the farther we get from the announcement of the
   vulnerability, the less interesting this extra layer of
   protection becomes. I.e., it was _most_ valuable on day
   0, when everybody's client was still vulnerable and
   hosting sites could protect people. But as time goes on
   and people upgrade, the population of vulnerable clients
   becomes smaller and smaller.

 - In theory this could protect us from other
   vulnerabilities in the future. E.g., .gitmodules are the
   only way for a malicious repository to feed data to the
   config parser, so this check could similarly protect
   clients from a future (to-be-found) bug there.

   But that's trading a hypothetical case for real-world
   pain today. If we do find such a bug, the hosting site
   would need to be updated to fix it, too. At which point
   we could figure out whether it's possible to detect
   _just_ the malicious case without hurting existing
   broken-but-not-evil cases.

 - Until recently, we hadn't made any restrictions on
   .gitmodules content. So now in tightening that we're
   hitting cases where certain things used to work, but
   don't anymore. There's some moderate pain now. But as
   time goes on, we'll see more (and more varied) cases that
   will make tightening harder in the future. So there's
   some argument for putting rules in place _now_, before
   users grow more cases that violate them.

   Again, this is trading pain now for hypothetical benefit
   in the future. And if we try hard in the future to keep
   our tightening to a minimum (i.e., rejecting true
   maliciousness without hurting broken-but-not-evil repos),
   then that reduces even the hypothetical benefit.

Considering both sets of arguments, it makes sense to loosen
this check for now.

Note that we have to tweak the test in t7415 since fsck will
no longer consider this a fatal error. But we still check
that it reports the warning, and that we don't get the
spurious error from the config code.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16 10:57:23 -07:00
Jeff King 0d68764d94 fsck: split ".gitmodules too large" error from parse failure
Since ed8b10f631 (fsck: check .gitmodules content,
2018-05-02), we'll report a gitmodulesParse error for two
conditions:

  - a .gitmodules entry is not syntactically valid

  - a .gitmodules entry is larger than core.bigFileThreshold

with the intent that we can detect malicious files and
protect downstream clients. E.g., from the issue in
0383bbb901 (submodule-config: verify submodule names as
paths, 2018-04-30).

But these conditions are actually quite different with
respect to that bug:

 - a syntactically invalid file cannot trigger the problem,
   as the victim would barf before hitting the problematic
   code

 - a too-big .gitmodules _can_ trigger the problem. Even
   though it is obviously silly to have a 500MB .gitmodules
   file, the submodule code will happily parse it if you
   have enough memory.

So it may be reasonable to configure their severity
separately. Let's add a new class for the "too large" case
to allow that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-16 10:57:22 -07:00
Ramsay Jones fb16287719 fsck: check skiplist for object in fsck_blob()
Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: could not parse gitmodules blob
  Checking objects: 100% (6626/6626), done.
  $

  $ git config fsck.skiplist '.git/skip'
  $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
  $

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  Checking objects: 100% (6626/6626), done.
  $

Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.

One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.

Signed-off-by: Ramsay Jones <ramsay@ramsayjones.plus.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 09:49:44 -07:00
Jeff King de6bd9e3ea fsck: silence stderr when parsing .gitmodules
If there's a parsing error we'll already report it via the
usual fsck report() function (or not, if the user has asked
to skip this object or warning type). The error message from
the config parser just adds confusion. Let's suppress it.

Note that we didn't test this case at all, so I've added
coverage in t7415. We may end up toning down or removing
this fsck check in the future. So take this test as checking
what happens now with a focus on stderr, and not any
ironclad guarantee that we must detect and report parse
failures in the future.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 09:36:41 -07:00
Jeff King 4574f1aace config: add options parameter to git_config_from_mem
The underlying config parser knows how to handle a
config_options struct, but git_config_from_mem() always
passes NULL. Let's allow our callers to specify the options
struct.

We could add a "_with_options" variant, but since there are
only a handful of callers, let's just update them to pass
NULL.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-03 09:36:06 -07:00
Stefan Beller f86bcc7b2c tree: add repository argument to lookup_tree
Add a repository argument to allow the callers of lookup_tree
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller da14a7ff99 blob: add repository argument to lookup_blob
Add a repository argument to allow the callers of lookup_blob
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Stefan Beller 109cd76dd3 object: add repository argument to parse_object
Add a repository argument to allow the callers of parse_object
to be more specific about which repository to act on. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-29 10:43:38 -07:00
Junio C Hamano b16b60f71b Merge branch 'sb/object-store-grafts' into sb/object-store-lookup
* sb/object-store-grafts:
  commit: allow lookup_commit_graft to handle arbitrary repositories
  commit: allow prepare_commit_graft to handle arbitrary repositories
  shallow: migrate shallow information into the object parser
  path.c: migrate global git_path_* to take a repository argument
  cache: convert get_graft_file to handle arbitrary repositories
  commit: convert read_graft_file to handle arbitrary repositories
  commit: convert register_commit_graft to handle arbitrary repositories
  commit: convert commit_graft_pos() to handle arbitrary repositories
  shallow: add repository argument to is_repository_shallow
  shallow: add repository argument to check_shallow_file_for_update
  shallow: add repository argument to register_shallow
  shallow: add repository argument to set_alternate_shallow_file
  commit: add repository argument to lookup_commit_graft
  commit: add repository argument to prepare_commit_graft
  commit: add repository argument to read_graft_file
  commit: add repository argument to register_commit_graft
  commit: add repository argument to commit_graft_pos
  object: move grafts to object parser
  object-store: move object access functions to object-store.h
2018-06-29 10:43:28 -07:00
Junio C Hamano ebaf0a56f3 Merge branch 'nd/complete-config-vars'
Continuing with the idea to programatically enumerate various
pieces of data required for command line completion, teach the
codebase to report the list of configuration variables
subcommands care about to help complete them.

* nd/complete-config-vars:
  completion: complete general config vars in two steps
  log-tree: allow to customize 'grafted' color
  completion: support case-insensitive config vars
  completion: keep other config var completion in camelCase
  completion: drop the hard coded list of config vars
  am: move advice.amWorkDir parsing back to advice.c
  advice: keep config name in camelCase in advice_config[]
  fsck: produce camelCase config key names
  help: add --config to list all available config
  fsck: factor out msg_id_info[] lazy initialization code
  grep: keep all colors in an array
  Add and use generic name->id mapping code for color slot parsing
2018-06-25 13:22:38 -07:00
Junio C Hamano fb6ac9e79a Merge branch 'jk/submodule-fsck-loose-fixup'
Finishing touches to a topic that already is in 'maint'.

* jk/submodule-fsck-loose-fixup:
  fsck: avoid looking at NULL blob->object
  t7415: don't bother creating commit for symlink test
2018-06-13 12:50:44 -07:00
Jeff King 47cc91310a fsck: avoid looking at NULL blob->object
Commit 159e7b080b (fsck: detect gitmodules files,
2018-05-02) taught fsck to look at the content of
.gitmodules files. If the object turns out not to be a blob
at all, we just complain and punt on checking the content.
And since this was such an obvious and trivial code path, I
didn't even bother to add a test.

Except it _does_ do one non-trivial thing, which is call the
report() function, which wants us to pass a pointer to a
"struct object". Which we don't have (we have only a "struct
object_id"). So we erroneously pass a NULL object to
report(), which gets dereferenced and causes a segfault.

It seems like we could refactor report() to just take the
object_id itself. But we pass the object pointer along to
a callback function, and indeed this ends up in
builtin/fsck.c's objreport() which does want to look at
other parts of the object (like the type).

So instead, let's just use lookup_unknown_object() to get
the real "struct object", and pass that.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-11 10:56:06 -07:00
Junio C Hamano 42c8ce1c49 Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (42 commits)
  merge-one-file: compute empty blob object ID
  add--interactive: compute the empty tree value
  Update shell scripts to compute empty tree object ID
  sha1_file: only expose empty object constants through git_hash_algo
  dir: use the_hash_algo for empty blob object ID
  sequencer: use the_hash_algo for empty tree object ID
  cache-tree: use is_empty_tree_oid
  sha1_file: convert cached object code to struct object_id
  builtin/reset: convert use of EMPTY_TREE_SHA1_BIN
  builtin/receive-pack: convert one use of EMPTY_TREE_SHA1_HEX
  wt-status: convert two uses of EMPTY_TREE_SHA1_HEX
  submodule: convert several uses of EMPTY_TREE_SHA1_HEX
  sequencer: convert one use of EMPTY_TREE_SHA1_HEX
  merge: convert empty tree constant to the_hash_algo
  builtin/merge: switch tree functions to use object_id
  builtin/am: convert uses of EMPTY_TREE_SHA1_BIN to the_hash_algo
  sha1-file: add functions for hex empty tree and blob OIDs
  builtin/receive-pack: avoid hard-coded constants for push certs
  diff: specify abbreviation size in terms of the_hash_algo
  upload-pack: replace use of several hard-coded constants
  ...
2018-05-30 14:04:10 +09:00
Junio C Hamano 7913f53b56 Sync with Git 2.17.1
* maint: (25 commits)
  Git 2.17.1
  Git 2.16.4
  Git 2.15.2
  Git 2.14.4
  Git 2.13.7
  fsck: complain when .gitmodules is a symlink
  index-pack: check .gitmodules files with --strict
  unpack-objects: call fsck_finish() after fscking objects
  fsck: call fsck_finish() after fscking objects
  fsck: check .gitmodules content
  fsck: handle promisor objects in .gitmodules check
  fsck: detect gitmodules files
  fsck: actually fsck blob data
  fsck: simplify ".git" check
  index-pack: make fsck error message more specific
  verify_path: disallow symlinks in .gitmodules
  update-index: stat updated files earlier
  verify_dotfile: mention case-insensitivity in comment
  verify_path: drop clever fallthrough
  skip_prefix: add case-insensitive variant
  ...
2018-05-29 17:10:05 +09:00
Nguyễn Thái Ngọc Duy a4a9cc19a2 fsck: produce camelCase config key names
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy 3ac68a93fd help: add --config to list all available config
Sometimes it helps to list all available config vars so the user can
search for something they want. The config man page can also be used
but it's harder to search if you want to focus on the variable name,
for example.

This is not the best way to collect the available config since it's
not precise. Ideally we should have a centralized list of config in C
code (pretty much like 'struct option'), but that's a lot more work.
This will do for now.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-29 14:51:28 +09:00
Nguyễn Thái Ngọc Duy a46baac61e fsck: factor out msg_id_info[] lazy initialization code
This array will be used by some other function than parse_msg_id() in
the following commit. Factor out this prep code so it could be called
from that one.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-29 14:51:28 +09:00
Junio C Hamano c89b6e136e Merge branch 'ds/lazy-load-trees'
The code has been taught to use the duplicated information stored
in the commit-graph file to learn the tree object name for a commit
to avoid opening and parsing the commit object when it makes sense
to do so.

* ds/lazy-load-trees:
  coccinelle: avoid wrong transformation suggestions from commit.cocci
  commit-graph: lazy-load trees for commits
  treewide: replace maybe_tree with accessor methods
  commit: create get_commit_tree() method
  treewide: rename tree to maybe_tree
2018-05-23 14:38:13 +09:00
Jeff King b7b1fca175 fsck: complain when .gitmodules is a symlink
We've recently forbidden .gitmodules to be a symlink in
verify_path(). And it's an easy way to circumvent our fsck
checks for .gitmodules content. So let's complain when we
see it.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King ed8b10f631 fsck: check .gitmodules content
This patch detects and blocks submodule names which do not
match the policy set forth in submodule-config. These should
already be caught by the submodule code itself, but putting
the check here means that newer versions of Git can protect
older ones from malicious entries (e.g., a server with
receive.fsckObjects will block the objects, protecting
clients which fetch from it).

As a side effect, this means fsck will also complain about
.gitmodules files that cannot be parsed (or were larger than
core.bigFileThreshold).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King 2738744426 fsck: handle promisor objects in .gitmodules check
If we have a tree that points to a .gitmodules blob but
don't have that blob, we can't check its contents. This
produces an fsck error when we encounter it.

But in the case of a promisor object, this absence is
expected, and we must not complain.  Note that this can
technically circumvent our transfer.fsckObjects check.
Imagine a client fetches a tree, but not the matching
.gitmodules blob. An fsck of the incoming objects will show
that we don't have enough information. Later, we do fetch
the actual blob. But we have no idea that it's a .gitmodules
file.

The only ways to get around this would be to re-scan all of
the existing trees whenever new ones enter (which is
expensive), or to somehow persist the gitmodules_found set
between fsck runs (which is complicated).

In practice, it's probably OK to ignore the problem. Any
repository which has all of the objects (including the one
serving the promisor packs) can perform the checks. Since
promisor packs are inherently about a hierarchical topology
in which clients rely on upstream repositories, those
upstream repositories can protect all of their downstream
clients from broken objects.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King 159e7b080b fsck: detect gitmodules files
In preparation for performing fsck checks on .gitmodules
files, this commit plumbs in the actual detection of the
files. Note that unlike most other fsck checks, this cannot
be a property of a single object: we must know that the
object is found at a ".gitmodules" path at the root tree of
a commit.

Since the fsck code only sees one object at a time, we have
to mark the related objects to fit the puzzle together. When
we see a commit we mark its tree as a root tree, and when
we see a root tree with a .gitmodules file, we mark the
corresponding blob to be checked.

In an ideal world, we'd check the objects in topological
order: commits followed by trees followed by blobs. In that
case we can avoid ever loading an object twice, since all
markings would be complete by the time we get to the marked
objects. And indeed, if we are checking a single packfile,
this is the order in which Git will generally write the
objects. But we can't count on that:

  1. git-fsck may show us the objects in arbitrary order
     (loose objects are fed in sha1 order, but we may also
     have multiple packs, and we process each pack fully in
     sequence).

  2. The type ordering is just what git-pack-objects happens
     to write now. The pack format does not require a
     specific order, and it's possible that future versions
     of Git (or a custom version trying to fool official
     Git's fsck checks!) may order it differently.

  3. We may not even be fscking all of the relevant objects
     at once. Consider pushing with transfer.fsckObjects,
     where one push adds a blob at path "foo", and then a
     second push adds the same blob at path ".gitmodules".
     The blob is not part of the second push at all, but we
     need to mark and check it.

So in the general case, we need to make up to three passes
over the objects: once to make sure we've seen all commits,
then once to cover any trees we might have missed, and then
a final pass to cover any .gitmodules blobs we found in the
second pass.

We can simplify things a bit by loosening the requirement
that we find .gitmodules only at root trees. Technically
a file like "subdir/.gitmodules" is not parsed by Git, but
it's not unreasonable for us to declare that Git is aware of
all ".gitmodules" files and make them eligible for checking.
That lets us drop the root-tree requirement, which
eliminates one pass entirely. And it makes our worst case
much better: instead of potentially queueing every root tree
to be re-examined, the worst case is that we queue each
unique .gitmodules blob for a second look.

This patch just adds the boilerplate to find .gitmodules
files. The actual content checks will come in a subsequent
commit.

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King 7ac4f3a007 fsck: actually fsck blob data
Because fscking a blob has always been a noop, we didn't
bother passing around the blob data. In preparation for
content-level checks, let's fix up a few things:

  1. The fsck_object() function just returns success for any
     blob. Let's a noop fsck_blob(), which we can fill in
     with actual logic later.

  2. The fsck_loose() function in builtin/fsck.c
     just threw away blob content after loading it. Let's
     hold onto it until after we've called fsck_object().

     The easiest way to do this is to just drop the
     parse_loose_object() helper entirely. Incidentally,
     this also fixes a memory leak: if we successfully
     loaded the object data but did not parse it, we would
     have left the function without freeing it.

  3. When fsck_loose() loads the object data, it
     does so with a custom read_loose_object() helper. This
     function streams any blobs, regardless of size, under
     the assumption that we're only checking the sha1.

     Instead, let's actually load blobs smaller than
     big_file_threshold, as the normal object-reading
     code-paths would do. This lets us fsck small files, and
     a NULL return is an indication that the blob was so big
     that it needed to be streamed, and we can pass that
     information along to fsck_blob().

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jeff King ed9c322062 fsck: simplify ".git" check
There's no need for us to manually check for ".git"; it's a
subset of the other filesystem-specific tests. Dropping it
makes our code slightly shorter. More importantly, the
existing code may make a reader wonder why ".GIT" is not
covered here, and whether that is a bug (it isn't, as it's
also covered in the filesystem-specific tests).

Signed-off-by: Jeff King <peff@peff.net>
2018-05-21 23:55:12 -04:00
Jonathan Nieder 1f93ecd1ab commit: add repository argument to lookup_commit_graft
Add a repository argument to allow callers of lookup_commit_graft to
be more specific about which repository to handle. This is a small
mechanical change; it doesn't change the implementation to handle
repositories other than the_repository yet.

As with the previous commits, use a macro to catch callers passing a
repository other than the_repository at compile time.

Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-18 08:13:10 +09:00
Stefan Beller cbd53a2193 object-store: move object access functions to object-store.h
This should make these functions easier to find and cache.h less
overwhelming to read.

In particular, this moves:
- read_object_file
- oid_object_info
- write_object_file

As a result, most of the codebase needs to #include object-store.h.
In this patch the #include is only added to files that would fail to
compile otherwise.  It would be better to #include wherever
identifiers from the header are used.  That can happen later
when we have better tooling for it.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-16 11:42:03 +09:00
brian m. carlson c54f5ca970 fsck: convert static functions to struct object_id
Convert two static functions to use struct object_id and parse_oid_hex,
instead of relying on harcoded 20 and 40-based constants.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 13:59:50 +09:00
Derrick Stolee 2e27bd7731 treewide: replace maybe_tree with accessor methods
In anticipation of making trees load lazily, create a Coccinelle
script (contrib/coccinelle/commit.cocci) to ensure that all
references to the 'maybe_tree' member of struct commit are either
mutations or accesses through get_commit_tree() or
get_commit_tree_oid().

Apply the Coccinelle script to create the rest of the patch.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:47:16 +09:00
Derrick Stolee 891435d55d treewide: rename tree to maybe_tree
Using the commit-graph file to walk commit history removes the large
cost of parsing commits during the walk. This exposes a performance
issue: lookup_tree() takes a large portion of the computation time,
even when Git never uses those trees.

In anticipation of lazy-loading these trees, rename the 'tree' member
of struct commit to 'maybe_tree'. This serves two purposes: it hints
at the future role of possibly being NULL even if the commit has a
valid tree, and it allows for unambiguous transformation from simple
member access (i.e. commit->maybe_tree) to method access.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-11 10:47:16 +09:00
brian m. carlson b4f5aca40e sha1_file: convert read_sha1_file to struct object_id
Convert read_sha1_file to take a pointer to struct object_id and rename
it read_object_file.  Do the same for read_sha1_file_extended.

Convert one use in grep.c to use the new function without any other code
change, since the pointer being passed is a void pointer that is already
initialized with a pointer to struct object_id.  Update the declaration
and definitions of the modified functions, and apply the following
semantic patch to convert the remaining callers:

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1.hash, E2, E3)
+ read_object_file(&E1, E2, E3)

@@
expression E1, E2, E3;
@@
- read_sha1_file(E1->hash, E2, E3)
+ read_object_file(E1, E2, E3)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1.hash, E2, E3, E4)
+ read_object_file_extended(&E1, E2, E3, E4)

@@
expression E1, E2, E3, E4;
@@
- read_sha1_file_extended(E1->hash, E2, E3, E4)
+ read_object_file_extended(E1, E2, E3, E4)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-14 09:23:50 -07:00
Brandon Williams debca9d2fe object: rename function 'typename' to 'type_name'
Rename C++ keyword in order to bring the codebase closer to being able
to be compiled with a C++ compiler.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-14 13:10:05 -08:00
Junio C Hamano 6defdc9fe8 Merge branch 'rs/fsck-null-return-from-lookup'
Improve behaviour of "git fsck" upon finding a missing object.

* rs/fsck-null-return-from-lookup:
  fsck: handle NULL return of lookup_blob() and lookup_tree()
2017-10-11 14:52:23 +09:00
René Scharfe 2720f6db5d fsck: handle NULL return of lookup_blob() and lookup_tree()
lookup_blob() and lookup_tree() can return NULL if they find an object
of an unexpected type.  Accessing the object member is undefined in that
case.  Cast the result to a struct object pointer instead; we can do
that because object is the first member of all object types.  This trick
is already used in other places in the code.

An error message is already shown by object_as_type(), which is called
by the lookup functions.  The walk callback functions are expected to
handle NULL object pointers passed to them, but put_object_name() needs
a valid object, so avoid calling it without one.

Suggested-by: SZEDER Gábor <szeder.dev@gmail.com>
Helped-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-10-06 11:04:34 +09:00
Jeff King 1cf01a34ea consistently use "fallthrough" comments in switches
Gcc 7 adds -Wimplicit-fallthrough, which can warn when a
switch case falls through to the next case. The general idea
is that the compiler can't tell if this was intentional or
not, so you should annotate any intentional fall-throughs as
such, leaving it to complain about any unannotated ones.

There's a GNU __attribute__ which can be used for
annotation, but of course we'd have to #ifdef it away on
non-gcc compilers. Gcc will also recognize
specially-formatted comments, which matches our current
practice. Let's extend that practice to all of the
unannotated sites (which I did look over and verify that
they were behaving as intended).

Ideally in each case we'd actually give some reasons in the
comment about why we're falling through, or what we're
falling through to. And gcc does support that with
-Wimplicit-fallthrough=2, which relaxes the comment pattern
matching to anything that contains "fallthrough" (or a
variety of spelling variants). However, this isn't the
default for -Wimplicit-fallthrough, nor for -Wextra. In the
name of simplicity, it's probably better for us to support
the default level, which requires "fallthrough" to be the
only thing in the comment (modulo some window dressing like
"else" and some punctuation; see the gcc manual for the
complete set of patterns).

This patch suppresses all warnings due to
-Wimplicit-fallthrough. We might eventually want to add that
to the DEVELOPER Makefile knob, but we should probably wait
until gcc 7 is more widely adopted (since earlier versions
will complain about the unknown warning type).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-22 12:49:57 +09:00
Stefan Beller 8b65a34c4a commit: convert lookup_commit_graft to struct object_id
With this patch, commit.h doesn't contain the string 'sha1' any more.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-13 12:02:40 -07:00
Junio C Hamano 6b526ced6f Merge branch 'bc/object-id'
Conversion from uchar[20] to struct object_id continues.

* bc/object-id: (53 commits)
  object: convert parse_object* to take struct object_id
  tree: convert parse_tree_indirect to struct object_id
  sequencer: convert do_recursive_merge to struct object_id
  diff-lib: convert do_diff_cache to struct object_id
  builtin/ls-tree: convert to struct object_id
  merge: convert checkout_fast_forward to struct object_id
  sequencer: convert fast_forward_to to struct object_id
  builtin/ls-files: convert overlay_tree_on_cache to object_id
  builtin/read-tree: convert to struct object_id
  sha1_name: convert internals of peel_onion to object_id
  upload-pack: convert remaining parse_object callers to object_id
  revision: convert remaining parse_object callers to object_id
  revision: rename add_pending_sha1 to add_pending_oid
  http-push: convert process_ls_object and descendants to object_id
  refs/files-backend: convert many internals to struct object_id
  refs: convert struct ref_update to use struct object_id
  ref-filter: convert some static functions to struct object_id
  Convert struct ref_array_item to struct object_id
  Convert the verify_pack callback to struct object_id
  Convert lookup_tag to struct object_id
  ...
2017-05-29 12:34:43 +09:00
brian m. carlson c251c83df2 object: convert parse_object* to take struct object_id
Make parse_object, parse_object_or_die, and parse_object_buffer take a
pointer to struct object_id.  Remove the temporary variables inserted
earlier, since they are no longer necessary.  Transform all of the
callers using the following semantic patch:

@@
expression E1;
@@
- parse_object(E1.hash)
+ parse_object(&E1)

@@
expression E1;
@@
- parse_object(E1->hash)
+ parse_object(E1)

@@
expression E1, E2;
@@
- parse_object_or_die(E1.hash, E2)
+ parse_object_or_die(&E1, E2)

@@
expression E1, E2;
@@
- parse_object_or_die(E1->hash, E2)
+ parse_object_or_die(E1, E2)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1.hash, E2, E3, E4, E5)
+ parse_object_buffer(&E1, E2, E3, E4, E5)

@@
expression E1, E2, E3, E4, E5;
@@
- parse_object_buffer(E1->hash, E2, E3, E4, E5)
+ parse_object_buffer(E1, E2, E3, E4, E5)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:58 +09:00
brian m. carlson 740ee055c6 Convert lookup_tree to struct object_id
Convert the lookup_tree function to take a pointer to struct object_id.

The commit was created with manual changes to tree.c, tree.h, and
object.c, plus the following semantic patch:

@@
@@
- lookup_tree(EMPTY_TREE_SHA1_BIN)
+ lookup_tree(&empty_tree_oid)

@@
expression E1;
@@
- lookup_tree(E1.hash)
+ lookup_tree(&E1)

@@
expression E1;
@@
- lookup_tree(E1->hash)
+ lookup_tree(E1)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
brian m. carlson 3aca1fc6c9 Convert lookup_blob to struct object_id
Convert lookup_blob to take a pointer to struct object_id.

The commit was created with manual changes to blob.c and blob.h, plus
the following semantic patch:

@@
expression E1;
@@
- lookup_blob(E1.hash)
+ lookup_blob(&E1)

@@
expression E1;
@@
- lookup_blob(E1->hash)
+ lookup_blob(E1)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
Johannes Schindelin 1aeb7e756c parse_timestamp(): specify explicitly where we parse timestamps
Currently, Git's source code represents all timestamps as `unsigned
long`. In preparation for using a more appropriate data type, let's
introduce a symbol `parse_timestamp` (currently being defined to
`strtoul`) where appropriate, so that we can later easily switch to,
say, use `strtoull()` instead.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-23 20:19:15 -07:00
brian m. carlson 910650d2f8 Rename sha1_array to oid_array
Since this structure handles an array of object IDs, rename it to struct
oid_array.  Also rename the accessor functions and the initialization
constant.

This commit was produced mechanically by providing non-Documentation
files to the following Perl one-liners:

    perl -pi -E 's/struct sha1_array/struct oid_array/g'
    perl -pi -E 's/\bsha1_array_/oid_array_/g'
    perl -pi -E 's/SHA1_ARRAY_INIT/OID_ARRAY_INIT/g'

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:56 -07:00
brian m. carlson 5d3206d501 Convert sha1_array_lookup to take struct object_id
Convert this function by changing the declaration and definition and
applying the following semantic patch to update the callers:

@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2.hash)
+ sha1_array_lookup(E1, &E2)

@@
expression E1, E2;
@@
- sha1_array_lookup(E1, E2->hash)
+ sha1_array_lookup(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:55 -07:00
brian m. carlson 98a72ddc12 Make sha1_array_append take a struct object_id *
Convert the callers to pass struct object_id by changing the function
declaration and definition and applying the following semantic patch:

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2.hash)
+ sha1_array_append(E1, &E2)

@@
expression E1, E2;
@@
- sha1_array_append(E1, E2->hash)
+ sha1_array_append(E1, E2)

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-31 08:33:55 -07:00
brian m. carlson ee3051bd23 sha1-array: convert internal storage for struct sha1_array to object_id
Make the internal storage for struct sha1_array use an array of struct
object_id internally.  Update the users of this struct which inspect its
internals.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:34 -07:00
brian m. carlson 365c27fbff fsck: convert init_skiplist to struct object_id
Convert a hardcoded constant buffer size to a use of GIT_MAX_HEXSZ, and
use parse_oid_hex to reduce the dependency on the size of the hash.
This function is a caller of sha1_array_append, which will be converted
later.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-28 09:59:33 -07:00
Jeff King a2b22854bd fsck: lazily load types under --connectivity-only
The recent fixes to "fsck --connectivity-only" load all of
the objects with their correct types. This keeps the
connectivity-only code path close to the regular one, but it
also introduces some unnecessary inefficiency. While getting
the type of an object is cheap compared to actually opening
and parsing the object (as the non-connectivity-only case
would do), it's still not free.

For reachable non-blob objects, we end up having to parse
them later anyway (to see what they point to), making our
type lookup here redundant.

For unreachable objects, we might never hit them at all in
the reachability traversal, making the lookup completely
wasted. And in some cases, we might have quite a few
unreachable objects (e.g., when alternates are used for
shared object storage between repositories, it's normal for
there to be objects reachable from other repositories but
not the one running fsck).

The comment in mark_object_for_connectivity() claims two
benefits to getting the type up front:

  1. We need to know the types during fsck_walk(). (And not
     explicitly mentioned, but we also need them when
     printing the types of broken or dangling commits).

     We can address this by lazy-loading the types as
     necessary. Most objects never need this lazy-load at
     all, because they fall into one of these categories:

       a. Reachable from our tips, and are coerced into the
	  correct type as we traverse (e.g., a parent link
	  will call lookup_commit(), which converts OBJ_NONE
	  to OBJ_COMMIT).

       b. Unreachable, but not at the tip of a chunk of
          unreachable history. We only mention the tips as
	  "dangling", so an unreachable commit which links
	  to hundreds of other objects needs only report the
	  type of the tip commit.

  2. It serves as a cross-check that the coercion in (1a) is
     correct (i.e., we'll complain about a parent link that
     points to a blob). But we get most of this for free
     already, because right after coercing, we'll parse any
     non-blob objects. So we'd notice then if we expected a
     commit and got a blob.

     The one exception is when we expect a blob, in which
     case we never actually read the object contents.

     So this is a slight weakening, but given that the whole
     point of --connectivity-only is to sacrifice some data
     integrity checks for speed, this seems like an
     acceptable tradeoff.

Here are before and after timings for an extreme case with
~5M reachable objects and another ~12M unreachable (it's the
torvalds/linux repository on GitHub, connected to shared
storage for all of the other kernel forks):

  [before]
  $ time git fsck --no-dangling --connectivity-only
  real	3m4.323s
  user	1m25.121s
  sys	1m38.710s

  [after]
  $ time git fsck --no-dangling --connectivity-only
  real	0m51.497s
  user	0m49.575s
  sys	0m1.776s

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-01-26 10:51:09 -08:00
David Turner 8354fa3d4c fsck: handle bad trees like other errors
Instead of dying when fsck hits a malformed tree object, log the error
like any other and continue.  Now fsck can tell the user which tree is
bad, too.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-09-27 14:09:10 -07:00
Johannes Schindelin 90cf590f53 fsck: optionally show more helpful info for broken links
When reporting broken links between commits/trees/blobs, it would be
quite helpful at times if the user would be told how the object is
supposed to be reachable.

With the new --name-objects option, git-fsck will try to do exactly
that: name the objects in a way that shows how they are reachable.

For example, when some reflog got corrupted and a blob is missing that
should not be, the user might want to remove the corresponding reflog
entry. This option helps them find that entry: `git fsck` will now
report something like this:

	broken link from    tree b5eb6ff...  (refs/stash@{<date>}~37:)
	              to    blob ec5cf80...

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 15:15:59 -07:00
Johannes Schindelin 1cd772cc41 fsck: give the error function a chance to see the fsck_options
We will need this in the next commit, where fsck will be taught to
optionally name the objects when reporting issues about them.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 11:35:00 -07:00
Johannes Schindelin 7b35efd734 fsck_walk(): optionally name objects on the go
If fsck_options->name_objects is initialized, and if it already has
name(s) for the object(s) that are to be the starting point(s) for
fsck_walk(), then that function will now add names for the objects
that were walked.

This will be highly useful for teaching git-fsck to identify root causes
for broken links, which is the task for the next patch in this series.

Note that this patch opts for decorating the objects with plain strings
instead of full-blown structs (à la `struct rev_name` in the code of
the `git name-rev` command), for several reasons:

- the code is much simpler than if it had to work with structs that
  describe arbitrarily long names such as "master~14^2~5:builtin/am.c",

- the string processing is actually quite light-weight compared to the
  rest of fsck's operation,

- the caller of fsck_walk() is expected to provide names for the
  starting points, and using plain and simple strings is just the
  easiest way to do that.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-18 11:35:00 -07:00
Junio C Hamano 6bfb7de89e Merge branch 'jc/fsck-nul-in-commit'
"git fsck" learned to catch NUL byte in a commit object as
potential error and warn.

* jc/fsck-nul-in-commit:
  fsck: detect and warn a commit with embedded NUL
  fsck_commit_buffer(): do not special case the last validation
2016-05-17 14:38:34 -07:00
Junio C Hamano 6d2d780f63 fsck: detect and warn a commit with embedded NUL
Even though a Git commit object is designed to be capable of storing
any binary data as its payload, in practice people use it to describe
the changes in textual form, and tools like "git log" are designed to
treat the payload as text.

Detect and warn when we see any commit object with a NUL byte in
it.

Note that a NUL byte in the header part is already detected as a
grave error.  This change is purely about the message part.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-10 10:02:06 -07:00
brian m. carlson ce6663a9da tree-walk: convert tree_entry_extract() to use struct object_id
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:26:28 -07:00
brian m. carlson 7d924c9139 struct name_entry: use struct object_id instead of unsigned char sha1[20]
Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-25 14:23:42 -07:00
Junio C Hamano 5af297185e fsck_commit_buffer(): do not special case the last validation
The pattern taken by all the validations in this function is:

	if (notice a violation exists) {
		err = report(... VIOLATION_KIND ...);
		if (err)
			return err;
	}

where report() returns zero if specified kind of violation is set to
be ignored, and otherwise shows an error message and returns non-zero.

The last validation in the function immediately before the function
returns 0 to declare "all good" can cheat and directly return the
return value from report(), and the current code does so, i.e.

	if (notice a violation exists)
		return report(... VIOLATION_KIND ...);
	return 0;

But that is a selfish code that declares it is the ultimate and
final form of the function, never to be enhanced later.  To allow
and invite future enhancements, make the last test follow the same
pattern.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-14 11:15:48 -07:00
Jeff King b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano 844a9ce472 Merge branch 'bc/object-id'
More transition from "unsigned char[40]" to "struct object_id".

This needed a few merge fixups, but is mostly disentangled from other
topics.

* bc/object-id:
  remote: convert functions to struct object_id
  Remove get_object_hash.
  Convert struct object to object_id
  Add several uses of get_object_hash.
  object: introduce get_object_hash macro.
  ref_newer: convert to use struct object_id
  push_refs_with_export: convert to struct object_id
  get_remote_heads: convert to struct object_id
  parse_fetch: convert to use struct object_id
  add_sought_entry_mem: convert to struct object_id
  Convert struct ref to use object_id.
  sha1_file: introduce has_object_file helper.
2015-12-10 12:36:13 -08:00
René Scharfe 8a272f291a fsck: treat a NUL in a tag header as an error
We check the return value of verify_header() for commits already, so do
the same for tags as well.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:07 -05:00
brian m. carlson ed1c9977cb Remove get_object_hash.
Convert all instances of get_object_hash to use an appropriate reference
to the hash member of the oid member of struct object.  This provides no
functional change, as it is essentially a macro substitution.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson f2fd0760f6 Convert struct object to object_id
struct object is one of the major data structures dealing with object
IDs.  Convert it to use struct object_id instead of an unsigned char
array.  Convert get_object_hash to refer to the new member as well.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
brian m. carlson 7999b2cf77 Add several uses of get_object_hash.
Convert most instances where the sha1 member of struct object is
dereferenced to use get_object_hash.  Most instances that are passed to
functions that have versions taking struct object_id, such as
get_sha1_hex/get_oid_hex, or instances that can be trivially converted
to use struct object_id instead, are not converted.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Junio C Hamano b2f44feba5 Merge branch 'js/fsck-opt'
Allow ignoring fsck errors on specific set of known-to-be-bad
objects, and also tweaking warning level of various kinds of non
critical breakages reported.

* js/fsck-opt:
  fsck: support ignoring objects in `git fsck` via fsck.skiplist
  fsck: git receive-pack: support excluding objects from fsck'ing
  fsck: introduce `git fsck --connectivity-only`
  fsck: support demoting errors to warnings
  fsck: document the new receive.fsck.<msg-id> options
  fsck: allow upgrading fsck warnings to errors
  fsck: optionally ignore specific fsck issues completely
  fsck: disallow demoting grave fsck errors to warnings
  fsck: add a simple test for receive.fsck.<msg-id>
  fsck: make fsck_tag() warn-friendly
  fsck: handle multiple authors in commits specially
  fsck: make fsck_commit() warn-friendly
  fsck: make fsck_ident() warn-friendly
  fsck: report the ID of the error/warning
  fsck (receive-pack): allow demoting errors to warnings
  fsck: offer a function to demote fsck errors to warnings
  fsck: provide a function to parse fsck message IDs
  fsck: introduce identifiers for fsck messages
  fsck: introduce fsck options
2015-08-03 11:01:18 -07:00
Junio C Hamano acf7189512 Merge branch 'jc/fsck-retire-require-eoh'
A fix to a minor regression to "git fsck" in v2.2 era that started
complaining about a body-less tag object when it lacks a separator
empty line after its header to separate it with a non-existent body.

* jc/fsck-retire-require-eoh:
  fsck: it is OK for a tag and a commit to lack the body
2015-07-13 14:00:24 -07:00
Junio C Hamano 84d18c0bcf fsck: it is OK for a tag and a commit to lack the body
When fsck validates a commit or a tag, it scans each line in the
header of the object using helper functions such as "start_with()",
etc. that work on a NUL terminated buffer, but before a1e920a0
(index-pack: terminate object buffers with NUL, 2014-12-08), the
validation functions were fed the object data in a piece of memory
that is not necessarily terminated with a NUL.

We added a helper function require_end_of_header() to be called at
the beginning of these validation functions to insist that the
object data contains an empty line before its end.  The theory is
that the validating functions will notice and stop when it hits an
empty line as a normal end of header (or a required header line that
is missing) without scanning past the end of potentially not
NUL-terminated buffer.

But the theory forgot that in the older days, Git itself happily
created objects with only the header lines without a body. This
caused Git 2.2 and later to issue an unnecessary warning in some
existing repositories.

With a1e920a0, we do not need to require an empty line (or the body)
in these objects to safely parse and validate them.  Drop the
offending "must have an empty line" check from this helper function,
while keeping the other check to make sure that there is no NUL in
the header part of the object, and adjust the name of the helper to
what it does accordingly.

Noticed-by: Wolfgang Denk <wd@denx.de>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-28 14:47:28 -07:00
Johannes Schindelin cd94c6f91e fsck: git receive-pack: support excluding objects from fsck'ing
The optional new config option `receive.fsck.skipList` specifies the path
to a file listing the names, i.e. SHA-1s, one per line, of objects that
are to be ignored by `git receive-pack` when `receive.fsckObjects = true`.

This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).

The intended use case is for server administrators to inspect objects
that are reported by `git push` as being too problematic to enter the
repository, and to add the objects' SHA-1 to a (preferably sorted) file
when the objects are legitimate, i.e. when it is determined that those
problematic objects should be allowed to enter the server.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:37 -07:00
Johannes Schindelin f27d05b170 fsck: allow upgrading fsck warnings to errors
The 'invalid tag name' and 'missing tagger entry' warnings can now be
upgraded to errors by specifying `invalidTagName` and
`missingTaggerEntry` in the receive.fsck.<msg-id> config setting.

Incidentally, the missing tagger warning is now really shown as a warning
(as opposed to being reported with the "error:" prefix, as it used to be
the case before this commit).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:36 -07:00
Johannes Schindelin efaba7cc77 fsck: optionally ignore specific fsck issues completely
An fsck issue in a legacy repository might be so common that one would
like not to bother the user with mentioning it at all. With this change,
that is possible by setting the respective message type to "ignore".

This change "abuses" the missingEmail=warn test to verify that "ignore"
is also accepted and works correctly. And while at it, it makes sure
that multiple options work, too (they are passed to unpack-objects or
index-pack as a comma-separated list via the --strict=... command-line
option).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:36 -07:00
Johannes Schindelin f50c440730 fsck: disallow demoting grave fsck errors to warnings
Some kinds of errors are intrinsically unrecoverable (e.g. errors while
uncompressing objects). It does not make sense to allow demoting them to
mere warnings.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:36 -07:00
Johannes Schindelin 7d7d5b0568 fsck: make fsck_tag() warn-friendly
When fsck_tag() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Just like fsck_commit(), there are certain problems that could hide other
issues with the same tag object. For example, if the 'type' line is not
encountered in the correct position, the 'tag' line – if there is any –
would not be handled at all.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:35 -07:00
Johannes Schindelin c9ad147f83 fsck: handle multiple authors in commits specially
This problem has been detected in the wild, and is the primary reason
to introduce an option to demote certain fsck errors to warnings. Let's
offer to ignore this particular problem specifically.

Technically, we could handle such repositories by setting
receive.fsck.<msg-id> to missingCommitter=warn, but that could hide
missing tree objects in the same commit because we cannot continue
verifying any commit object after encountering a missing committer line,
while we can continue in the case of multiple author lines.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:35 -07:00
Johannes Schindelin b3584761eb fsck: make fsck_commit() warn-friendly
When fsck_commit() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Note that some problems are too problematic to simply ignore. For
example, when the header lines are mixed up, we punt after encountering
an incorrect line. Therefore, demoting certain warnings to errors can
hide other problems. Example: demoting the missingauthor error to
a warning would hide a problematic committer line.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:35 -07:00
Johannes Schindelin e6826e335a fsck: make fsck_ident() warn-friendly
When fsck_ident() identifies a problem with the ident, it should still
advance the pointer to the next line so that fsck can continue in the
case of a mere warning.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:35 -07:00
Johannes Schindelin 71ab8fa840 fsck: report the ID of the error/warning
Some repositories written by legacy code have objects with non-fatal
fsck issues. To allow the user to ignore those issues, let's print
out the ID (e.g. when encountering "missingEmail", the user might
want to call `git config --add receive.fsck.missingEmail=warn`).

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:35 -07:00
Johannes Schindelin 5d477a334a fsck (receive-pack): allow demoting errors to warnings
For example, missing emails in commit and tag objects can be demoted to
mere warnings with

	git config receive.fsck.missingemail=warn

The value is actually a comma-separated list.

In case that the same key is listed in multiple receive.fsck.<msg-id>
lines in the config, the latter configuration wins (this can happen for
example when both $HOME/.gitconfig and .git/config contain message type
settings).

As git receive-pack does not actually perform the checks, it hands off
the setting to index-pack or unpack-objects in the form of an optional
argument to the --strict option.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:27:34 -07:00
Johannes Schindelin 0282f4dced fsck: offer a function to demote fsck errors to warnings
There are legacy repositories out there whose older commits and tags
have issues that prevent pushing them when 'receive.fsckObjects' is set.
One real-life example is a commit object that has been hand-crafted to
list two authors.

Often, it is not possible to fix those issues without disrupting the
work with said repositories, yet it is still desirable to perform checks
by setting `receive.fsckObjects = true`. This commit is the first step
to allow demoting specific fsck issues to mere warnings.

The `fsck_set_msg_types()` function added by this commit parses a list
of settings in the form:

	missingemail=warn,badname=warn,...

Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by
git fsck so far, but other call paths (e.g. git index-pack --strict)
error out *always* no matter what type was specified. Therefore, we need
to take extra care to set all message types to FSCK_ERROR by default in
those cases.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-23 14:26:46 -07:00
Johannes Schindelin f417eed8cd fsck: provide a function to parse fsck message IDs
These functions will be used in the next commits to allow the user to
ask fsck to handle specific problems differently, e.g. demoting certain
errors to warnings. The upcoming `fsck_set_msg_types()` function has to
handle partial strings because we would like to be able to parse, say,
'missingemail=warn,missingtaggerentry=warn' command line parameters
(which will be passed by receive-pack to index-pack and unpack-objects).

To make the parsing robust, we generate strings from the enum keys, and
using these keys, we match up strings without dashes case-insensitively
to the corresponding enum values.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 12:53:25 -07:00
Johannes Schindelin c99ba492f1 fsck: introduce identifiers for fsck messages
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.

In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.

We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 10:24:27 -07:00
Johannes Schindelin 22410549fc fsck: introduce fsck options
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-22 10:23:32 -07:00
Junio C Hamano 1cb4b3d380 Merge branch 'js/fsck-tag-validation'
New tag object format validation added in 2.2 showed garbage
after a tagname it reported in its error message.

* js/fsck-tag-validation:
  index-pack: terminate object buffers with NUL
  fsck: properly bound "invalid tag name" error message
2014-12-22 12:27:41 -08:00
Junio C Hamano 77933f4449 Sync with v2.1.4
* maint-2.1:
  Git 2.1.4
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:46:57 -08:00
Junio C Hamano 58f1d950e3 Sync with v2.0.5
* maint-2.0:
  Git 2.0.5
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:42:28 -08:00
Junio C Hamano 5e519fb8b0 Sync with v1.9.5
* maint-1.9:
  Git 1.9.5
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:28:54 -08:00
Junio C Hamano 6898b79721 Sync with v1.8.5.6
* maint-1.8.5:
  Git 1.8.5.6
  fsck: complain about NTFS ".git" aliases in trees
  read-cache: optionally disallow NTFS .git variants
  path: add is_ntfs_dotgit() helper
  fsck: complain about HFS+ ".git" aliases in trees
  read-cache: optionally disallow HFS+ .git variants
  utf8: add is_hfs_dotgit() helper
  fsck: notice .git case-insensitively
  t1450: refactor ".", "..", and ".git" fsck tests
  verify_dotfile(): reject .git case-insensitively
  read-tree: add tests for confusing paths like ".." and ".git"
  unpack-trees: propagate errors adding entries to the index
2014-12-17 11:20:31 -08:00
Johannes Schindelin d08c13b947 fsck: complain about NTFS ".git" aliases in trees
Now that the index can block pathnames that can be mistaken
to mean ".git" on NTFS and FAT32, it would be helpful for
fsck to notice such problematic paths. This lets servers
which use receive.fsckObjects block them before the damage
spreads.

Note that the fsck check is always on, even for systems
without core.protectNTFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
NTFS.

However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on NTFS themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git or git~1, meaning mischief is almost
certainly what the tree author had in mind).

Ideally these would be controlled by a separate
"fsck.protectNTFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King a18fcc9ff2 fsck: complain about HFS+ ".git" aliases in trees
Now that the index can block pathnames that case-fold to
".git" on HFS+, it would be helpful for fsck to notice such
problematic paths. This lets servers which use
receive.fsckObjects block them before the damage spreads.

Note that the fsck check is always on, even for systems
without core.protectHFS set. This is technically more
restrictive than we need to be, as a set of users on ext4
could happily use these odd filenames without caring about
HFS+.

However, on balance, it's helpful for all servers to block
these (because the paths can be used for mischief, and
servers which bother to fsck would want to stop the spread
whether they are on HFS+ themselves or not), and hardly
anybody will be affected (because the blocked names are
variants of .git with invisible Unicode code-points mixed
in, meaning mischief is almost certainly what the tree
author had in mind).

Ideally these would be controlled by a separate
"fsck.protectHFS" flag. However, it would be much nicer to
be able to enable/disable _any_ fsck flag individually, and
any scheme we choose should match such a system. Given the
likelihood of anybody using such a path in practice, it is
not unreasonable to wait until such a system materializes.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:45 -08:00
Jeff King 76e86fc6e3 fsck: notice .git case-insensitively
We complain about ".git" in a tree because it cannot be
loaded into the index or checked out. Since we now also
reject ".GIT" case-insensitively, fsck should notice the
same, so that errors do not propagate.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-17 11:04:39 -08:00
Jeff King 7add441984 fsck: properly bound "invalid tag name" error message
When we detect an invalid tag-name header in a tag object,
like, "tag foo bar\n", we feed the pointer starting at "foo
bar" to a printf "%s" formatter. This shows the name, as we
want, but then it keeps printing the rest of the tag buffer,
rather than stopping at the end of the line.

Our tests did not notice because they look only for the
matching line, but the bug is that we print much more than
we wanted to. So we also adjust the test to be more exact.

Note that when fscking tags with "index-pack --strict", this
is even worse. index-pack does not add a trailing
NUL-terminator after the object, so we may actually read
past the buffer and print uninitialized memory. Running
t5302 with valgrind does notice the bug for that reason.

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-09 11:54:25 -08:00
Johannes Schindelin cec097be3a fsck: check tag objects' headers
We inspect commit objects pretty much in detail in git-fsck, but we just
glanced over the tag objects. Let's be stricter.

Since we do not want to limit 'tag' lines unduly, values that would fail
the refname check only result in warnings, not errors.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 10:44:26 -07:00
Johannes Schindelin 4d0d89755e Make sure fsck_commit_buffer() does not run out of the buffer
So far, we assumed that the buffer is NUL terminated, but this is not
a safe assumption, now that we opened the fsck_object() API to pass a
buffer directly.

So let's make sure that there is at least an empty line in the buffer.
That way, our checks would fail if the empty line was encountered
prematurely, and consequently we can get away with the current string
comparisons even with non-NUL-terminated buffers are passed to
fsck_object().

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-11 10:44:01 -07:00
Johannes Schindelin 90a398bbd7 fsck_object(): allow passing object data separately from the object itself
When fsck'ing an incoming pack, we need to fsck objects that cannot be
read via read_sha1_file() because they are not local yet (and might even
be rejected if transfer.fsckobjects is set to 'true').

For commits, there is a hack in place: we basically cache commit
objects' buffers anyway, but the same is not true, say, for tag objects.

By refactoring fsck_object() to take the object buffer and size as
optional arguments -- optional, because we still fall back to the
previous method to look at the cached commit objects if the caller
passes NULL -- we prepare the machinery for the upcoming handling of tag
objects.

The assumption that such buffers are inherently NUL terminated is now
wrong, of course, hence we pass the size of the buffer so that we can
add a sanity check later, to prevent running past the end of the buffer.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-10 13:54:21 -07:00
René Scharfe 9d02150cf4 fsck: simplify fsck_commit_buffer() by using commit_list_count()
fsck_commit_buffer() checks that the number of items in the parents
list of a commit matches the number of parent lines in its buffer or --
if a graft is used -- the number of parents in that graft.  Simplify
the code by using commit_list_count() instead of counting by hand.
Also use different variables for the number of lines and the number of
list items, making it easier to compare them.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-10 14:10:27 -07:00
Junio C Hamano e91ae32a01 Merge branch 'jk/skip-prefix'
* jk/skip-prefix:
  http-push: refactor parsing of remote object names
  imap-send: use skip_prefix instead of using magic numbers
  use skip_prefix to avoid repeated calculations
  git: avoid magic number with skip_prefix
  fetch-pack: refactor parsing in get_ack
  fast-import: refactor parsing of spaces
  stat_opt: check extra strlen call
  daemon: use skip_prefix to avoid magic numbers
  fast-import: use skip_prefix for parsing input
  use skip_prefix to avoid repeating strings
  use skip_prefix to avoid magic numbers
  transport-helper: avoid reading past end-of-string
  fast-import: fix read of uninitialized argv memory
  apply: use skip_prefix instead of raw addition
  refactor skip_prefix to return a boolean
  avoid using skip_prefix as a boolean
  daemon: mark some strings as const
  parse_diff_color_slot: drop ofs parameter
2014-07-09 11:33:28 -07:00
Jeff King cf4fff579e refactor skip_prefix to return a boolean
The skip_prefix() function returns a pointer to the content
past the prefix, or NULL if the prefix was not found. While
this is nice and simple, in practice it makes it hard to use
for two reasons:

  1. When you want to conditionally skip or keep the string
     as-is, you have to introduce a temporary variable.
     For example:

       tmp = skip_prefix(buf, "foo");
       if (tmp)
	       buf = tmp;

  2. It is verbose to check the outcome in a conditional, as
     you need extra parentheses to silence compiler
     warnings. For example:

       if ((cp = skip_prefix(buf, "foo"))
	       /* do something with cp */

Both of these make it harder to use for long if-chains, and
we tend to use starts_with() instead. However, the first line
of "do something" is often to then skip forward in buf past
the prefix, either using a magic constant or with an extra
strlen(3) (which is generally computed at compile time, but
means we are repeating ourselves).

This patch refactors skip_prefix() to return a simple boolean,
and to provide the pointer value as an out-parameter. If the
prefix is not found, the out-parameter is untouched. This
lets you write:

  if (skip_prefix(arg, "foo ", &arg))
	  do_foo(arg);
  else if (skip_prefix(arg, "bar ", &arg))
	  do_bar(arg);

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-20 10:44:43 -07:00
Jeff King 8597ea3afe commit: record buffer length in cache
Most callsites which use the commit buffer try to use the
cached version attached to the commit, rather than
re-reading from disk. Unfortunately, that interface provides
only a pointer to the NUL-terminated buffer, with no
indication of the original length.

For the most part, this doesn't matter. People do not put
NULs in their commit messages, and the log code is happy to
treat it all as a NUL-terminated string. However, some code
paths do care. For example, when checking signatures, we
want to be very careful that we verify all the bytes to
avoid malicious trickery.

This patch just adds an optional "size" out-pointer to
get_commit_buffer and friends. The existing callers all pass
NULL (there did not seem to be any obvious sites where we
could avoid an immediate strlen() call, though perhaps with
some further refactoring we could).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:09:38 -07:00
Jeff King bc6b8fc130 use get_commit_buffer everywhere
Each of these sites assumes that commit->buffer is valid.
Since they would segfault if this was not the case, they are
likely to be correct in practice. However, we can
future-proof them by using get_commit_buffer.

And as a side effect, we abstract away the final bare uses
of commit->buffer.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-13 12:08:17 -07:00
Junio C Hamano f7804e250d Merge branch 'hs/simplify-bit-setting-in-fsck-tree'
* hs/simplify-bit-setting-in-fsck-tree:
  fsck: use bitwise-or assignment operator to set flag
2014-03-31 16:30:44 -07:00
Junio C Hamano 40adf520a3 Merge branch 'ys/fsck-commit-parsing'
* ys/fsck-commit-parsing:
  fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant
  fsck.c:fsck_ident(): ident points at a const string
2014-03-28 13:51:24 -07:00
Hiroyuki Sano effd12ec87 fsck: use bitwise-or assignment operator to set flag
fsck_tree() has two different ways to set a flag variable, either by
using a if-statement that guards an assignment, or by using a
bitwise-or assignment operator.  Most are done with the former, and
only one variable is assigned with the latter.

Since all the conditions are short-and-sweet, we can afford to
uniformly use the latter style, which makes the resulting code
shorter and easier to read.

Signed-off-by: Hiroyuki Sano <sh19910711@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-20 11:20:48 -07:00
Yuxuan Shui 2d820a61df fsck.c:fsck_commit(): use skip_prefix() to verify and skip constant
fsck_commit() uses memcmp() to check if the buffer starts with a
certain prefix, and skips the prefix if it does.

This is exactly what skip_prefix() was designed for.

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-19 15:34:56 -07:00
Yuxuan Shui de42180f6a fsck.c:fsck_ident(): ident points at a const string
Since fsck_ident doesn't change the content of **ident, the type of
ident could be const char **.

This change is required to rewrite fsck_commit() to use skip_prefix().

Signed-off-by: Yuxuan Shui <yshuiv7@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-13 13:00:30 -07:00
Jeff King 7ca36d9398 date: check date overflow against time_t
When we check whether a timestamp has overflowed, we check
only against ULONG_MAX, meaning that strtoul has overflowed.
However, we also feed these timestamps to system functions
like gmtime, which expect a time_t. On many systems, time_t
is actually smaller than "unsigned long" (e.g., because it
is signed), and we would overflow when using these
functions.  We don't know the actual size or signedness of
time_t, but we can easily check for truncation with a simple
assignment.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 10:12:58 -08:00
Jeff King d4b8de0420 fsck: report integer overflow in author timestamps
When we check commit objects, we complain if commit->date is
ULONG_MAX, which is an indication that we saw integer
overflow when parsing it. However, we do not do any check at
all for author lines, which also contain a timestamp.

Let's actually check the timestamps on each ident line
with strtoul. This catches both author and committer lines,
and we can get rid of the now-redundant commit->date check.

Note that like the existing check, we compare only against
ULONG_MAX. Now that we are calling strtoul at the site of
the check, we could be slightly more careful and also check
that errno is set to ERANGE. However, this will make further
refactoring in future patches a little harder, and it
doesn't really matter in practice.

For 32-bit systems, one would have to create a commit at the
exact wrong second in 2038. But by the time we get close to
that, all systems will hopefully have moved to 64-bit (and
if they haven't, they have a real problem one second later).

For 64-bit systems, by the time we get close to ULONG_MAX,
all systems will hopefully have been consumed in the fiery
wrath of our expanding Sun.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 10:12:58 -08:00
Jeff King 5c17f51270 fsck: warn about ".git" in trees
Having a ".git" entry inside a tree can cause confusing
results on checkout. At the top-level, you could not
checkout such a tree, as it would complain about overwriting
the real ".git" directory. In a subdirectory, you might
check it out, but performing operations in the subdirectory
would confusingly consider the in-tree ".git" directory as
the repository.

The regular git tools already make it hard to accidentally
add such an entry to a tree, and do not allow such entries
to enter the index at all. Teaching fsck about it provides
an additional safety check, and let's us avoid propagating
any such bogosity when transfer.fsckObjects is on.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-28 13:52:54 -08:00
Jeff King 5d34a4359d fsck: warn about '.' and '..' in trees
A tree with meta-paths like '.' or '..' does not work well
with git; the index will refuse to load it or check it out
to the filesystem (and even if we did not have that safety,
it would look like we were overwriting an untracked
directory). For the same reason, it is difficult to create
such a tree with regular git.

Let's warn about these dubious entries during fsck, just in
case somebody has created a bogus tree (and this also lets
us prevent them from propagating when transfer.fsckObjects
is set).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-11-28 10:41:08 -08:00
Junio C Hamano 03adeeaad6 Merge branch 'jk/maint-null-in-trees' into maint-1.7.11
"git diff" had a confusion between taking data from a path in the
working tree and taking data from an object that happens to have
name 0{40} recorded in a tree.

* jk/maint-null-in-trees:
  fsck: detect null sha1 in tree entries
  do not write null sha1s to on-disk index
  diff: do not use null sha1 as a sentinel value
2012-09-10 15:24:54 -07:00
Jeff King c479d14a80 fsck: detect null sha1 in tree entries
Short of somebody happening to beat the 1 in 2^160 odds of
actually generating content that hashes to the null sha1, we
should never see this value in a tree entry. So let's have
fsck warn if it it seen.

As in the previous commit, we test both blob and submodule
entries to future-proof the test suite against the
implementation depending on connectivity to notice the
error.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-07-29 15:14:08 -07:00