1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-18 21:46:26 +02:00
Commit Graph

174 Commits

Author SHA1 Message Date
Patrick Steinhardt 2455720950 attr: fix integer overflow when parsing huge attribute names
It is possible to trigger an integer overflow when parsing attribute
names that are longer than 2^31 bytes because we assign the result of
strlen(3P) to an `int` instead of to a `size_t`. This can lead to an
abort in vsnprintf(3P) with the following reproducer:

    blob=$(perl -e 'print "A " . "B"x2147483648 . "\n"' | git hash-object -w --stdin)
    git update-index --add --cacheinfo 100644,$blob,.gitattributes
    git check-attr --all path

    BUG: strbuf.c:400: your vsnprintf is broken (returned -1)

But furthermore, assuming that the attribute name is even longer than
that, it can cause us to silently truncate the attribute and thus lead
to wrong results.

Fix this integer overflow by using a `size_t` instead. This fixes the
silent truncation of attribute names, but it only partially fixes the
BUG we hit: even though the initial BUG is fixed, we can still hit a BUG
when parsing invalid attribute lines via `report_invalid_attr()`.

This is due to an underlying design issue in vsnprintf(3P) which only
knows to return an `int`, and thus it may always overflow with large
inputs. This issue is benign though: the worst that can happen is that
the error message is misreported to be either truncated or too long, but
due to the buffer being NUL terminated we wouldn't ever do an
out-of-bounds read here.

Reported-by: Markus Vervier <markus.vervier@x41-dsec.de>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:14:16 +09:00
Patrick Steinhardt 8d0d48cf21 attr: fix out-of-bounds read with huge attribute names
There is an out-of-bounds read possible when parsing gitattributes that
have an attribute that is 2^31+1 bytes long. This is caused due to an
integer overflow when we assign the result of strlen(3P) to an `int`,
where we use the wrapped-around value in a subsequent call to
memcpy(3P). The following code reproduces the issue:

    blob=$(perl -e 'print "a" x 2147483649 . " attr"' | git hash-object -w --stdin)
    git update-index --add --cacheinfo 100644,$blob,.gitattributes
    git check-attr --all file

    AddressSanitizer:DEADLYSIGNAL
    =================================================================
    ==8451==ERROR: AddressSanitizer: SEGV on unknown address 0x7f93efa00800 (pc 0x7f94f1f8f082 bp 0x7ffddb59b3a0 sp 0x7ffddb59ab28 T0)
    ==8451==The signal is caused by a READ memory access.
        #0 0x7f94f1f8f082  (/usr/lib/libc.so.6+0x176082)
        #1 0x7f94f2047d9c in __interceptor_strspn /usr/src/debug/gcc/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:752
        #2 0x560e190f7f26 in parse_attr_line attr.c:375
        #3 0x560e190f9663 in handle_attr_line attr.c:660
        #4 0x560e190f9ddd in read_attr_from_index attr.c:769
        #5 0x560e190f9f14 in read_attr attr.c:797
        #6 0x560e190fa24e in bootstrap_attr_stack attr.c:867
        #7 0x560e190fa4a5 in prepare_attr_stack attr.c:902
        #8 0x560e190fb5dc in collect_some_attrs attr.c:1097
        #9 0x560e190fb93f in git_all_attrs attr.c:1128
        #10 0x560e18e6136e in check_attr builtin/check-attr.c:67
        #11 0x560e18e61c12 in cmd_check_attr builtin/check-attr.c:183
        #12 0x560e18e15993 in run_builtin git.c:466
        #13 0x560e18e16397 in handle_builtin git.c:721
        #14 0x560e18e16b2b in run_argv git.c:788
        #15 0x560e18e17991 in cmd_main git.c:926
        #16 0x560e190ae2bd in main common-main.c:57
        #17 0x7f94f1e3c28f  (/usr/lib/libc.so.6+0x2328f)
        #18 0x7f94f1e3c349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
        #19 0x560e18e110e4 in _start ../sysdeps/x86_64/start.S:115

    AddressSanitizer can not provide additional info.
    SUMMARY: AddressSanitizer: SEGV (/usr/lib/libc.so.6+0x176082)
    ==8451==ABORTING

Fix this bug by converting the variable to a `size_t` instead.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:14:16 +09:00
Patrick Steinhardt eb22e7dfa2 attr: fix overflow when upserting attribute with overly long name
The function `git_attr_internal()` is called to upsert attributes into
the global map. And while all callers pass a `size_t`, the function
itself accepts an `int` as the attribute name's length. This can lead to
an integer overflow in case the attribute name is longer than `INT_MAX`.

Now this overflow seems harmless as the first thing we do is to call
`attr_name_valid()`, and that function only succeeds in case all chars
in the range of `namelen` match a certain small set of chars. We thus
can't do an out-of-bounds read as NUL is not part of that set and all
strings passed to this function are NUL-terminated. And furthermore, we
wouldn't ever read past the current attribute name anyway due to the
same reason. And if validation fails we will return early.

On the other hand it feels fragile to rely on this behaviour, even more
so given that we pass `namelen` to `FLEX_ALLOC_MEM()`. So let's instead
just do the correct thing here and accept a `size_t` as line length.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-12-05 15:14:16 +09:00
Elijah Newren b19315d8ab Use new HASHMAP_INIT macro to simplify hashmap initialization
Now that hashamp has lazy initialization and a HASHMAP_INIT macro,
hashmaps allocated on the stack can be initialized without a call to
hashmap_init() and in some cases makes the code a bit shorter.  Convert
some callsites over to take advantage of this.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-11 12:55:27 -08:00
Heba Waly 3a1b3415d9 attr: move doc to attr.h
Move the documentation from Documentation/technical/api-gitattributes.txt
to attr.h as it's easier for the developers to find the usage
information beside the code instead of looking for it in another doc file.

Also documentation/technical/api-gitattributes.txt is removed because the
information it has is now redundant and it'll be hard to keep it up to
date and synchronized with the documentation in the header file.

Signed-off-by: Heba Waly <heba.waly@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-18 15:21:28 +09:00
Junio C Hamano 5efabc7ed9 Merge branch 'ew/hashmap'
Code clean-up of the hashmap API, both users and implementation.

* ew/hashmap:
  hashmap_entry: remove first member requirement from docs
  hashmap: remove type arg from hashmap_{get,put,remove}_entry
  OFFSETOF_VAR macro to simplify hashmap iterators
  hashmap: introduce hashmap_free_entries
  hashmap: hashmap_{put,remove} return hashmap_entry *
  hashmap: use *_entry APIs for iteration
  hashmap_cmp_fn takes hashmap_entry params
  hashmap_get{,_from_hash} return "struct hashmap_entry *"
  hashmap: use *_entry APIs to wrap container_of
  hashmap_get_next returns "struct hashmap_entry *"
  introduce container_of macro
  hashmap_put takes "struct hashmap_entry *"
  hashmap_remove takes "const struct hashmap_entry *"
  hashmap_get takes "const struct hashmap_entry *"
  hashmap_add takes "struct hashmap_entry *"
  hashmap_get_next takes "const struct hashmap_entry *"
  hashmap_entry_init takes "struct hashmap_entry *"
  packfile: use hashmap_entry in delta_base_cache_entry
  coccicheck: detect hashmap_entry.hash assignment
  diff: use hashmap_entry_init on moved_entry.ent
2019-10-15 13:48:02 +09:00
Eric Wong e2b5038d87 hashmap_entry: remove first member requirement from docs
Comments stating that "struct hashmap_entry" must be the first
member in a struct are no longer valid.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong 404ab78e39 hashmap: remove type arg from hashmap_{get,put,remove}_entry
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.

Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong 23dee69f53 OFFSETOF_VAR macro to simplify hashmap iterators
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.

This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.

In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.

v3: use `__typeof__' to avoid clang warnings

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 87571c3f71 hashmap: use *_entry APIs for iteration
Inspired by list_for_each_entry in the Linux kernel.
Once again, these are somewhat compromised usability-wise
by compilers lacking __typeof__ support.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong f23a465132 hashmap_get{,_from_hash} return "struct hashmap_entry *"
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.

This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong b6c5241606 hashmap_get takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong b94e5c1df6 hashmap_add takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong d22245a2e3 hashmap_entry_init takes "struct hashmap_entry *"
C compilers do type checking to make life easier for us.  So
rely on that and update all hashmap_entry_init callers to take
"struct hashmap_entry *" to avoid future bugs while improving
safety and readability.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:09 +09:00
Derrick Stolee 65edd96aec treewide: rename 'exclude' methods to 'pattern'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit renames several methods defined in dir.h to make
more sense with the renamed 'struct exclude_list' to 'struct
pattern_list' and 'struct exclude' to 'struct path_pattern':

 * last_exclude_matching() -> last_matching_pattern()
 * parse_exclude() -> parse_path_pattern()

In addition, the word 'exclude' was replaced with 'pattern'
in the methods below:

 * add_exclude_list()
 * add_excludes_from_file_to_list()
 * add_excludes_from_file()
 * add_excludes_from_blob_to_list()
 * add_exclude()
 * clear_exclude_list()

A few methods with the word "exclude" remain. These will
be handled seperately. In particular, the method
"is_excluded()" is concretely about the .gitignore file
relative to a specific directory. This is the important
boundary between library and consumer: is_excluded() cares
about .gitignore, but is_excluded() calls
last_matching_pattern() to make that decision.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Derrick Stolee 4ff89ee52c treewide: rename 'EXCL_FLAG_' to 'PATTERN_FLAG_'
The first consumer of pattern-matching filenames was the
.gitignore feature. In that context, storing a list of patterns
as a 'struct exclude_list'  makes sense. However, the
sparse-checkout feature then adopted these structures and methods,
but with the opposite meaning: these patterns match the files
that should be included!

It would be clearer to rename this entire library as a "pattern
matching" library, and the callers apply exclusion/inclusion
logic accordingly based on their needs.

This commit replaces 'EXCL_FLAG_' to 'PATTERN_FLAG_' in the
names of the flags used on 'struct path_pattern'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 14:05:12 -07:00
Junio C Hamano 0f519aba51 Merge branch 'rd/attr.c-comment-typofix'
In-code comment typofix.

* rd/attr.c-comment-typofix:
  attr.c: ".gitattribute" -> ".gitattributes" (comments)
2019-03-11 16:16:24 +09:00
Robert P. J. Day bb101aaf0c attr.c: ".gitattribute" -> ".gitattributes" (comments)
Correct misspelled ".gitattribute" in comments only, so no functional
change.

Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-07 09:34:47 +09:00
Junio C Hamano 7589e63648 Merge branch 'nd/the-index-final'
The assumption to work on the single "in-core index" instance has
been reduced from the library-ish part of the codebase.

* nd/the-index-final:
  cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
  read-cache.c: remove the_* from index_has_changes()
  merge-recursive.c: remove implicit dependency on the_repository
  merge-recursive.c: remove implicit dependency on the_index
  sha1-name.c: remove implicit dependency on the_index
  read-cache.c: replace update_index_if_able with repo_&
  read-cache.c: kill read_index()
  checkout: avoid the_index when possible
  repository.c: replace hold_locked_index() with repo_hold_locked_index()
  notes-utils.c: remove the_repository references
  grep: use grep_opt->repo instead of explict repo argument
2019-02-06 22:05:23 -08:00
Junio C Hamano 02bf766cc5 Merge branch 'jk/attr-macro-fix'
Asking "git check-attr" about a macro (e.g. "binary") on a specific
path did not work correctly, even though "git check-attr -a" listed
such a macro correctly.  This has been corrected.

* jk/attr-macro-fix:
  attr: do not mark queried macros as unset
2019-02-05 14:26:16 -08:00
Nguyễn Thái Ngọc Duy f8adbec9fe cache.h: flip NO_THE_INDEX_COMPATIBILITY_MACROS switch
By default, index compat macros are off from now on, because they
could hide the_index dependency.

Only those in builtin can use it.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-24 11:55:06 -08:00
Jeff King 7b95849be4 attr: do not mark queried macros as unset
Since 60a12722ac (attr: remove maybe-real, maybe-macro from git_attr,
2017-01-27), we will always mark an attribute macro (e.g., "binary")
that is specifically queried for as "unspecified", even though listing
_all_ attributes would display it at set. E.g.:

  $ echo "* binary" >.gitattributes

  $ git check-attr -a file
  file: binary: set
  file: diff: unset
  file: merge: unset
  file: text: unset

  $ git check-attr binary file
  file: binary: unspecified

The problem stems from an incorrect conversion of the optimization from
06a604e670 (attr: avoid heavy work when we know the specified attr is
not defined, 2014-12-28). There we tried in collect_some_attrs() to
avoid even looking at the attr_stack when the user has asked for "foo"
and we know that "foo" did not ever appear in any .gitattributes file.

It used a flag "maybe_real" in each attribute struct, where "real" meant
that the attribute appeared in an actual file (we have to make this
distinction because we also create an attribute struct for any names
that are being queried). But as explained in that commit message, the
meaning of "real" was tangled with some special cases around macros.

When 60a12722ac later refactored the macro code, it dropped maybe_real
entirely. This missed the fact that "maybe_real" could be unset for two
reasons: because of a macro, or because it was never found during
parsing. This had two results:

  - the optimization in collect_some_attrs() ceased doing anything
    meaningful, since it no longer kept track of "was it found during
    parsing"

  - worse, it actually kicked in when the caller _did_ ask about a macro
    by name, causing us to mark it as unspecified

It should be possible to salvage this optimization, but let's start with
just removing the remnants. It hasn't been doing anything (except
creating bugs) since 60a12722ac, and nobody seems to have noticed the
performance regression. It's more important to fix the correctness
problem clearly first.

I've added two tests here. The second one actually shows off the bug.
The test of "check-attr -a" is not strictly necessary, but we currently
do not test attribute macros much, and the builtin "binary" not at all.
So this increases our general test coverage, as well as making sure we
didn't mess up this related case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-22 13:48:15 -08:00
Junio C Hamano 3813a89fae Merge branch 'nd/i18n'
More _("i18n") markings.

* nd/i18n:
  fsck: mark strings for translation
  fsck: reduce word legos to help i18n
  parse-options.c: mark more strings for translation
  parse-options.c: turn some die() to BUG()
  parse-options: replace opterror() with optname()
  repack: mark more strings for translation
  remote.c: mark messages for translation
  remote.c: turn some error() or die() to BUG()
  reflog: mark strings for translation
  read-cache.c: add missing colon separators
  read-cache.c: mark more strings for translation
  read-cache.c: turn die("internal error") to BUG()
  attr.c: mark more string for translation
  archive.c: mark more strings for translation
  alias.c: mark split_cmdline_strerror() strings for translation
  git.c: mark more strings for translation
2019-01-04 13:33:31 -08:00
Nguyễn Thái Ngọc Duy ad8f8f4aed attr.c: mark more string for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-12 14:47:09 +09:00
Nguyễn Thái Ngọc Duy 2e1b141a01 attr.c: remove #ifdef NO_PTHREADS
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-11-05 13:42:11 +09:00
Torsten Bögershausen d64324cb60 Make git_check_attr() a void function
git_check_attr() returns always 0.
Remove all the error handling code of the callers, which is never executed.
Change git_check_attr() to be a void function.

Signed-off-by: Torsten Bögershausen <tboegi@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-09-12 15:15:34 -07:00
Nguyễn Thái Ngọc Duy c4500e251f attr: remove index from git_attr_set_direction()
Since attr checking API now take the index, there's no need to set an
index in advance with this call. Most call sites are straightforward
because they either pass the_index or NULL (which defaults back to
the_index previously). There's only one suspicious call site in
unpack-trees.c where it sets a different index.

This code in unpack-trees is about to check out entries from the
new/temporary index after merging is done in it. The attributes will
be used by entry.c code to do crlf conversion if needed. entry.c now
respects struct checkout's istate field, and this field is correctly
set in unpack-trees.c, there should be no regression from this change.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:43 -07:00
Nguyễn Thái Ngọc Duy 7a400a2c02 attr: remove an implicit dependency on the_index
Make the attr API take an index_state instead of assuming the_index in
attr code. All call sites are converted blindly to keep the patch
simple and retain current behavior. Individual call sites may receive
further updates to use the right index instead of the_index.

There is one ugly temporary workaround added in attr.c that needs some
more explanation.

Commit c24f3abace (apply: file commited with CRLF should roundtrip
diff and apply - 2017-08-19) forces one convert_to_git() call to NOT
read the index at all. But what do you know, we read it anyway by
falling back to the_index. When "istate" from convert_to_git is now
propagated down to read_attr_from_array() we will hit segfault
somewhere inside read_blob_data_from_index.

The right way of dealing with this is to kill "use_index" variable and
only follow "istate" but at this stage we are not ready for that:
while most git_attr_set_direction() calls just passes the_index to be
assigned to use_index, unpack-trees passes a different one which is
used by entry.c code, which has no way to know what index to use if we
delete use_index. So this has to be done later.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-13 14:14:42 -07:00
Johannes Schindelin 033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e0 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae5
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
Stefan Beller d807c4a01d exec_cmd: rename to use dash in file name
This is more consistent with the project style. The majority of Git's
source files use dashes in preference to underscores in their file names.

Signed-off-by: Stefan Beller <sbeller@google.com>
2018-04-11 18:11:00 +09:00
Jeff Hostetler 8b604d1951 hashmap: add API to disable item counting when threaded
This is to address concerns raised by ThreadSanitizer on the mailing list
about threaded unprotected R/W access to map.size with my previous "disallow
rehash" change (0607e10009).

See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/

Add API to hashmap to disable item counting and thus automatic rehashing.
Also include API to later re-enable them.

When item counting is disabled, the map.size field is invalid.  So to
prevent accidents, the field has been renamed and an accessor function
hashmap_get_size() has been added.  All direct references to this
field have been been updated.  And the name of the field changed
to map.private_size to communicate this.

Here is the relevant output from ThreadSanitizer showing the problem:

WARNING: ThreadSanitizer: data race (pid=10554)
  Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 lazy_dir_thread_proc name-hash.c:471
    #5 <null> <null>

  Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 handle_range_dir name-hash.c:380
    #5 handle_range_1 name-hash.c:415
    #6 lazy_dir_thread_proc name-hash.c:471
    #7 <null> <null>

Martin gives instructions for running TSan on test t3008 in this post:
https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 09:42:02 +09:00
Stefan Beller 201c14e375 attr.c: drop hashmap_cmp_fn cast
MAke the code more readable and less error prone by avoiding the cast
of the compare function pointer in hashmap_init, but instead have the
correctly named void pointers to casted to the specific data structure.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-05 13:53:12 -07:00
Stefan Beller 7663cdc86c hashmap.h: compare function has access to a data field
When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter.  However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 12:49:28 -07:00
Junio C Hamano 50f03c6676 Merge branch 'ab/free-and-null'
A common pattern to free a piece of memory and assign NULL to the
pointer that used to point at it has been replaced with a new
FREE_AND_NULL() macro.

* ab/free-and-null:
  *.[ch] refactoring: make use of the FREE_AND_NULL() macro
  coccinelle: make use of the "expression" FREE_AND_NULL() rule
  coccinelle: add a rule to make "expression" code use FREE_AND_NULL()
  coccinelle: make use of the "type" FREE_AND_NULL() rule
  coccinelle: add a rule to make "type" code use FREE_AND_NULL()
  git-compat-util: add a FREE_AND_NULL() wrapper around free(ptr); ptr = NULL
2017-06-24 14:28:41 -07:00
Junio C Hamano f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Ævar Arnfjörð Bjarmason 6a83d90207 coccinelle: make use of the "type" FREE_AND_NULL() rule
Apply the result of the just-added coccinelle rule. This manually
excludes a few occurrences, mostly things that resulted in many
FREE_AND_NULL() on one line, that'll be manually fixed in a subsequent
change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-16 12:44:03 -07:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Nguyễn Thái Ngọc Duy e9d983f116 wrapper.c: add and use fopen_or_warn()
When fopen() returns NULL, it could be because the given path does not
exist, but it could also be some other errors and the caller has to
check. Add a wrapper so we don't have to repeat the same error check
everywhere.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-26 12:33:56 +09:00
Brandon Williams b0db704652 pathspec: allow querying for attributes
The pathspec mechanism is extended via the new
":(attr:eol=input)pattern/to/match" syntax to filter paths so that it
requires paths to not just match the given pattern but also have the
specified attrs attached for them to be chosen.

Based on a patch by Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-13 15:28:54 -07:00
Brandon Williams f0dd042148 attr: reformat git_attr_set_direction() function
Move the 'git_attr_set_direction()' up to be closer to the variables
that it modifies as well as a small formatting by renaming the variable
'new' to 'new_direction' so that it is more descriptive.

Update the comment about how 'direction' is used to read the state of
the world.  It should be noted that callers of
'git_attr_set_direction()' should ensure that other threads are not
making calls into the attribute system until after the call to
'git_attr_set_direction()' completes.  This function essentially acts as
reset button for the attribute system and should be handled with care.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams 0787dafdcc attr: push the bare repo check into read_attr()
Push the bare repository check into the 'read_attr()' function.  This
avoids needing to have extra logic which creates an empty stack frame
when inside a bare repo as a similar bit of logic already exists in the
'read_attr()' function.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams dc81cf377c attr: store attribute stack in attr_check structure
The last big hurdle towards a thread-safe API for the attribute system
is the reliance on a global attribute stack that is modified during each
call into the attribute system.

This patch removes this global stack and instead a stack is stored
locally in each attr_check instance.  This opens up the opportunity for
future optimizations to customize the attribute stack for the attributes
that a particular attr_check struct is interested in.

One caveat with pushing the attribute stack into the attr_check
structure is that the attribute system now needs to keep track of all
active attr_check instances.  Due to the direction mechanism the stack
needs to be dropped when the direction is switched.  In order to ensure
correctness when the direction is changed the attribute system needs to
iterate through all active attr_check instances and drop each of their
stacks.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams e810e06357 attr: tighten const correctness with git_attr and match_attr
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams 60a12722ac attr: remove maybe-real, maybe-macro from git_attr
Whether or not a git attribute is real or a macro isn't a property of
the attribute but rather it depends on the attribute stack (which
.gitattribute files were read).

This patch removes the 'maybe_real' and 'maybe_macro' fields in a
git_attr and instead adds the 'macro' field to a attr_check_item.  The
'macro' indicates (if non-NULL) that a particular attribute is a macro
for the given attribute stack.  It's populated, through a quick scan of
the attribute stack, with the match_attr that corresponds to the macro's
definition.  This way the attribute stack only needs to be scanned a
single time prior to attribute collection instead of each time a macro
needs to be expanded.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams 685b292575 attr: eliminate global check_all_attr array
Currently there is a reliance on 'check_all_attr' which is a global
array of 'attr_check_item' items which is used to store the value of
each attribute during the collection process.

This patch eliminates this global and instead creates an array per
'attr_check' instance which is then used in the attribute collection
process.  This brings the attribute system one step closer to being
thread-safe.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Brandon Williams 1a600b7555 attr: use hashmap for attribute dictionary
The current implementation of the attribute dictionary uses a custom
hashtable.  This modernizes the dictionary by converting it to the builtin
'hashmap' structure.

Also, in order to enable a threaded API in the future add an
accompanying mutex which must be acquired prior to accessing the
dictionary of interned attributes.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:53 -08:00
Junio C Hamano 428103c7f1 attr: change validity check for attribute names to use positive logic
Convert 'invalid_attr_name()' to 'attr_name_valid()' and use positive
logic for the return value.  In addition create a helper function that
prints out an error message when an invalid attribute name is used.

We could later update the message to exactly spell out what the
rules for a good attribute name are, etc.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:52 -08:00
Brandon Williams 6bc2e3f709 attr: pass struct attr_check to collect_some_attrs
The old callchain used to take an array of attr_check_item items.
Instead pass the 'attr_check' container object to 'collect_some_attrs()'
and access the fields in the data structure directly.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:52 -08:00
Junio C Hamano 1295c21524 attr: retire git_check_attrs() API
Since nobody uses the old API, make it file-scope static, and update
the documentation to describe the new API.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-01 13:46:52 -08:00