1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-04 17:06:12 +02:00
Commit Graph

45 Commits

Author SHA1 Message Date
Elijah Newren b9a7ac2c68 hash-ll, hashmap: move oidhash() to hash-ll
oidhash() was used by both hashmap and khash, which makes sense.
However, the location of this function in hashmap.[ch] meant that
khash.h had to depend upon hashmap.h, making people unfamiliar with
khash think that it was built upon hashmap.  (Or at least, I personally
was confused for a while about this in the past.)

Move this function to hash-ll, so that khash.h can stop depending upon
hashmap.h.

This has another benefit as well: it allows us to remove hashmap.h's
dependency on hash-ll.h.  While some callers of hashmap.h were making
use of oidhash, most were not, so this change provides another way to
reduce the number of includes.

Diff best viewed with `--color-moved`.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-06-21 13:39:54 -07:00
Junio C Hamano ccd12a3d6c Merge branch 'en/header-split-cache-h-part-2'
More header clean-up.

* en/header-split-cache-h-part-2: (22 commits)
  reftable: ensure git-compat-util.h is the first (indirect) include
  diff.h: reduce unnecessary includes
  object-store.h: reduce unnecessary includes
  commit.h: reduce unnecessary includes
  fsmonitor: reduce includes of cache.h
  cache.h: remove unnecessary headers
  treewide: remove cache.h inclusion due to previous changes
  cache,tree: move basic name compare functions from read-cache to tree
  cache,tree: move cmp_cache_name_compare from tree.[ch] to read-cache.c
  hash-ll.h: split out of hash.h to remove dependency on repository.h
  tree-diff.c: move S_DIFFTREE_IFXMIN_NEQ define from cache.h
  dir.h: move DTYPE defines from cache.h
  versioncmp.h: move declarations for versioncmp.c functions from cache.h
  ws.h: move declarations for ws.c functions from cache.h
  match-trees.h: move declarations for match-trees.c functions from cache.h
  pkt-line.h: move declarations for pkt-line.c functions from cache.h
  base85.h: move declarations for base85.c functions from cache.h
  copy.h: move declarations for copy.c functions from cache.h
  server-info.h: move declarations for server-info.c functions from cache.h
  packfile.h: move pack_window and pack_entry from cache.h
  ...
2023-05-09 16:45:46 -07:00
Elijah Newren d1cbe1e6d8 hash-ll.h: split out of hash.h to remove dependency on repository.h
hash.h depends upon and includes repository.h, due to the definition and
use of the_hash_algo (defined as the_repository->hash_algo).  However,
most headers trying to include hash.h are only interested in the layout
of the structs like object_id.  Move the parts of hash.h that do not
depend upon repository.h into a new file hash-ll.h (the "low level"
parts of hash.h), and adjust other files to use this new header where
the convenience inline functions aren't needed.

This allows hash.h and object.h to be fairly small, minimal headers.  It
also exposes a lot of hidden dependencies on both path.h (which was
brought in by repository.h) and repository.h (which was previously
implicitly brought in by object.h), so also adjust other files to be
more explicit about what they depend upon.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-24 12:47:32 -07:00
Siddharth Singh ec063d2591 hashmap.h: fix minor typo
The word "no" should be "not".

Signed-off-by: Siddharth Singh <siddhartth@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-03-30 10:18:39 -07:00
Elijah Newren 6da1a25814 hashmap: provide deallocation function names
hashmap_free(), hashmap_free_entries(), and hashmap_free_() have existed
for a while, but aren't necessarily the clearest names, especially with
hashmap_partial_clear() being added to the mix and lazy-initialization
now being supported.  Peff suggested we adopt the following names[1]:

  - hashmap_clear() - remove all entries and de-allocate any
    hashmap-specific data, but be ready for reuse

  - hashmap_clear_and_free() - ditto, but free the entries themselves

  - hashmap_partial_clear() - remove all entries but don't deallocate
    table

  - hashmap_partial_clear_and_free() - ditto, but free the entries

This patch provides the new names and converts all existing callers over
to the new naming scheme.

[1] https://lore.kernel.org/git/20201030125059.GA3277724@coredump.intra.peff.net/

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Elijah Newren 33f20d8217 hashmap: introduce a new hashmap_partial_clear()
merge-ort is a heavy user of strmaps, which are built on hashmap.[ch].
clear_or_reinit_internal_opts() in merge-ort was taking about 12% of
overall runtime in my testcase involving rebasing 35 patches of
linux.git across a big rename.  clear_or_reinit_internal_opts() was
calling hashmap_free() followed by hashmap_init(), meaning that not only
was it freeing all the memory associated with each of the strmaps just
to immediately allocate a new array again, it was allocating a new array
that was likely smaller than needed (thus resulting in later need to
rehash things).  The ending size of the map table on the previous commit
was likely almost perfectly sized for the next commit we wanted to pick,
and not dropping and reallocating the table immediately is a win.

Add some new API to hashmap to clear a hashmap of entries without
freeing map->table (and instead only zeroing it out like alloc_table()
would do, along with zeroing the count of items in the table and the
shrink_at field).

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Elijah Newren b7879b0ba6 hashmap: allow re-use after hashmap_free()
Previously, once map->table had been freed, any calls to hashmap_put(),
hashmap_get(), or hashmap_remove() would cause a NULL pointer
dereference (since hashmap_free_() also zeros the memory; without that
zeroing, calling these functions would cause a use-after-free problem).

Modify these functions to check for a NULL table and automatically
allocate as needed.

Also add a HASHMAP_INIT(fn, data) macro for initializing hashmaps on the
stack without calling hashmap_init().

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Elijah Newren 97a39a4a93 hashmap: adjust spacing to fix argument alignment
No actual code changes; just whitespace adjustments.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-11-02 12:15:50 -08:00
Elijah Newren 6474b86939 hashmap: add usage documentation explaining hashmap_free[_entries]()
The existence of hashmap_free() and hashmap_free_entries() confused me,
and the docs weren't clear enough.  We are dealing with a map table,
entries in that table, and possibly also things each of those entries
point to.  I had to consult other source code examples and the
implementation.  Add a brief note to clarify the differences.  This will
become even more important once we introduce a new
hashmap_partial_clear() function which will add the question of whether
the table itself has been freed.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-10-13 13:06:37 -07:00
Junio C Hamano 0ad621f61e hashmap_for_each_entry(): workaround MSVC's runtime check failure #3
The OFFSETOF_VAR(var, member) macro is implemented in terms of
offsetof(typeof(*var), member) with compilers that know typeof(),
but its fallback implemenation compares &(var->member) and (var) and
count the distance in bytes, i.e.

    ((uintptr_t)&(var)->member - (uintptr_t)(var))

MSVC's runtime check, when fed an uninitialized 'var', flags this as
a use of an uninitialized variable (and that is legit---uninitialized
contents of 'var' is subtracted) in a debug build.

After auditing all 6 uses of OFFSETOF_VAR(), 1 of them does feed a
potentially uninitialized 'var' to the macro in the beginning of the
for() loop:

    #define hashmap_for_each_entry(map, iter, var, member) \
            for (var = hashmap_iter_first_entry_offset(map, iter, \
                                                    OFFSETOF_VAR(var, member)); \
                    var; \
                    var = hashmap_iter_next_entry_offset(iter, \
                                                    OFFSETOF_VAR(var, member)))

We can work around this by making sure that var has _some_ value
when OFFSETOF_VAR() is called.  Strictly speaking, it invites
undefined behaviour to use NULL here if we end up with pointer
comparison, but MSVC runtime seems to be happy with it, and most
other systems have typeof() and don't even need pointer comparison
fallback code.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-09-30 13:26:54 -07:00
Elijah Newren 861c4ce141 hashmap: fix typo in usage docs
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2020-07-28 14:28:15 -07:00
Junio C Hamano d3096d2ba6 Merge branch 'en/doc-typofix'
Docfix.

* en/doc-typofix:
  Fix spelling errors in no-longer-updated-from-upstream modules
  multimail: fix a few simple spelling errors
  sha1dc: fix trivial comment spelling error
  Fix spelling errors in test commands
  Fix spelling errors in messages shown to users
  Fix spelling errors in names of tests
  Fix spelling errors in comments of testcases
  Fix spelling errors in code comments
  Fix spelling errors in documentation outside of Documentation/
  Documentation: fix a bunch of typos, both old and new
2019-12-01 09:04:35 -08:00
Elijah Newren 15beaaa3d1 Fix spelling errors in code comments
Reported-by: Jens Schleusener <Jens.Schleusener@fossies.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-10 16:00:54 +09:00
Elijah Newren c92faa4d22 hashmap: fix documentation misuses of -> versus .
Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-11-07 14:40:56 +09:00
Eric Wong e2b5038d87 hashmap_entry: remove first member requirement from docs
Comments stating that "struct hashmap_entry" must be the first
member in a struct are no longer valid.

Suggested-by: Phillip Wood <phillip.wood123@gmail.com>
Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong 404ab78e39 hashmap: remove type arg from hashmap_{get,put,remove}_entry
Since these macros already take a `keyvar' pointer of a known type,
we can rely on OFFSETOF_VAR to get the correct offset without
relying on non-portable `__typeof__' and `offsetof'.

Argument order is also rearranged, so `keyvar' and `member' are
sequential as they are used as: `keyvar->member'

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:12 +09:00
Eric Wong 23dee69f53 OFFSETOF_VAR macro to simplify hashmap iterators
While we cannot rely on a `__typeof__' operator being portable
to use with `offsetof'; we can calculate the pointer offset
using an existing pointer and the address of a member using
pointer arithmetic for compilers without `__typeof__'.

This allows us to simplify usage of hashmap iterator macros
by not having to specify a type when a pointer of that type
is already given.

In the future, list iterator macros (e.g. list_for_each_entry)
may also be implemented using OFFSETOF_VAR to save hackers the
trouble of using container_of/list_entry macros and without
relying on non-portable `__typeof__'.

v3: use `__typeof__' to avoid clang warnings

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong c8e424c9c9 hashmap: introduce hashmap_free_entries
`hashmap_free_entries' behaves like `container_of' and passes
the offset of the hashmap_entry struct to the internal
`hashmap_free_' function, allowing the function to free any
struct pointer regardless of where the hashmap_entry field
is located.

`hashmap_free' no longer takes any arguments aside from
the hashmap itself.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 8a973d0bb3 hashmap: hashmap_{put,remove} return hashmap_entry *
And add *_entry variants to perform container_of as necessary
to simplify most callers.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 87571c3f71 hashmap: use *_entry APIs for iteration
Inspired by list_for_each_entry in the Linux kernel.
Once again, these are somewhat compromised usability-wise
by compilers lacking __typeof__ support.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong 939af16eac hashmap_cmp_fn takes hashmap_entry params
Another step in eliminating the requirement of hashmap_entry
being the first member of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong f23a465132 hashmap_get{,_from_hash} return "struct hashmap_entry *"
Update callers to use hashmap_get_entry, hashmap_get_entry_from_hash
or container_of as appropriate.

This is another step towards eliminating the requirement of
hashmap_entry being the first field in a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:11 +09:00
Eric Wong f0e63c4113 hashmap: use *_entry APIs to wrap container_of
Using `container_of' can be verbose and choosing names for
intermediate "struct hashmap_entry" pointers is a hard problem.
So introduce "*_entry" APIs inspired by similar linked-list
APIs in the Linux kernel.

Unfortunately, `__typeof__' is not portable C, so we need an
extra parameter to specify the type.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong 6bcbdfb277 hashmap_get_next returns "struct hashmap_entry *"
This is a step towards removing the requirement for
hashmap_entry being the first field of a struct.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong 26b455f21e hashmap_put takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong 28ee794128 hashmap_remove takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong b6c5241606 hashmap_get takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong b94e5c1df6 hashmap_add takes "struct hashmap_entry *"
This is less error-prone than "void *" as the compiler now
detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong f6eb6bdcf2 hashmap_get_next takes "const struct hashmap_entry *"
This is less error-prone than "const void *" as the compiler
now detects invalid types being passed.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:10 +09:00
Eric Wong d22245a2e3 hashmap_entry_init takes "struct hashmap_entry *"
C compilers do type checking to make life easier for us.  So
rely on that and update all hashmap_entry_init callers to take
"struct hashmap_entry *" to avoid future bugs while improving
safety and readability.

Signed-off-by: Eric Wong <e@80x24.org>
Reviewed-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-10-07 10:20:09 +09:00
Jeff King d40abc8e95 hashmap: convert sha1hash() to oidhash()
There are no callers left of sha1hash() that do not simply pass the
"hash" member of a "struct object_id". Let's get rid of the outdated
sha1-specific function and provide one that operates on the whole struct
(even though the technique, taking the first few bytes of the hash, will
remain the same).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-06-20 10:44:22 -07:00
Denton Liu 554544276a *.[ch]: remove extern from function declarations using spatch
There has been a push to remove extern from function declarations.
Remove some instances of "extern" for function declarations which are
caught by Coccinelle. Note that Coccinelle has some difficulty with
processing functions with `__attribute__` or varargs so some `extern`
declarations are left behind to be dealt with in a future patch.

This was the Coccinelle patch used:

	@@
	type T;
	identifier f;
	@@
	- extern
	  T f(...);

and it was run with:

	$ git ls-files \*.{c,h} |
		grep -v ^compat/ |
		xargs spatch --sp-file contrib/coccinelle/noextern.cocci --in-place

Files under `compat/` are intentionally excluded as some are directly
copied from external sources and we should avoid churning them as much
as possible.

Signed-off-by: Denton Liu <liu.denton@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-05-05 15:20:06 +09:00
Junio C Hamano 922ffec6fe Merge branch 'rb/hashmap-h-compilation-fix'
Code clean-up.

* rb/hashmap-h-compilation-fix:
  hashmap.h: remove unused variable
2018-02-13 13:39:10 -08:00
Randall S. Becker 7d68bb0766 hashmap.h: remove unused variable
In 'hashmap_enable_item_counting()', item is assigned but never
used.  This causes a warning on HP NonStop.  As the variable is
never used, fix this by just removing it.

Signed-off-by: Randall S. Becker <rsbecker@nexbridge.com>
Helped-by: Thomas Gummerer <t.gummerer@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-01-16 14:27:07 -08:00
Johannes Schindelin 826c778f7c hashmap: adjust documentation to reflect reality
The hashmap API is just complicated enough that even at least one
long-time Git contributor has to look up how to use it every time he
finds a new use case. When that happens, it is really useful if the
provided example code is correct...

While at it, "fix a memory leak", avoid statements before variable
declarations, fix a const -> no-const cast, several %l specifiers (which
want to be %ld), avoid using an undefined constant, call scanf()
correctly, use FLEX_ALLOC_STR() where appropriate, and adjust the style
here and there.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-05 13:37:43 -08:00
Jeff Hostetler 8b604d1951 hashmap: add API to disable item counting when threaded
This is to address concerns raised by ThreadSanitizer on the mailing list
about threaded unprotected R/W access to map.size with my previous "disallow
rehash" change (0607e10009).

See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.agren@gmail.com/

Add API to hashmap to disable item counting and thus automatic rehashing.
Also include API to later re-enable them.

When item counting is disabled, the map.size field is invalid.  So to
prevent accidents, the field has been renamed and an accessor function
hashmap_get_size() has been added.  All direct references to this
field have been been updated.  And the name of the field changed
to map.private_size to communicate this.

Here is the relevant output from ThreadSanitizer showing the problem:

WARNING: ThreadSanitizer: data race (pid=10554)
  Read of size 4 at 0x00000082d488 by thread T2 (mutexes: write M16):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 lazy_dir_thread_proc name-hash.c:471
    #5 <null> <null>

  Previous write of size 4 at 0x00000082d488 by thread T1 (mutexes: write M31):
    #0 hashmap_add hashmap.c:209
    #1 hash_dir_entry_with_parent_and_prefix name-hash.c:302
    #2 handle_range_dir name-hash.c:347
    #3 handle_range_1 name-hash.c:415
    #4 handle_range_dir name-hash.c:380
    #5 handle_range_1 name-hash.c:415
    #6 lazy_dir_thread_proc name-hash.c:471
    #7 <null> <null>

Martin gives instructions for running TSan on test t3008 in this post:
https://public-inbox.org/git/CAN0heSoJDL9pWELD6ciLTmWf-a=oyxe4EXXOmCKvsG5MSuzxsA@mail.gmail.com/

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-07 09:42:02 +09:00
Stefan Beller 1ecbf31d02 hashmap: migrate documentation from Documentation/technical into header
While at it, clarify the use of `key`, `keydata`, `entry_or_key` as well
as documenting the new data pointer for the compare function.

Rework the example.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 13:11:59 -07:00
Stefan Beller 7663cdc86c hashmap.h: compare function has access to a data field
When using the hashmap a common need is to have access to caller provided
data in the compare function. A couple of times we abuse the keydata field
to pass in the data needed. This happens for example in patch-ids.c.

This patch changes the function signature of the compare function
to have one more void pointer available. The pointer given for each
invocation of the compare function must be defined in the init function
of the hashmap and is just passed through.

Documentation of this new feature is deferred to a later patch.
This is a rather mechanical conversion, just adding the new pass-through
parameter.  However while at it improve the naming of the fields of all
compare functions used by hashmaps by ensuring unused parameters are
prefixed with 'unused_' and naming the parameters what they are (instead
of 'unused' make it 'unused_keydata').

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-30 12:49:28 -07:00
Jeff Hostetler 0607e10009 hashmap: add disallow_rehash setting
Teach hashmap to allow rehashes to be suppressed.
This is useful when hashmaps are accessed by multiple
threads.  It still requires the caller to properly
manage their locking.  This just prevents unexpected
rehashing during inserts and deletes.

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-22 13:41:41 -07:00
Jeff Hostetler f75619bd6d hashmap: allow memihash computation to be continued
Add variant of memihash() to allow the hash computation to
be continued.  There are times when we compute the hash on
a full path and then the hash on just the path to the parent
directory.  This can be expensive on large repositories.

With this, we can hash the parent directory first. And then
continue the computation to include the "/filename".

Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-22 13:41:41 -07:00
Karsten Blees 7b64d42d22 hashmap: add string interning API
Interning short strings with high probability of duplicates can reduce the
memory footprint and speed up comparisons.

Add strintern() and memintern() APIs that use a hashmap to manage the pool
of unique, interned strings.

Note: strintern(getenv()) could be used to sanitize git's use of getenv(),
in case we ever encounter a platform where a call to getenv() invalidates
previous getenv() results (which is allowed by POSIX).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 13:56:38 -07:00
Karsten Blees ab73a9d119 hashmap: add simplified hashmap_get_from_hash() API
Hashmap entries are typically looked up by just a key. The hashmap_get()
API expects an initialized entry structure instead, to support compound
keys. This flexibility is currently only needed by find_dir_entry() in
name-hash.c (and compat/win32/fscache.c in the msysgit fork). All other
(currently five) call sites of hashmap_get() have to set up a near emtpy
entry structure, resulting in duplicate code like this:

  struct hashmap_entry keyentry;
  hashmap_entry_init(&keyentry, hash(key));
  return hashmap_get(map, &keyentry, key);

Add a hashmap_get_from_hash() API that allows hashmap lookups by just
specifying the key and its hash code, i.e.:

  return hashmap_get_from_hash(map, hash(key), key);

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 13:56:35 -07:00
Karsten Blees 039dc71a7c hashmap: factor out getting a hash code from a SHA1
Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.

Add a properly documented API for this.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 13:56:24 -07:00
Karsten Blees b6aad99473 hashmap.h: use 'unsigned int' for hash-codes everywhere
Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-24 15:26:30 -08:00
Karsten Blees 6a364ced49 add a hashtable implementation that supports O(1) removal
The existing hashtable implementation (in hash.[ch]) uses open addressing
(i.e. resolve hash collisions by distributing entries across the table).
Thus, removal is difficult to implement with less than O(n) complexity.
Resolving collisions of entries with identical hashes (e.g. via chaining)
is left to the client code.

Add a hashtable implementation that supports O(1) removal and is slightly
easier to use due to builtin entry chaining.

Supports all basic operations init, free, get, add, remove and iteration.

Also includes ready-to-use hash functions based on the public domain FNV-1
algorithm (http://www.isthe.com/chongo/tech/comp/fnv).

The per-entry data structure (hashmap_entry) is piggybacked in front of
the client's data structure to save memory. See test-hashmap.c for usage
examples.

The hashtable is resized by a factor of four when 80% full. With these
settings, average memory consumption is about 2/3 of hash.[ch], and
insertion is about twice as fast due to less frequent resizing.

Lookups are also slightly faster, because entries are strictly confined to
their bucket (i.e. no data of other buckets needs to be traversed).

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-11-18 13:03:51 -08:00