mirror of
https://github.com/git/git.git
synced 2024-06-03 01:46:11 +02:00
4bfdf5800f
khashl is an updated version of khash with less memory overhead
(one bit/bucket instead of two) than the original khash and similar
overall performance. According to its author, insertions are simpler
(linear probing) but deletions may be slightly slower[1]. Of course,
the majority of hash tables in git do not delete individual elements.

Overall memory usage did not decrease much, as the hash tables and
elements we store in them are big and currently dwarf the overhead of
the khash internals. Only around 10 MB in allocations (and a few dozen
KB peak use out of ~6 GB) is saved when doing a no-op `git gc' of a
Linux kernel object store with thousands of refs and islands.

A summary of differences I've found from khash to khashl:

* two 32-bit ints (instead of four) in the top-level struct

* 2 heap allocations (instead of 3) for maps
  (though I wonder if locality suffers when probing is necessary)

* 1 bit of metadata per-bucket (no tombstones for deleted elements)

* 0.75 load factor. Lowered slightly from 0.77, but it needs no FP
  multiply and is responsible for the aforementioned struct size
  reduction

* FNV-1A instead of x31 hash for strings

* Fibonacci hashing (__kh_h2b), probably good for FNV-1A, but I'm
  skeptical of its usefulness for our SHA-* using cases

* linear probing instead of quadratic

* Wang's integer hash functions (currently unused)

* optional hash value caching and ensemble APIs (currently unused)

* some API differences (see below), but not enough to easily use
  both khash and khashl in the same compilation unit

This patch was made with two additional goals to ease review:

1) minimize changes outside of khash*.h files

2) minimize and document all differences from upstream[2] khashl.h

Our khashl.h differences from upstream:

* favor portability constructs from our codebase:
  MAYBE_UNUSED over klib_unused, inline over kh_inline, and various
  integer types

* disable packed attribute to satisfy -Werror=address-of-packed-member,
  AFAIK it doesn't change any of the data structures we use

* port the following commits over from our old khash.h:

  9249ca26ac (khash: factor out kh_release_*, 2018-10-04)
  2756ca4347 (use REALLOC_ARRAY for changing the allocation size of arrays, 2014-09-16)
  5632e838f8 (khash: clarify that allocations never fail, 2021-07-03)

* use our memory allocation wrappers

* provide wrappers for compatibility with existing callers using the
  khash API. The khashl function naming convention is:
  ${NOUN}_${VERB} while the khash convention is: kh_${VERB}_${NOUN}.
  The kh_${NAME}_t typedef and naming convention are preserved via
  __KHASH_COMPAT macro to ease review (despite the `_t' suffix being
  reserved and typedefs being discouraged in the Linux kernel).

* copy relevant API docs over from khash.h for identically named macros

* preserve kh_begin, kh_foreach, kh_foreach_value from khash.h since
  khashl.h doesn't provide them

* flesh out KHASHL_{SET,MAP}_INIT wrappers with *_clear, *_resize,
  and *_release functions

* sparse fixes from Junio and Jeff

[1] https://attractivechaos.wordpress.com/2019/12/28/deletion-from-hash-tables-without-tombstones/
[2] git clone https://github.com/attractivechaos/klib.git
    2895a16cb55e (support an ensemble of hash tables, 2023-12-18)

khashl.h API differences from khash.h which affected this change:

* KHASHL_MAP_INIT and KHASHL_SET_INIT macros replace KHASH_INIT

* user-supplied hash and equality functions use different names

* object-store-ll.h avoided the kh_*_t convention (since I dislike
  typedef) and was the only place where I had to change a definition.

Signed-off-by: Eric Wong <e@80x24.org>
Helped-by: Junio C Hamano <gitster@pobox.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
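
The combination described in the message (FNV-1A for strings, Fibonacci hashing to pick a bucket, linear probing, and a single "used" bit per bucket) can be illustrated with a small standalone sketch. This is not khashl.h: every toy_* name, the fixed 16-bucket size, and the lack of resizing and deletion are assumptions made to keep the example short. Only the multiplier 2654435769 (roughly 2^32 divided by the golden ratio) and the shift-by-(32 - bits) shape are meant to mirror what a Fibonacci bucket function such as __kh_h2b does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_BITS 4                      /* hypothetical: 2^4 = 16 buckets */
#define TOY_SIZE (1U << TOY_BITS)

struct toy_set {
	uint32_t used[(TOY_SIZE + 31) / 32]; /* one metadata bit per bucket */
	const char *keys[TOY_SIZE];          /* borrowed pointers, not copied */
	unsigned count;
};

/* 32-bit FNV-1a over a NUL-terminated string. */
static uint32_t toy_fnv1a(const char *s)
{
	uint32_t h = 2166136261U;            /* FNV offset basis */

	for (; *s; s++) {
		h ^= (unsigned char)*s;
		h *= 16777619U;              /* FNV prime */
	}
	return h;
}

/* Fibonacci hashing: multiply, then keep the top `bits` bits (1..31). */
static uint32_t toy_h2b(uint32_t hash, unsigned bits)
{
	return (hash * 2654435769U) >> (32 - bits);
}

static int toy_used(const struct toy_set *s, uint32_t i)
{
	return (s->used[i >> 5] >> (i & 31)) & 1;
}

/* Insert `key`; *absent is 1 if it was added, 0 if already present. */
static uint32_t toy_put(struct toy_set *s, const char *key, int *absent)
{
	uint32_t i = toy_h2b(toy_fnv1a(key), TOY_BITS);

	while (toy_used(s, i)) {             /* linear probing */
		if (!strcmp(s->keys[i], key)) {
			*absent = 0;
			return i;
		}
		i = (i + 1) & (TOY_SIZE - 1);
	}
	/* No resizing in this sketch: always keep one bucket empty. */
	assert(s->count < TOY_SIZE - 1);
	s->used[i >> 5] |= 1U << (i & 31);
	s->keys[i] = key;
	s->count++;
	*absent = 1;
	return i;
}

/* Return the bucket holding `key`, or TOY_SIZE if it is not present. */
static uint32_t toy_get(const struct toy_set *s, const char *key)
{
	uint32_t i = toy_h2b(toy_fnv1a(key), TOY_BITS);

	while (toy_used(s, i)) {
		if (!strcmp(s->keys[i], key))
			return i;
		i = (i + 1) & (TOY_SIZE - 1);
	}
	return TOY_SIZE;
}
```

A zero-initialized struct toy_set is an empty set. Because there are no tombstones, a real deletion would have to shift later entries of the probe run back into the gap, which is the cost the author's post[1] discusses; the sketch leaves deletion out entirely.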
13 lines
282 B
C
#ifndef OBJECT_STORE_H
#define OBJECT_STORE_H

#include "khashl.h"
#include "dir.h"
#include "object-store-ll.h"

KHASHL_MAP_INIT(KH_LOCAL, odb_path_map, odb_path_map,
	const char * /* key: odb_path */, struct object_directory *,
	fspathhash, fspatheq)

#endif /* OBJECT_STORE_H */
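
The commit message says khashl names generated functions ${NOUN}_${VERB}, while existing callers expect kh_${VERB}_${NOUN} and a kh_${NAME}_t typedef, and that a compatibility macro bridges the two. A minimal sketch of that idea follows. TOY_MAP_INIT and TOY_COMPAT are hypothetical macros written for this illustration, the "map" is a single slot rather than a hash table, and none of this is the actual body of KHASHL_MAP_INIT or __KHASH_COMPAT.

```c
/*
 * Hypothetical generator in the khashl style: emits a struct named
 * `name` and functions named ${NOUN}_${VERB} (name_put, name_get).
 * The storage is one slot only, to keep the naming pattern visible.
 */
#define TOY_MAP_INIT(name, val_t) \
	typedef struct name { val_t val; int used; } name; \
	static void name##_put(name *h, val_t v) \
	{ \
		h->val = v; \
		h->used = 1; \
	} \
	static int name##_get(const name *h, val_t *out) \
	{ \
		if (!h->used) \
			return 0; \
		*out = h->val; \
		return 1; \
	}

/*
 * Hypothetical compatibility layer: keeps the kh_${NAME}_t typedef and
 * the kh_${VERB}_${NOUN} names by forwarding to the functions above.
 */
#define TOY_COMPAT(name, val_t) \
	typedef name kh_##name##_t; \
	static void kh_put_##name(kh_##name##_t *h, val_t v) \
	{ \
		name##_put(h, v); \
	} \
	static int kh_get_##name(const kh_##name##_t *h, val_t *out) \
	{ \
		return name##_get(h, out); \
	}

TOY_MAP_INIT(toy_map, int)
TOY_COMPAT(toy_map, int)
```

With both macros expanded, old-style callers can keep writing kh_put_toy_map(&m, 7) while new code calls toy_map_put(&m, 7) on the same object, which is the kind of arrangement that lets most callers outside khash*.h stay unchanged.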