1
0
mirror of https://github.com/git/git.git synced 2024-09-29 03:51:25 +02:00
Commit Graph

104 Commits

Author SHA1 Message Date
Junio C Hamano
49f210fd52 Merge branch 'ds/multi-pack-index'
When there are too many packfiles in a repository (which is not
recommended), looking up an object in these would require
consulting many pack .idx files; a new mechanism to have a single
file that consolidates all of these .idx files is introduced.

* ds/multi-pack-index: (32 commits)
  pack-objects: consider packs in multi-pack-index
  midx: test a few commands that use get_all_packs
  treewide: use get_all_packs
  packfile: add all_packs list
  midx: fix bug that skips midx with alternates
  midx: stop reporting garbage
  midx: mark bad packed objects
  multi-pack-index: store local property
  multi-pack-index: provide more helpful usage info
  midx: clear midx on repack
  packfile: skip loading index if in multi-pack-index
  midx: prevent duplicate packfile loads
  midx: use midx in approximate_object_count
  midx: use existing midx when writing new one
  midx: use midx in abbreviation calculations
  midx: read objects from multi-pack-index
  config: create core.multiPackIndex setting
  midx: write object offsets
  midx: write object id fanout chunk
  midx: write object ids in a chunk
  ...
2018-09-17 13:53:50 -07:00
Junio C Hamano
29d9e3e2c4 Merge branch 'nd/pack-deltify-regression-fix'
In a recent update in 2.18 era, "git pack-objects" started
producing a larger than necessary packfiles by missing
opportunities to use large deltas.

* nd/pack-deltify-regression-fix:
  pack-objects: fix performance issues on packing large deltas
2018-08-22 11:17:05 -07:00
Derrick Stolee
454ea2e4d7 treewide: use get_all_packs
There are many places in the codebase that want to iterate over
all packfiles known to Git. The purposes are wide-ranging, and
those that can take advantage of the multi-pack-index already
do. So, use get_all_packs() instead of get_packed_git() to be
sure we are iterating over all packfiles.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-08-20 15:31:40 -07:00
Nguyễn Thái Ngọc Duy
9ac3f0e5b3 pack-objects: fix performance issues on packing large deltas
Let's start with some background about oe_delta_size() and
oe_set_delta_size(). If you already know, skip the next paragraph.

These two are added in 0aca34e826 (pack-objects: shrink delta_size
field in struct object_entry - 2018-04-14) to help reduce 'struct
object_entry' size. The delta size field in this struct is reduced to
only contain max 1MB. So if any new delta is produced and larger than
1MB, it's dropped because we can't really save such a large size
anywhere. Fallback is provided in case existing packfiles already have
large deltas, then we can retrieve it from the pack.

While this should help small machines repacking large repos without
large deltas (i.e. less memory pressure), dropping large deltas during
the delta selection process could end up with worse pack files. And if
existing packfiles already have >1MB delta and pack-objects is
instructed to not reuse deltas, all of them will be dropped on the
floor, and the resulting pack would be definitely bigger.

There is also a regression in terms of CPU/IO if we have large on-disk
deltas because fallback code needs to parse the pack every time the
delta size is needed and just access to the mmap'd pack data is enough
for extra page faults when memory is under pressure.

Both of these issues were reported on the mailing list. Here's some
numbers for comparison.

    Version  Pack (MB)  MaxRSS(kB)  Time (s)
    -------  ---------  ----------  --------
     2.17.0     5498     43513628    2494.85
     2.18.0    10531     40449596    4168.94

This patch provides a better fallback that is

- cheaper in terms of cpu and io because we won't have to read
  existing pack files as much

- better in terms of pack size because the pack heuristics is back to
  2.17.0 time, we do not drop large deltas at all

If we encounter any delta (on-disk or created during try_delta phase)
that is larger than the 1MB limit, we stop using delta_size_ field for
this because it can't contain such size anyway. A new array of delta
size is dynamically allocated and can hold all the deltas that 2.17.0
can. This array only contains delta sizes that delta_size_ can't
contain.

With this, we do not have to drop deltas in try_delta() anymore. Of
course the downside is we use slightly more memory, even compared to
2.17.0. But since this is considered an uncommon case, a bit more
memory consumption should not be a problem.

Delta size limit is also raised from 1MB to 16MB to better cover
common case and avoid that extra memory consumption (99.999% deltas in
this reported repo are under 12MB; Jeff noted binary artifacts topped
out at about 3MB in some other private repos). Other fields are
shuffled around to keep this struct packed tight. We don't use more
memory in common case even with this limit update.

A note about thread synchronization. Since this code can be run in
parallel during delta searching phase, we need a mutex. The realloc
part in packlist_alloc() is not protected because it only happens
during the object counting phase, which is always single-threaded.

Access to e->delta_size_ (and by extension
pack->delta_size[e - pack->objects]) is unprotected as before, the
thread scheduler in pack-objects must make sure "e" is never updated
by two different threads.

The area under the new lock is as small as possible, avoiding locking
at all in common case, since lock contention with high thread count
could be expensive (most blobs are small enough that delta compute
time is short and we end up taking the lock very often). The previous
attempt to always hold a lock in oe_delta_size() and
oe_set_delta_size() increases execution time by 33% when repacking
linux.git with with 40 threads.

Reported-by: Elijah Newren <newren@gmail.com>
Helped-by: Elijah Newren <newren@gmail.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-07-23 10:21:29 -07:00
Junio C Hamano
50f08db594 Merge branch 'js/use-bug-macro'
Developer support update, by using BUG() macro instead of die() to
mark codepaths that should not happen more clearly.

* js/use-bug-macro:
  BUG_exit_code: fix sparse "symbol not declared" warning
  Convert remaining die*(BUG) messages
  Replace all die("BUG: ...") calls by BUG() ones
  run-command: use BUG() to report bugs, not die()
  test-tool: help verifying BUG() code paths
2018-05-30 14:04:07 +09:00
Johannes Schindelin
033abf97fc Replace all die("BUG: ...") calls by BUG() ones
In d8193743e08 (usage.c: add BUG() function, 2017-05-12), a new macro
was introduced to use for reporting bugs instead of die(). It was then
subsequently used to convert one single caller in 588a538ae55
(setup_git_env: convert die("BUG") to BUG(), 2017-05-12).

The cover letter of the patch series containing this patch
(cf 20170513032414.mfrwabt4hovujde2@sigill.intra.peff.net) is not
terribly clear why only one call site was converted, or what the plan
is for other, similar calls to die() to report bugs.

Let's just convert all remaining ones in one fell swoop.

This trick was performed by this invocation:

	sed -i 's/die("BUG: /BUG("/g' $(git grep -l 'die("BUG' \*.c)

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-06 19:06:13 +09:00
Nguyễn Thái Ngọc Duy
ac77d0c370 pack-objects: shrink size field in struct object_entry
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.

Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.

Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).

Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like

    make test GIT_TEST_OE_SIZE=4

which forces this code on all objects larger than 3 bytes.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:59 +09:00
Nguyễn Thái Ngọc Duy
43fa44fa3b pack-objects: move in_pack out of struct object_entry
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.

This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.

If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.

The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running

    make test GIT_TEST_FULL_IN_PACK_ARRAY=1

to take advantage of other special case tests.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 12:38:58 +09:00
brian m. carlson
e6a492b7be pack: convert struct pack_idx_entry to struct object_id
Convert struct pack_idx_entry to use struct object_id by changing the
definition and applying the following semantic patch, plus the standard
object_id transforms:

@@
struct pack_idx_entry E1;
@@
- E1.sha1
+ E1.oid.hash

@@
struct pack_idx_entry *E1;
@@
- E1->sha1
+ E1->oid.hash

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-05-08 15:12:57 +09:00
René Scharfe
2756ca4347 use REALLOC_ARRAY for changing the allocation size of arrays
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-09-18 09:13:42 -07:00
Karsten Blees
039dc71a7c hashmap: factor out getting a hash code from a SHA1
Copying the first bytes of a SHA1 is duplicated in six places,
however, the implications (the actual value would depend on the
endianness of the platform) is documented only once.

Add a properly documented API for this.

Signed-off-by: Karsten Blees <blees@dcon.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-07-07 13:56:24 -07:00
René Scharfe
fb79947487 pack-objects: use free()+xcalloc() instead of xrealloc()+memset()
Whenever the hash table becomes too small then its size is increased,
the original part (and the added space) is zerod out using memset(),
and the table is rebuilt from scratch.

Simplify this proceess by returning the old memory using free() and
allocating the new buffer using xcalloc(), which already clears the
buffer for us.  That way we avoid copying the old hash table contents
needlessly inside xrealloc().

While at it, use the first array member with sizeof instead of a
specific type.  The old code used uint32_t and int, while index is
actually an array of int32_t.  Their sizes are the same basically
everywhere, so it's not actually a problem, but the new code is
cleaner and doesn't have to be touched should the type be changed.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-06-02 13:51:22 -07:00
Vicent Marti
2834bc27c1 pack-objects: refactor the packing list
The hash table that stores the packing list for a given `pack-objects`
run was tightly coupled to the pack-objects code.

In this commit, we refactor the hash table and the underlying storage
array into a `packing_data` struct. The functionality for accessing and
adding entries to the packing list is hence accessible from other parts
of Git besides the `pack-objects` builtin.

This refactoring is a requirement for further patches in this series
that will require accessing the commit packing list from outside of
`pack-objects`.

The hash table implementation has been minimally altered: we now
use table sizes which are always a power of two, to ensure a uniform
index distribution in the array.

Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-24 15:44:48 -07:00
Matthias Kestenholz
5d4a600335 Make git-pack-objects a builtin
Signed-off-by: Matthias Kestenholz <matthias@spinlock.ch>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-08-03 23:15:11 -07:00
Jeff King
4812a93a8c pack-objects: check pack.window for default window size
For some repositories, deltas simply don't make sense. One can disable
them for git-repack by adding --window, but git-push insists on making
the deltas which can be very CPU-intensive for little benefit.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-23 23:40:35 -07:00
Pavel Roskin
82e5a82fd7 Fix more typos, primarily in the code
The only visible change is that git-blame doesn't understand
"--compability" anymore, but it does accept "--compatibility" instead,
which is already documented.

Signed-off-by: Pavel Roskin <proski@gnu.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-07-10 00:36:44 -07:00
Nicolas Pitre
560b25a86f don't load objects needlessly when repacking
If no delta is attempted on some objects then it is useless to load them
in memory, neither create any delta index for them.  The best thing to
do is therefore to load and index them only when really needed.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-30 20:14:47 -07:00
Nicolas Pitre
8dbbd14ea3 consider previous pack undeltified object state only when reusing delta data
Without this there would never be a chance to improve packing for
previously undeltified objects.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-29 23:48:29 -07:00
Linus Torvalds
51d1e83f91 Do not try futile object pairs when repacking.
In the repacking window, if both objects we are looking at already came
from the same (old) pack-file, don't bother delta'ing them against each
other.

That means that we'll still always check for better deltas for (and
against!) _unpacked_ objects, but assuming incremental repacks, you'll
avoid the delta creation 99% of the time.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-29 15:24:29 -07:00
Junio C Hamano
9d24ed4f01 Merge branch 'ff/c99' into next
* ff/c99:
  Remove all void-pointer arithmetic.
2006-06-21 03:51:59 -07:00
Junio C Hamano
363b7817e0 upload-pack: prepare for sideband message support.
This does not implement sideband for propagating the status to
the downloader yet, but add code to capture the standard error
output from the pack-objects process in preparation for sending
it off to the client when the protocol extension allows us to do
so.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-21 02:34:14 -07:00
Florian Forster
1d7f171c3a Remove all void-pointer arithmetic.
ANSI C99 doesn't allow void-pointer arithmetic. This patch fixes this in
various ways. Usually the strategy that required the least changes was used.

Signed-off-by: Florian Forster <octo@verplant.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-20 01:59:46 -07:00
Linus Torvalds
ce0bd64299 pack-objects: improve path grouping heuristics.
This trivial patch not only simplifies the name hashing, it actually
improves packing for both git and the kernel.

The git archive pack shrinks from 6824090->6622627 bytes (a 3%
improvement), and the kernel pack shrinks from 108756213 to 108219021 (a
mere 0.5% improvement, but still, it's an improvement from making the
hashing much simpler!)

We just create a 32-bit hash, where we "age" previous characters by two
bits, so the last characters in a filename count most. So when we then
compare the hashes in the sort routine, filenames that end the same way
sort the same way.

It takes the subdirectory into account (unless the filename is > 16
characters), but files with the same name within the same subdirectory
will obviously sort closer than files in different subdirectories.

And, incidentally (which is why I tried the hash change in the first
place, of course) builtin-rev-list.c will sort fairly close to rev-list.c.

And no, it's not a "good hash" in the sense of being secure or unique, but
that's not what we're looking for. The whole "hash" thing is misnamed
here. It's not so much a hash as a "sorting number".

[jc: rolled in simplification for computing the sorting number
 computation for thin pack base objects]

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-06-05 17:23:31 -07:00
Linus Torvalds
4c068a9831 tree_entry(): new tree-walking helper function
This adds a "tree_entry()" function that combines the common operation of
doing a "tree_entry_extract()" + "update_tree_entry()".

It also has a simplified calling convention, designed for simple loops
that traverse over a whole tree: the arguments are pointers to the tree
descriptor and a name_entry structure to fill in, and it returns a boolean
"true" if there was an entry left to be gotten in the tree.

This allows tree traversal with

	struct tree_desc desc;
	struct name_entry entry;

	desc.buf = tree->buffer;
	desc.size = tree->size;
	while (tree_entry(&desc, &entry) {
		... use "entry.{path, sha1, mode, pathlen}" ...
	}

which is not only shorter than writing it out in full, it's hopefully less
error prone too.

[ It's actually a tad faster too - we don't need to recalculate the entry
  pathlength in both extract and update, but need to do it only once.
  Also, some callers can avoid doing a "strlen()" on the result, since
  it's returned as part of the name_entry structure.

  However, by now we're talking just 1% speedup on "git-rev-list --objects
  --all", and we're definitely at the point where tree walking is no
  longer the issue any more. ]

NOTE! Not everybody wants to use this new helper function, since some of
the tree walkers very much on purpose do the descriptor update separately
from the entry extraction. So the "extract + update" sequence still
remains as the core sequence, this is just a simplified interface.

We should probably add a silly two-line inline helper function for
initializing the descriptor from the "struct tree" too, just to cut down
on the noise from that common "desc" initializer.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-30 23:03:01 -07:00
Junio C Hamano
0fa6417c49 Merge branch 'np/pack'
* np/pack:
  improve depth heuristic for maximum delta size
  pack-object: slightly more efficient
  simple euristic for further free packing improvements
2006-05-16 17:20:24 -07:00
Nicolas Pitre
c3b06a69ff improve depth heuristic for maximum delta size
This provides a linear decrement on the penalty related to delta depth
instead of being an 1/x function.  With this another 5% reduction is
observed on packs for both the GIT repo and the Linux kernel repo, as
well as fixing a pack size regression in another sample repo I have.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-16 13:35:46 -07:00
Junio C Hamano
d55aaefa3e Merge branch 'fix'
* fix:
  Fix pack-index issue on 64-bit platforms a bit more portably.
  Install git-send-email by default
  Fix compilation on newer NetBSD systems
  git config syntax updates
  Another config file parsing fix.
  checkout: use --aggressive when running a 3-way merge (-m).
2006-05-15 13:48:22 -07:00
Junio C Hamano
1b9bc5a7b7 Fix pack-index issue on 64-bit platforms a bit more portably.
Apparently <stdint.h> is not enough for uint32_t on OpenBSD; use
"unsigned int" -- hopefully that would stay 32-bit on every
platform we care about, at least until we update the pack-index
file format.

Our sha1 routines optimized for architectures use uint32_t and
expects '#include <stdint.h>' to be enough, so OpenBSD on arm or
ppc might have similar issues down the road, I dunno.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 13:01:37 -07:00
Nicolas Pitre
ff45715ce5 pack-object: slightly more efficient
Avoid creating a delta index for objects with maximum depth since they
are not going to be used as delta base anyway.  This also reduce peak
memory usage slightly as the current object's delta index is not useful
until the next object in the loop is considered for deltification. This
saves a bit more than 1% on CPU usage.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 12:32:13 -07:00
Nicolas Pitre
4e8da19581 simple euristic for further free packing improvements
Given that the early eviction of objects with maximum delta depth
may exhibit bad packing on its own, why not considering a bias against
deep base objects in try_delta() to mitigate that bad behavior.

This patch adjust the MAX_size allowed for a delta based on the depth of
the base object as well as enabling the early eviction of max depth
objects from the object window.  When used separately, those two things
produce slightly better and much worse results respectively.  But their
combined effect is a surprising significant packing improvement.

With this really simple patch the GIT repo gets nearly 15% smaller, and
the Linux kernel repo about 5% smaller, with no significantly measurable
CPU usage difference.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-15 12:31:21 -07:00
Junio C Hamano
975bf9cf5a Merge branch 'fix'
* fix:
  include header to define uint32_t, necessary on Mac OS X
2006-05-14 16:20:15 -07:00
Ben Clifford
d9635e9c53 include header to define uint32_t, necessary on Mac OS X
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-14 16:19:52 -07:00
Junio C Hamano
3a3e89b897 Merge branch 'fix'
* fix:
  Fix git-pack-objects for 64-bit platforms
2006-05-13 22:24:18 -07:00
Dennis Stosberg
66561f5a77 Fix git-pack-objects for 64-bit platforms
The offset of an object in the pack is recorded as a 4-byte integer
in the index file.  When reading the offset from the mmap'ed index
in prepare_pack_revindex(), the address is dereferenced as a long*.
This works fine as long as the long type is four bytes wide.  On
NetBSD/sparc64, however, a long is 8 bytes wide and so dereferencing
the offset produces garbage.

[jc: taking suggestion by Linus to use uint32_t]

Signed-off-by: Dennis Stosberg <dennis@stosberg.net>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-13 10:43:16 -07:00
Junio C Hamano
4edd44725c Merge branch 'np/delta'
* np/delta:
  improve diff-delta with sparse and/or repetitive data
  tiny optimization to diff-delta
  replace adler32 with Rabin's polynomial in diff-delta
  use delta index data when finding best delta matches
  split the diff-delta interface
2006-05-09 16:40:28 -07:00
Junio C Hamano
86118bcb46 pack-object: squelch eye-candy on non-tty
One of my post-update scripts runs a git-fetch into a separate
repository and sends the results back to me (2>&1); I end up
getting this in the mail:

    Generating pack...
    Done counting 180 objects.
    Result has 131 objects.
    Deltifying 131 objects.
       0% (0/131) done^M   1% (2/131) done^M...

This defaults not to do the progress report when not on a tty.

You could give --progress to force the progress report, but
let's not bother even documenting it nor mentioning it in the
usage string.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-05-05 15:24:12 -07:00
Junio C Hamano
9a8b6a0a9d pack-objects: update size heuristucs.
We used to omit delta base candidates that is much bigger than
the target, but delta size does not grow when we delete more, so
that was not a very good heuristics.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-27 19:31:46 -07:00
Nicolas Pitre
f6c7081aa9 use delta index data when finding best delta matches
This patch allows for computing the delta index for each base object
only once and reuse it when trying to find the best delta match.

This should set the mark and pave the way for possibly better delta
generator algorithms.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-26 21:23:03 -07:00
Nicolas Pitre
0dec30b978 fix pack-object buffer size
The input line has 40 _chars_ of sha1 and no 20 _bytes_. It should also
account for the space before the pathname, and the terminating \n and \0.

Signed-off-by: Nicolas Pitre <nico@cam.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-21 00:45:10 -07:00
Junio C Hamano
f527cb8c38 pack-objects: do not stop at object that is "too small"
Because we sort the delta window by name-hash and then size,
hitting an object that is too small to consider as a delta base
for the current object does not mean we do not have better
candidate in the window beyond it.

Noticed by Shawn Pearce, analyzed by Nico, Linus and me.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-20 23:36:22 -07:00
Junio C Hamano
5379a5c5ee Thin pack generation: optimization.
Jens Axboe noticed that recent "git push" has become very slow
since we made --thin transfer the default.

Thin pack generation to push a handful revisions that touch
relatively small number of paths out of huge tree was stupid; it
registered _everything_ from the excluded revisions.  As a
result, "Counting objects" phase was unnecessarily expensive.

This changes the logic to register the blobs and trees from
excluded revisions only for paths we are actually going to send
to the other end.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-07 02:08:38 -07:00
Junio C Hamano
810e152375 Merge branch 'pe/cleanup'
* pe/cleanup:
  Replace xmalloc+memset(0) with xcalloc.
  Use blob_, commit_, tag_, and tree_type throughout.
2006-04-04 13:43:00 -07:00
Junio C Hamano
4c61b7d15a Merge branch 'lt/fix-sol-pack'
* lt/fix-sol-pack:
  Use sigaction and SA_RESTART in read-tree.c; add option in Makefile.
  safe_fgets() - even more anal fgets()
  pack-objects: be incredibly anal about stdio semantics
  Fix Solaris stdio signal handling stupidities
2006-04-04 13:42:02 -07:00
Peter Eriksen
8e44025925 Use blob_, commit_, tag_, and tree_type throughout.
This replaces occurences of "blob", "commit", "tag", and "tree",
where they're really used as type specifiers, which we already
have defined global constants for.

Signed-off-by: Peter Eriksen <s022018@student.dtu.dk>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-04 00:11:19 -07:00
Junio C Hamano
687dd75c95 safe_fgets() - even more anal fgets()
This is from Linus -- the previous round forgot to clear error
after EINTR case.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-03 23:42:25 -07:00
Linus Torvalds
da93d12b00 pack-objects: be incredibly anal about stdio semantics
This is the "letter of the law" version of using fgets() properly in the
face of incredibly broken stdio implementations.  We can work around the
Solaris breakage with SA_RESTART, but in case anybody else is ever that
stupid, here's the "safe" (read: "insanely anal") way to use fgets.

It probably goes without saying that I'm not terribly impressed by
Solaris libc.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02 13:46:27 -07:00
Linus Torvalds
fb7a6531e6 Fix Solaris stdio signal handling stupidities
This uses sigaction() to install the SIGALRM handler with SA_RESTART, so
that Solaris stdio doesn't break completely when a signal interrupts a
read.

Thanks to Jason Riedy for confirming the silly Solaris signal behaviour.

Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-04-02 13:41:56 -07:00
Junio C Hamano
1b0c7174a1 tree/diff header cleanup.
Introduce tree-walk.[ch] and move "struct tree_desc" and
associated functions from various places.

Rename DIFF_FILE_CANON_MODE(mode) macro to canon_mode(mode) and
move it to cache.h.  This macro returns the canonicalized
st_mode value in the host byte order for files, symlinks and
directories -- to be compared with a tree_desc entry.
create_ce_mode(mode) in cache.h is similar but is intended to be
used for index entries (so it does not work for directories) and
returns the value in the network byte order.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-29 23:54:13 -08:00
Junio C Hamano
70ca1a3f85 pack-objects: simplify "thin" pack.
There was a misguided logic to overly prefer using objects that
we are not going to pack as the base object.  This was
unnecessary.  It does not matter to the unpacking side where the
base object is -- it matters more to make the resulting delta
smaller.

Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-06 03:03:56 -08:00
Luck, Tony
2b74cffa91 Re-fix compilation warnings.
Commit 8fcf1ad9c68e15d881194c8544e7c11d33529c2b has a
combination of double cast and Andreas' switch to using
unsigned long ... just the latter is sufficient (and a lot less
ugly than using the double cast).

Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Junio C Hamano <junkio@cox.net>
2006-03-01 17:42:00 -08:00