1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-25 11:16:10 +02:00
Commit Graph

734 Commits

Author SHA1 Message Date
Ville Skyttä 2e3a16b279 Spelling fixes
<BAD>                     <CORRECTED>
    accidently                accidentally
    commited                  committed
    dependancy                dependency
    emtpy                     empty
    existance                 existence
    explicitely               explicitly
    git-upload-achive         git-upload-archive
    hierachy                  hierarchy
    indegee                   indegree
    intial                    initial
    mulitple                  multiple
    non-existant              non-existent
    precendence.              precedence.
    priviledged               privileged
    programatically           programmatically
    psuedo-binary             pseudo-binary
    soemwhere                 somewhere
    successfull               successful
    transfering               transferring
    uncommited                uncommitted
    unkown                    unknown
    usefull                   useful
    writting                  writing

Signed-off-by: Ville Skyttä <ville.skytta@iki.fi>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 14:35:42 -07:00
Junio C Hamano 78849622ec Merge branch 'jk/pack-objects-optim'
"git pack-objects" has a few options that tell it not to pack
objects found in certain packfiles, which require it to scan .idx
files of all available packs.  The codepaths involved in these
operations have been optimized for a common case of not having any
non-local pack and/or any .kept pack.

* jk/pack-objects-optim:
  pack-objects: compute local/ignore_pack_keep early
  pack-objects: break out of want_object loop early
  find_pack_entry: replace last_found_pack with MRU cache
  add generic most-recently-used list
  sha1_file: drop free_pack_by_name
  t/perf: add tests for many-pack scenarios
2016-08-08 14:48:39 -07:00
Jeff King a73cdd21c4 find_pack_entry: replace last_found_pack with MRU cache
Each pack has an index for looking up entries in O(log n)
time, but if we have multiple packs, we have to scan through
them linearly. This can produce a measurable overhead for
some operations.

We dealt with this long ago in f7c22cc (always start looking
up objects in the last used pack first, 2007-05-30), which
keeps what is essentially a 1-element most-recently-used
cache. In theory, we should be able to do better by keeping
a similar but longer cache, that is the same length as the
pack-list itself.

Since we now have a convenient generic MRU structure, we can
plug it in and measure. Here are the numbers for running
p5303 against linux.git:

Test                      HEAD^                HEAD
------------------------------------------------------------------------
5303.3: rev-list (1)      31.56(31.28+0.27)    31.30(31.08+0.20) -0.8%
5303.4: repack (1)        40.62(39.35+2.36)    40.60(39.27+2.44) -0.0%
5303.6: rev-list (50)     31.31(31.06+0.23)    31.23(31.00+0.22) -0.3%
5303.7: repack (50)       58.65(69.12+1.94)    58.27(68.64+2.05) -0.6%
5303.9: rev-list (1000)   38.74(38.40+0.33)    31.87(31.62+0.24) -17.7%
5303.10: repack (1000)    367.20(441.80+4.62)  342.00(414.04+3.72) -6.9%

The main numbers of interest here are the rev-list ones
(since that is exercising the normal object lookup code
path).  The single-pack case shouldn't improve at all; the
260ms speedup there is just part of the run-to-run noise
(but it's important to note that we didn't make anything
worse with the overhead of maintaining our cache). In the
50-pack case, we see similar results. There may be a slight
improvement, but it's mostly within the noise.

The 1000-pack case does show a big improvement, though. That
carries over to the repack case, as well. Even though we
haven't touched its pack-search loop yet, it does still do a
lot of normal object lookups (e.g., for the internal
revision walk), and so improves.

As a point of reference, I also ran the 1000-pack test
against a version of HEAD^ with the last_found_pack
optimization disabled. It takes ~60s, so that gives an
indication of how much even the single-element cache is
helping.

For comparison, here's a smaller repository, git.git:

Test                      HEAD^               HEAD
---------------------------------------------------------------------
5303.3: rev-list (1)      1.56(1.54+0.01)    1.54(1.51+0.02) -1.3%
5303.4: repack (1)        1.84(1.80+0.10)    1.82(1.80+0.09) -1.1%
5303.6: rev-list (50)     1.58(1.55+0.02)    1.59(1.57+0.01) +0.6%
5303.7: repack (50)       2.50(3.18+0.04)    2.50(3.14+0.04) +0.0%
5303.9: rev-list (1000)   2.76(2.71+0.04)    2.24(2.21+0.02) -18.8%
5303.10: repack (1000)    13.21(19.56+0.25)  11.66(18.01+0.21) -11.7%

You can see that the percentage improvement is similar.
That's because the lookup we are optimizing is roughly
O(nr_objects * nr_packs). Since the number of packs is
constant in both tests, we'd expect the improvement to be
linear in the number of objects. But the whole process is
also linear in the number of objects, so the improvement
is a constant factor.

The exact improvement does also depend on the contents of
the packs. In p5303, the extra packs all have 5 first-parent
commits in them, which is a reasonable simulation of a
pushed-to repository. But it also means that only 250
first-parent commits are in those packs (compared to almost
50,000 total in linux.git), and the rest are in the huge
"base" pack. So once we start looking at history in taht big
pack, that's where we'll find most everything, and even the
1-element cache gets close to 100% cache hits.  You could
almost certainly show better numbers with a more
pathological case (e.g., distributing the objects more
evenly across the packs). But that's simply not that
realistic a scenario, so it makes more sense to focus on
these numbers.

The implementation itself is a straightforward application
of the MRU code. We provide an MRU-ordered list of packs
that shadows the packed_git list. This is easy to do because
we only create and revise the pack list in one place. The
"reprepare" code path actually drops the whole MRU and
replaces it for simplicity. It would be more efficient to
just add new entries, but there's not much point in
optimizing here; repreparing happens rarely, and only after
doing a lot of other expensive work.  The key things to keep
optimized are traversal (which is just a normal linked list,
albeit with one extra level of indirection over the regular
packed_git list), and marking (which is a constant number of
pointer assignments, though slightly more than the old
last_found_pack was; it doesn't seem to create a measurable
slowdown, though).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:07 -07:00
Jeff King 3157c880f6 sha1_file: drop free_pack_by_name
The point of this function is to drop an entry from the
"packed_git" cache that points to a file we might be
overwriting, because our contents may not be the same (and
hence the only caller was pack-objects as it moved a
temporary packfile into place).

In older versions of git, this could happen because the
names of packfiles were derived from the set of objects they
contained, not the actual bits on disk. But since 1190a1a
(pack-objects: name pack files after trailer hash,
2013-12-05), the name reflects the actual bits on disk, and
any two packfiles with the same name can be used
interchangeably.

Dropping this function not only saves a few lines of code,
it makes the lifetime of "struct packed_git" much easier to
reason about: namely, we now do not ever free these structs.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-29 11:05:06 -07:00
Junio C Hamano ad2d777604 Merge branch 'nd/pack-ofs-4gb-limit'
"git pack-objects" and "git index-pack" mostly operate with off_t
when talking about the offset of objects in a packfile, but there
were a handful of places that used "unsigned long" to hold that
value, leading to an unintended truncation.

* nd/pack-ofs-4gb-limit:
  fsck: use streaming interface for large blobs in pack
  pack-objects: do not truncate result in-pack object size on 32-bit systems
  index-pack: correct "offset" type in unpack_entry_data()
  index-pack: report correct bad object offsets even if they are large
  index-pack: correct "len" type in unpack_data()
  sha1_file.c: use type off_t* for object_info->disk_sizep
  pack-objects: pass length to check_pack_crc() without truncation
2016-07-28 10:34:42 -07:00
Nguyễn Thái Ngọc Duy 211c61c6cf pack-objects: pass length to check_pack_crc() without truncation
On 32 bit systems with large file support, unsigned long is 32-bit
while the two offsets in the subtraction expression (pack-objects has
the exact same expression as in sha1_file.c but not shown in diff) are
in 64-bit. If an in-pack object is larger than 2^32 len/datalen is
truncated and we get a misleading "error: bad packed object CRC for
..." as a result.

Use off_t for len and datalen. check_pack_crc() already accepts this
argument as off_t and can deal with 4+ GB.

Noticed-by: Christoph Michelbach <michelbach94@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-07-12 10:14:29 -07:00
Junio C Hamano 352d72a30e Merge branch 'nd/worktree-various-heads'
The experimental "multiple worktree" feature gains more safety to
forbid operations on a branch that is checked out or being actively
worked on elsewhere, by noticing that e.g. it is being rebased.

* nd/worktree-various-heads:
  branch: do not rename a branch under bisect or rebase
  worktree.c: check whether branch is bisected in another worktree
  wt-status.c: split bisect detection out of wt_status_get_state()
  worktree.c: check whether branch is rebased in another worktree
  worktree.c: avoid referencing to worktrees[i] multiple times
  wt-status.c: make wt_status_check_rebase() work on any worktree
  wt-status.c: split rebase detection out of wt_status_get_state()
  path.c: refactor and add worktree_git_path()
  worktree.c: mark current worktree
  worktree.c: make find_shared_symref() return struct worktree *
  worktree.c: store "id" instead of "git_dir"
  path.c: add git_common_path() and strbuf_git_common_path()
  dir.c: rename str(n)cmp_icase to fspath(n)cmp
2016-05-23 14:54:29 -07:00
Nguyễn Thái Ngọc Duy 7616c6ca9d sha1_file.c: use {error,die,warning}_errno()
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-05-09 12:29:08 -07:00
Nguyễn Thái Ngọc Duy ba0897e6ae dir.c: rename str(n)cmp_icase to fspath(n)cmp
These functions compare two paths that are taken from file system.
Depending on the running file system, paths may need to be compared
case-sensitively or not, and maybe even something else in future. The
current names do not convey that well.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-04-22 14:09:37 -07:00
Junio C Hamano 090de6b289 Merge branch 'jk/pack-idx-corruption-safety'
The code to read the pack data using the offsets stored in the pack
idx file has been made more carefully check the validity of the
data in the idx.

* jk/pack-idx-corruption-safety:
  sha1_file.c: mark strings for translation
  use_pack: handle signed off_t overflow
  nth_packed_object_offset: bounds-check extended offset
  t5313: test bounds-checks of corrupted/malicious pack/idx files
2016-03-04 13:45:47 -08:00
Nguyễn Thái Ngọc Duy 7465feba51 sha1_file.c: mark strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-27 09:54:57 -08:00
Junio C Hamano 11529ecec9 Merge branch 'jk/tighten-alloc'
Update various codepaths to avoid manually-counted malloc().

* jk/tighten-alloc: (22 commits)
  ewah: convert to REALLOC_ARRAY, etc
  convert ewah/bitmap code to use xmalloc
  diff_populate_gitlink: use a strbuf
  transport_anonymize_url: use xstrfmt
  git-compat-util: drop mempcpy compat code
  sequencer: simplify memory allocation of get_message
  test-path-utils: fix normalize_path_copy output buffer size
  fetch-pack: simplify add_sought_entry
  fast-import: simplify allocation in start_packfile
  write_untracked_extension: use FLEX_ALLOC helper
  prepare_{git,shell}_cmd: use argv_array
  use st_add and st_mult for allocation size computation
  convert trivial cases to FLEX_ARRAY macros
  use xmallocz to avoid size arithmetic
  convert trivial cases to ALLOC_ARRAY
  convert manual allocations to argv_array
  argv-array: add detach function
  add helpers for allocating flex-array structs
  harden REALLOC_ARRAY and xcalloc against size_t overflow
  tree-diff: catch integer overflow in combine_diff_path allocation
  ...
2016-02-26 13:37:16 -08:00
Jeff King 13e0b0d3dc use_pack: handle signed off_t overflow
A v2 pack index file can specify an offset within a packfile
of up to 2^64-1 bytes. On a system with a signed 64-bit
off_t, we can represent only up to 2^63-1. This means that a
corrupted .idx file can end up with a negative offset in the
pack code. Our bounds-checking use_pack function looks for
too-large offsets, but not for ones that have wrapped around
to negative. Let's do so, which fixes an out-of-bounds
access demonstrated in t5313.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25 11:32:46 -08:00
Jeff King 47fe3f6ef0 nth_packed_object_offset: bounds-check extended offset
If a pack .idx file has a corrupted offset for an object, we
may try to access an offset in the .idx or .pack file that
is larger than the file's size.  For the .pack case, we have
use_pack() to protect us, which realizes the access is out
of bounds. But if the corrupted value asks us to look in the
.idx file's secondary 64-bit offset table, we blindly add it
to the mmap'd index data and access arbitrary memory.

We can fix this with a simple bounds-check compared to the
size we found when we opened the .idx file.

Note that there's similar code in index-pack that is
triggered only during "index-pack --verify". To support
both, we pull the bounds-check into a separate function,
which dies when it sees a corrupted file.

It would be nice if we could return an error, so that the
pack code could try to find a good copy of the object
elsewhere. Currently nth_packed_object_offset doesn't have
any way to return an error, but it could probably use "0" as
a sentinel value (since no object can start there). This is
the minimal fix, and we can improve the resilience later on
top.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-25 11:32:43 -08:00
Jeff King 50a6c8efa2 use st_add and st_mult for allocation size computation
If our size computation overflows size_t, we may allocate a
much smaller buffer than we expected and overflow it. It's
probably impossible to trigger an overflow in most of these
sites in practice, but it is easy enough convert their
additions and multiplications into overflow-checking
variants. This may be fixing real bugs, and it makes
auditing the code easier.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Jeff King b32fa95fd8 convert trivial cases to ALLOC_ARRAY
Each of these cases can be converted to use ALLOC_ARRAY or
REALLOC_ARRAY, which has two advantages:

  1. It automatically checks the array-size multiplication
     for overflow.

  2. It always uses sizeof(*array) for the element-size,
     so that it can never go out of sync with the declared
     type of the array.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-02-22 14:51:09 -08:00
Junio C Hamano 3f16396228 clone/sha1_file: read info/alternates with strbuf_getline()
$GIT_OBJECT_DIRECTORY/info/alternates is a text file that can be
edited with a DOS editor.  We do not want to use the real path with
CR appended at the end.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-15 10:34:53 -08:00
Junio C Hamano 8f309aeb82 strbuf: introduce strbuf_getline_{lf,nul}()
The strbuf_getline() interface allows a byte other than LF or NUL as
the line terminator, but this is only because I wrote these
codepaths anticipating that there might be a value other than NUL
and LF that could be useful when I introduced line_termination long
time ago.  No useful caller that uses other value has emerged.

By now, it is clear that the interface is overly broad without a
good reason.  Many codepaths have hardcoded preference to read
either LF terminated or NUL terminated records from their input, and
then call strbuf_getline() with LF or NUL as the third parameter.

This step introduces two thin wrappers around strbuf_getline(),
namely, strbuf_getline_lf() and strbuf_getline_nul(), and
mechanically rewrites these call sites to call either one of
them.  The changes contained in this patch are:

 * introduction of these two functions in strbuf.[ch]

 * mechanical conversion of all callers to strbuf_getline() with
   either '\n' or '\0' as the third parameter to instead call the
   respective thin wrapper.

After this step, output from "git grep 'strbuf_getline('" would
become a lot smaller.  An interim goal of this series is to make
this an empty set, so that we can have strbuf_getline_crlf() take
over the shorter name strbuf_getline().

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-01-15 10:12:51 -08:00
Junio C Hamano fbe959dde7 Merge branch 'bc/format-patch-null-from-line'
"format-patch" has learned a new option to zero-out the commit
object name on the mbox "From " line.

* bc/format-patch-null-from-line:
  format-patch: check that header line has expected format
  format-patch: add an option to suppress commit hash
  sha1_file.c: introduce a null_oid constant
2015-12-21 10:59:08 -08:00
Junio C Hamano 897b18508b Merge branch 'jk/prune-mtime'
The helper used to iterate over loose object directories to prune
stale objects did not closedir() immediately when it is done with a
directory--a callback such as the one used for "git prune" may want
to do rmdir(), but it would fail on open directory on platforms
such as WinXP.

* jk/prune-mtime:
  prune: close directory earlier during loose-object directory traversal
2015-12-15 08:02:16 -08:00
brian m. carlson 3e56e7245c sha1_file.c: introduce a null_oid constant
null_oid is the struct object_id equivalent to null_sha1.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-12-14 13:35:54 -08:00
brian m. carlson b419aa25d5 sha1_file: introduce has_object_file helper.
Add has_object_file, which is a wrapper around has_sha1_file, but for
struct object_id.

Signed-off-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Jeff King <peff@peff.net>
2015-11-20 08:02:05 -05:00
Jeff King 45014beac0 Merge branch 'dk/gc-idx-wo-pack'
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.

* dk/gc-idx-wo-pack:
  gc: remove garbage .idx files from pack dir
  t5304: test cleaning pack garbage
  prepare_packed_git(): refactor garbage reporting in pack directory
2015-11-20 06:55:34 -05:00
Junio C Hamano 808d119263 Merge branch 'js/misc-fixes'
Various compilation fixes and squelching of warnings.

* js/misc-fixes:
  Correct fscanf formatting string for I64u values
  Silence GCC's "cast of pointer to integer of a different size" warning
  Squelch warning about an integer overflow
2015-10-30 13:07:00 -07:00
Johannes Schindelin 56a1a3ab44 Silence GCC's "cast of pointer to integer of a different size" warning
When calculating hashes from pointers, it actually makes sense to cut
off the most significant bits. In that case, said warning does not make
a whole lot of sense.

So let's just work around it by casting the pointer first to intptr_t
and then casting up/down to the final integral type.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26 13:24:03 -07:00
Junio C Hamano 78891795df Merge branch 'jk/war-on-sprintf'
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.

Macintosh-specific breakage was noticed and corrected in this
reroll.

* jk/war-on-sprintf: (70 commits)
  name-rev: use strip_suffix to avoid magic numbers
  use strbuf_complete to conditionally append slash
  fsck: use for_each_loose_file_in_objdir
  Makefile: drop D_INO_IN_DIRENT build knob
  fsck: drop inode-sorting code
  convert strncpy to memcpy
  notes: document length of fanout path with a constant
  color: add color_set helper for copying raw colors
  prefer memcpy to strcpy
  help: clean up kfmclient munging
  receive-pack: simplify keep_arg computation
  avoid sprintf and strcpy with flex arrays
  use alloc_ref rather than hand-allocating "struct ref"
  color: add overflow checks for parsing colors
  drop strcpy in favor of raw sha1_to_hex
  use sha1_to_hex_r() instead of strcpy
  daemon: use cld->env_array when re-spawning
  stat_tracking_info: convert to argv_array
  http-push: use an argv_array for setup_revisions
  fetch-pack: use argv_array for index-pack / unpack-objects
  ...
2015-10-20 15:24:01 -07:00
Junio C Hamano db5adf24bf Merge branch 'js/clone-dissociate'
"git clone --dissociate" runs a big "git repack" process at the
end, and it helps to close file descriptors that are open on the
packs and their idx files before doing so on filesystems that
cannot remove a file that is still open.

* js/clone-dissociate:
  clone --dissociate: avoid locking pack files
  sha1_file.c: add a function to release all packs
  sha1_file: consolidate code to close a pack's file descriptor
  t5700: demonstrate a Windows file locking issue with `git clone --dissociate`
2015-10-15 15:43:49 -07:00
Johannes Schindelin 38849a8116 sha1_file.c: add a function to release all packs
On Windows, files that are in use cannot be removed or renamed. That
means that we have to release pack files when we are about to, say,
repack them. Let's introduce a convenient function to close all the
pack files and their idx files.

While at it, we consolidate the close windows/close fd/close index
stanza in `free_pack_by_name()` into the `close_pack()` function that
is used by the new `close_all_packs()` function to avoid repeated code.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-07 10:47:10 -07:00
Johannes Schindelin 71fe5d7fb0 sha1_file: consolidate code to close a pack's file descriptor
There was a lot of repeated code to close the file descriptor of
a given pack. Let's just refactor this code into a single function.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 14:43:58 -07:00
Jeff King c7ab0ba340 avoid sprintf and strcpy with flex arrays
When we are allocating a struct with a FLEX_ARRAY member, we
generally compute the size of the array and then sprintf or
strcpy into it. Normally we could improve a dynamic allocation
like this by using xstrfmt, but it doesn't work here; we
have to account for the size of the rest of the struct.

But we can improve things a bit by storing the length that
we use for the allocation, and then feeding it to xsnprintf
or memcpy, which makes it more obvious that we are not
writing more than the allocated number of bytes.

It would be nice if we had some kind of helper for
allocating generic flex arrays, but it doesn't work that
well:

 - the call signature is a little bit unwieldy:

      d = flex_struct(sizeof(*d), offsetof(d, path), fmt, ...);

   You need offsetof here instead of just writing to the
   end of the base size, because we don't know how the
   struct is packed (partially this is because FLEX_ARRAY
   might not be zero, though we can account for that; but
   the size of the struct may actually be rounded up for
   alignment, and we can't know that).

 - some sites do clever things, like over-allocating because
   they know they will write larger things into the buffer
   later (e.g., struct packed_git here).

So we're better off to just write out each allocation (or
add type-specific helpers, though many of these are one-off
allocations anyway).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King d4b3d11a03 write_loose_object: convert to strbuf
When creating a loose object tempfile, we use a fixed
PATH_MAX-sized buffer, and strcpy directly into it. This
isn't buggy, because we do a rough check of the size, but
there's no verification that our guesstimate of the required
space is enough (in fact, it's several bytes too big for the
current naming scheme).

Let's switch to a strbuf, which makes this much easier to
verify. The allocation overhead should be negligible, since
we are replacing a static buffer with a static strbuf, and
we'll only need to allocate on the first call.

While we're here, we can also document a subtle interaction
with mkstemp that would be easy to overlook.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-05 11:08:05 -07:00
Jeff King ac5190cc48 sha1_get_pack_name: use a strbuf
We do some manual memory computation here, and there's no
check that our 60 is not overflowed by the raw sprintf (it
isn't, because the "which" parameter is never longer than
"pack"). We can simplify this greatly with a strbuf.

Technically the end result is not identical, as the original
took care not to rewrite the object directory on each call
for performance reasons.  We could do that here, too (by
saving the baselen and resetting to it), but it's not worth
the complexity; this function is not called a lot (generally
once per packfile that we open).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King 9ae97018fb use strip_suffix and xstrfmt to replace suffix
When we want to convert "foo.pack" to "foo.idx", we do it by
duplicating the original string and then munging the bytes
in place. Let's use strip_suffix and xstrfmt instead, which
has several advantages:

  1. It's more clear what the intent is.

  2. It does not implicitly rely on the fact that
     strlen(".idx") <= strlen(".pack") to avoid an overflow.

  3. We communicate the assumption that the input file ends
     with ".pack" (and get a run-time check that this is so).

  4. We drop calls to strcpy, which makes auditing the code
     base easier.

Likewise, we can do this to convert ".pack" to ".bitmap",
avoiding some manual memory computation.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King 48bcc1c3cc add_packed_git: convert strcpy into xsnprintf
We have the path "foo.idx", and we create a buffer big
enough to hold "foo.pack" and "foo.keep", and then strcpy
straight into it. This isn't a bug (we have enough space),
but it's very hard to tell from the strcpy that this is so.

Let's instead use strip_suffix to take off the ".idx",
record the size of our allocation, and use xsnprintf to make
sure we don't violate our assumptions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Jeff King ef1286d3c0 use xsnprintf for generating git object headers
We generally use 32-byte buffers to format git's "type size"
header fields. These should not generally overflow unless
you can produce some truly gigantic objects (and our types
come from our internal array of constant strings). But it is
a good idea to use xsnprintf to make sure this is the case.

Note that we slightly modify the interface to
write_sha1_file_prepare, which nows uses "hdrlen" as an "in"
parameter as well as an "out" (on the way in it stores the
allocated size of the header, and on the way out it returns
the ultimate size of the header).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Junio C Hamano f0bc854623 Sync with 2.5.2 2015-09-09 14:30:35 -07:00
Junio C Hamano 3d3caf0b78 Sync with 2.4.9 2015-09-04 10:43:23 -07:00
Junio C Hamano ef0e938a1a Sync with 2.3.9 2015-09-04 10:34:19 -07:00
Junio C Hamano 8267cd11d6 Sync with 2.2.3 2015-09-04 10:29:28 -07:00
Jeff King 5015f01c12 read_info_alternates: handle paths larger than PATH_MAX
This function assumes that the relative_base path passed
into it is no larger than PATH_MAX, and writes into a
fixed-size buffer. However, this path may not have actually
come from the filesystem; for example, add_submodule_odb
generates a path using a strbuf and passes it in. This is
hard to trigger in practice, though, because the long
submodule directory would have to exist on disk before we
would try to open its info/alternates file.

We can easily avoid the bug, though, by simply creating the
filename on the heap.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-04 09:36:51 -07:00
Junio C Hamano cbcd3dcaa8 Merge branch 'cb/open-noatime-clear-errno' into maint
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME.  This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.

* cb/open-noatime-clear-errno:
  git_open_noatime: return with errno=0 on success
2015-09-03 19:17:49 -07:00
Junio C Hamano 9b8d731995 Merge branch 'cb/open-noatime-clear-errno'
When trying to see that an object does not exist, a state errno
leaked from our "first try to open a packfile with O_NOATIME and
then if it fails retry without it" logic on a system that refuses
O_NOATIME.  This confused us and caused us to die, saying that the
packfile is unreadable, when we should have just reported that the
object does not exist in that packfile to the caller.

* cb/open-noatime-clear-errno:
  git_open_noatime: return with errno=0 on success
2015-08-25 14:57:10 -07:00
Junio C Hamano 8c9155e031 Merge branch 'jk/git-path'
git_path() and mkpath() are handy helper functions but it is easy
to misuse, as the callers need to be careful to keep the number of
active results below 4.  Their uses have been reduced.

* jk/git-path:
  memoize common git-path "constant" files
  get_repo_path: refactor path-allocation
  find_hook: keep our own static buffer
  refs.c: remove_empty_directories can take a strbuf
  refs.c: avoid git_path assignment in lock_ref_sha1_basic
  refs.c: avoid repeated git_path calls in rename_tmp_log
  refs.c: simplify strbufs in reflog setup and writing
  path.c: drop git_path_submodule
  refs.c: remove extra git_path calls from read_loose_refs
  remote.c: drop extraneous local variable from migrate_file
  prefer mkpathdup to mkpath in assignments
  prefer git_pathdup to git_path in some possibly-dangerous cases
  add_to_alternates_file: don't add duplicate entries
  t5700: modernize style
  cache.h: complete set of git_path_submodule helpers
  cache.h: clarify documentation for git_path, et al
2015-08-19 14:48:56 -07:00
Junio C Hamano 51a22ce147 Merge branch 'jc/finalize-temp-file'
Long overdue micro clean-up.

* jc/finalize-temp-file:
  sha1_file.c: rename move_temp_to_file() to finalize_object_file()
2015-08-19 14:48:55 -07:00
Junio C Hamano 0a489b0680 prepare_packed_git(): refactor garbage reporting in pack directory
The hook to report "garbage" files in $GIT_OBJECT_DIRECTORY/pack/
could be generic but is too specific to count-object's needs.

Move the part to produce human-readable messages to count-objects,
and refine the interface to callback with the "bits" with values
defined in the cache.h header file, so that other callers (e.g.
prune) can later use the same mechanism to enumerate different
kinds of garbage files and do something intelligent about them,
other than reporting in textual messages.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-17 09:14:59 -07:00
Clemens Buchacher dff6f280df git_open_noatime: return with errno=0 on success
In read_sha1_file_extended we die if read_object fails with a fatal
error. We detect a fatal error if errno is non-zero and is not
ENOENT. If the object could not be read because it does not exist,
this is not considered a fatal error and we want to return NULL.

Somewhere down the line, read_object calls git_open_noatime to open
a pack index file, for example. We first try open with O_NOATIME.
If O_NOATIME fails with EPERM, we retry without O_NOATIME. When the
second open succeeds, errno is however still set to EPERM from the
first attempt. When we finally determine that the object does not
exist, read_object returns NULL and read_sha1_file_extended dies
with a fatal error:

    fatal: failed to read object <sha1>: Operation not permitted

Fix this by resetting errno to zero before we call open again.

Cc: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Clemens Buchacher <clemens.buchacher@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 13:56:19 -07:00
Johannes Sixt 094c7e6352 prune: close directory earlier during loose-object directory traversal
27e1e22d (prune: factor out loose-object directory traversal, 2014-10-16)
introduced a new function for_each_loose_file_in_objdir() with a helper
for_each_file_in_obj_subdir(). The latter calls callbacks for each file
found during a directory traversal and finally also a callback for the
directory itself.

git-prune uses the function to clean up the object directory. In
particular, in the directory callback it calls rmdir(). On Windows XP,
this rmdir call fails, because the directory is still open while the
callback is called. Close the directory before calling the callback.

Signed-off-by: Johannes Sixt <j6t@kdbg.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 12:06:00 -07:00
Jeff King 77b9b1d13a add_to_alternates_file: don't add duplicate entries
The add_to_alternates_file function blindly uses
hold_lock_file_for_append to copy the existing contents, and
then adds the new line to it. This has two minor problems:

  1. We might add duplicate entries, which are ugly and
     inefficient.

  2. We do not check that the file ends with a newline, in
     which case we would bogusly append to the final line.
     This is quite unlikely in practice, though, as we call
     this function only from git-clone, so presumably we are
     the only writers of the file (and we always add a
     newline).

Instead of using hold_lock_file_for_append, let's copy the
file line by line, which ensures all records are properly
terminated. If we see an extra line, we can simply abort the
update (there is no point in even copying the rest, as we
know that it would be identical to the original).

As a bonus, we also get rid of some calls to the
static-buffer mkpath and git_path functions.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 15:15:42 -07:00
Junio C Hamano cb5add5868 sha1_file.c: rename move_temp_to_file() to finalize_object_file()
Since 5a688fe4 ("core.sharedrepository = 0mode" should set, not
loosen, 2009-03-25), we kept reminding ourselves:

    NEEDSWORK: this should be renamed to finalize_temp_file() as
    "moving" is only a part of what it does, when no patch between
    master to pu changes the call sites of this function.

without doing anything about it.  Let's do so.

The purpose of this function was not to move but to finalize.  The
detail of the primarily implementation of finalizing was to link the
temporary file to its final name and then to unlink, which wasn't
even "moving".  The alternative implementation did "move" by calling
rename(2), which is a fun tangent.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-10 11:10:37 -07:00
Junio C Hamano 7c696007ca Merge branch 'jk/fix-refresh-utime' into maint
Fix a small bug in our use of umask() return value.

* jk/fix-refresh-utime:
  check_and_freshen_file: fix reversed success-check
2015-07-27 12:21:40 -07:00