1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-28 03:26:08 +02:00
Commit Graph

248 Commits

Author SHA1 Message Date
Martin Ågren b227586831 lock_file: make function-local locks non-static
Placing `struct lock_file`s on the stack used to be a bad idea, because
the temp- and lockfile-machinery would keep a pointer into the struct.
But after 076aa2cbd (tempfile: auto-allocate tempfiles on heap,
2017-09-05), we can safely have lockfiles on the stack. (This applies
even if a user returns early, leaving a locked lock behind.)

These `struct lock_file`s are local to their respective functions and we
can drop their staticness.

For good measure, I have inspected these sites and come to believe that
they always release the lock, with the possible exception of bailing out
using `die()` or `exit()` or by returning from a `cmd_foo()`.

As pointed out by Jeff King, it would be bad if someone held on to a
`struct lock_file *` for some reason. After some grepping, I agree with
his findings: no-one appears to be doing that.

Signed-off-by: Martin Ågren <martin.agren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-10 14:54:45 +09:00
Junio C Hamano 3915f9a4fa Merge branch 'jc/parseopt-expiry-errors'
"git gc --prune=nonsense" spent long time repacking and then
silently failed when underlying "git prune --expire=nonsense"
failed to parse its command line.  This has been corrected.

* jc/parseopt-expiry-errors:
  parseopt: handle malformed --expire arguments more nicely
  gc: do not upcase error message shown with die()
2018-05-08 15:59:33 +09:00
Junio C Hamano 8ab5aa4bd8 parseopt: handle malformed --expire arguments more nicely
A few commands that parse --expire=<time> command line option behave
sillily when given nonsense input.  For example

    $ git prune --no-expire
    Segmentation falut
    $ git prune --expire=npw; echo $?
    129

Both come from parse_opt_expiry_date_cb().

The former is because the function is not prepared to see arg==NULL
(for "--no-expire", it is a norm; "--expire" at the end of the
command line could be made to pass NULL, if it is told that the
argument is optional, but we don't so we do not have to worry about
that case).

The latter is because it does not check the value returned from the
underlying parse_expiry_date().

This seems to be a recent regression introduced while we attempted
to avoid spewing the entire usage message when given a correct
option but with an invalid value at 3bb0923f ("parse-options: do not
show usage upon invalid option value", 2018-03-22).  Before that, we
didn't fail silently but showed a full usage help (which arguably is
not all that better).

Also catch this error early when "git gc --prune=<expiration>" is
misspelled by doing a dummy parsing before the main body of "gc"
that is time consuming even begins.  Otherwise, we'd spend time to
pack objects and then later have "git prune" first notice the error.
Aborting "gc" in the middle that way is not harmful but is ugly and
can be avoided.

Helped-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-23 22:36:59 +09:00
Junio C Hamano 96913c9df6 gc: do not upcase error message shown with die()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-23 22:36:14 +09:00
Nguyễn Thái Ngọc Duy 9806f5a7bf gc --auto: exclude base pack if not enough mem to "repack -ad"
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy 8fc6776247 gc: handle a corner case in gc.bigPackThreshold
This config allows us to keep <N> packs back if their size is larger
than a limit. But if this N >= gc.autoPackLimit, we may have a
problem. We are supposed to reduce the number of packs after a
threshold because it affects performance.

We could tell the user that they have incompatible gc.bigPackThreshold
and gc.autoPackLimit, but it's kinda hard when 'git gc --auto' runs in
background. Instead let's fall back to the next best stategy: try to
reduce the number of packs anyway, but keep the base pack out. This
reduces the number of packs to two and hopefully won't take up too
much resources to repack (the assumption still is the base pack takes
most resources to handle).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy 55dfe13df9 gc: add gc.bigPackThreshold config
The --keep-largest-pack option is not very convenient to use because
you need to tell gc to do this explicitly (and probably on just a few
large repos).

Add a config key that enables this mode when packs larger than a limit
are found. Note that there's a slight behavior difference compared to
--keep-largest-pack: all packs larger than the threshold are kept, not
just the largest one.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy ae4e89e549 gc: add --keep-largest-pack option
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.

This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.

This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..

PS. The use of string_list here seems overkill, but we'll need it in
the next patch...

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Junio C Hamano 3a1ec60c43 Merge branch 'sb/packfiles-in-repository'
Refactoring of the internal global data structure continues.

* sb/packfiles-in-repository:
  packfile: keep prepare_packed_git() private
  packfile: allow find_pack_entry to handle arbitrary repositories
  packfile: add repository argument to find_pack_entry
  packfile: allow reprepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_one to handle arbitrary repositories
  packfile: add repository argument to reprepare_packed_git
  packfile: add repository argument to prepare_packed_git
  packfile: add repository argument to prepare_packed_git_one
  packfile: allow install_packed_git to handle arbitrary repositories
  packfile: allow rearrange_packed_git to handle arbitrary repositories
  packfile: allow prepare_packed_git_mru to handle arbitrary repositories
2018-04-11 13:09:55 +09:00
Junio C Hamano cf0b1793ea Merge branch 'sb/object-store'
Refactoring the internal global data structure to make it possible
to open multiple repositories, work with and then close them.

Rerolled by Duy on top of a separate preliminary clean-up topic.
The resulting structure of the topics looked very sensible.

* sb/object-store: (27 commits)
  sha1_file: allow sha1_loose_object_info to handle arbitrary repositories
  sha1_file: allow map_sha1_file to handle arbitrary repositories
  sha1_file: allow map_sha1_file_1 to handle arbitrary repositories
  sha1_file: allow open_sha1_file to handle arbitrary repositories
  sha1_file: allow stat_sha1_file to handle arbitrary repositories
  sha1_file: allow sha1_file_name to handle arbitrary repositories
  sha1_file: add repository argument to sha1_loose_object_info
  sha1_file: add repository argument to map_sha1_file
  sha1_file: add repository argument to map_sha1_file_1
  sha1_file: add repository argument to open_sha1_file
  sha1_file: add repository argument to stat_sha1_file
  sha1_file: add repository argument to sha1_file_name
  sha1_file: allow prepare_alt_odb to handle arbitrary repositories
  sha1_file: allow link_alt_odb_entries to handle arbitrary repositories
  sha1_file: add repository argument to prepare_alt_odb
  sha1_file: add repository argument to link_alt_odb_entries
  sha1_file: add repository argument to read_info_alternates
  sha1_file: add repository argument to link_alt_odb_entry
  sha1_file: add raw_object_store argument to alt_odb_usable
  pack: move approximate object count to object store
  ...
2018-04-11 13:09:55 +09:00
Nguyễn Thái Ngọc Duy 464416a2ea packfile: keep prepare_packed_git() private
The reason callers have to call this is to make sure either packed_git
or packed_git_mru pointers are initialized since we don't do that by
default. Sometimes it's hard to see this connection between where the
function is called and where packed_git pointer is used (sometimes in
separate functions).

Keep this dependency internal because now all access to packed_git and
packed_git_mru must go through get_xxx() wrappers.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Stefan Beller a49d283435 packfile: add repository argument to reprepare_packed_git
See previous patch for explanation.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Stefan Beller 6fdb4e9f5a packfile: add repository argument to prepare_packed_git
Add a repository argument to allow prepare_packed_git callers to be
more specific about which repository to handle. See commit "sha1_file:
add repository argument to link_alt_odb_entry" for an explanation of
the #define trick.

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:07:43 -07:00
Stefan Beller a80d72db2a object-store: move packed_git and packed_git_mru to object store
In a process with multiple repositories open, packfile accessors
should be associated to a single repository and not shared globally.
Move packed_git and packed_git_mru into the_repository and adjust
callers to reflect this.

[nd: while at there, wrap access to these two fields in get_packed_git()
and get_packed_git_mru(). This allows us to lazily initialize these
fields without caller doing that explicitly]

Signed-off-by: Stefan Beller <sbeller@google.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-26 10:05:46 -07:00
Junio C Hamano 7fb6aefd2a Merge branch 'nd/parseopt-completion'
Teach parse-options API an option to help the completion script,
and make use of the mechanism in command line completion.

* nd/parseopt-completion: (45 commits)
  completion: more subcommands in _git_notes()
  completion: complete --{reuse,reedit}-message= for all notes subcmds
  completion: simplify _git_notes
  completion: don't set PARSE_OPT_NOCOMPLETE on --rerere-autoupdate
  completion: use __gitcomp_builtin in _git_worktree
  completion: use __gitcomp_builtin in _git_tag
  completion: use __gitcomp_builtin in _git_status
  completion: use __gitcomp_builtin in _git_show_branch
  completion: use __gitcomp_builtin in _git_rm
  completion: use __gitcomp_builtin in _git_revert
  completion: use __gitcomp_builtin in _git_reset
  completion: use __gitcomp_builtin in _git_replace
  remote: force completing --mirror= instead of --mirror
  completion: use __gitcomp_builtin in _git_remote
  completion: use __gitcomp_builtin in _git_push
  completion: use __gitcomp_builtin in _git_pull
  completion: use __gitcomp_builtin in _git_notes
  completion: use __gitcomp_builtin in _git_name_rev
  completion: use __gitcomp_builtin in _git_mv
  completion: use __gitcomp_builtin in _git_merge_base
  ...
2018-03-14 12:01:07 -07:00
Nguyễn Thái Ngọc Duy 7e1eeaa431 completion: use __gitcomp_builtin in _git_gc
The new completable option is --quiet.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-02-09 10:24:51 -08:00
Jonathan Tan 0c16cd499d gc: do not repack promisor packfiles
Teach gc to stop traversal at promisor objects, and to leave promisor
packfiles alone. This has the effect of only repacking non-promisor
packfiles, and preserves the distinction between promisor packfiles and
non-promisor packfiles.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-12-08 09:52:42 -08:00
Junio C Hamano abdf7d8e25 Merge branch 'aw/gc-lockfile-fscanf-fix'
"git gc" tries to avoid running two instances at the same time by
reading and writing pid/host from and to a lock file; it used to
use an incorrect fscanf() format when reading, which has been
corrected.

* aw/gc-lockfile-fscanf-fix:
  gc: call fscanf() with %<len>s, not %<len>c, when reading hostname
2017-09-25 15:24:09 +09:00
Junio C Hamano afe2fab72c gc: call fscanf() with %<len>s, not %<len>c, when reading hostname
Earlier in this codepath, we (ab)used "%<len>c" to read the hostname
recorded in the lockfile into locking_host[HOST_NAME_MAX + 1] while
substituting <len> with the actual value of HOST_NAME_MAX.

This turns out to be incorrect, as it is an instruction to read
exactly the specified number of bytes.  Because we are trying to
read at most that many bytes, we should be using "%<len>s" instead.

Helped-by: A. Wilcox <awilfox@adelielinux.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-17 13:21:44 +09:00
Jeff King 076aa2cbda tempfile: auto-allocate tempfiles on heap
The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(&t, ...);
  ...
  if (!err)
          rename_tempfile(&t, ...);
  else
          delete_tempfile(&t);

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
     address.

  3. Handling a "struct tempfile *" return instead of a file
     descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
	struct heap_allocated_part_of_tempfile {
                int fd;
                ...etc
        } *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-09-06 17:19:54 +09:00
Jonathan Tan 0abe14f6a5 pack: move {,re}prepare_packed_git and approximate_object_count
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-08-23 15:12:07 -07:00
Junio C Hamano 764046f6b0 Merge branch 'jk/gc-pre-detach-under-hook'
We run an early part of "git gc" that deals with refs before
daemonising (and not under lock) even when running a background
auto-gc, which caused multiple gc processes attempting to run the
early part at the same time.  This is now prevented by running the
early part also under the GC lock.

* jk/gc-pre-detach-under-hook:
  gc: run pre-detach operations under lock
2017-07-18 12:48:10 -07:00
Junio C Hamano f056cde60e Merge branch 'rs/use-div-round-up'
Code cleanup.

* rs/use-div-round-up:
  use DIV_ROUND_UP
2017-07-12 15:18:23 -07:00
Jeff King c45af94dbc gc: run pre-detach operations under lock
We normally try to avoid having two auto-gc operations run
at the same time, because it wastes resources. This was done
long ago in 64a99eb47 (gc: reject if another gc is running,
unless --force is given, 2013-08-08).

When we do a detached auto-gc, we run the ref-related
commands _before_ detaching, to avoid confusing lock
contention. This was done by 62aad1849 (gc --auto: do not
lock refs in the background, 2014-05-25).

These two features do not interact well. The pre-detach
operations are run before we check the gc.pid lock, meaning
that on a busy repository we may run many of them
concurrently. Ideally we'd take the lock before spawning any
operations, and hold it for the duration of the program.

This is tricky, though, with the way the pid-file interacts
with the daemonize() process.  Other processes will check
that the pid recorded in the pid-file still exists. But
detaching causes us to fork and continue running under a
new pid. So if we take the lock before detaching, the
pid-file will have a bogus pid in it. We'd have to go back
and update it with the new pid after detaching. We'd also
have to play some tricks with the tempfile subsystem to
tweak the "owner" field, so that the parent process does not
clean it up on exit, but the child process does.

Instead, we can do something a bit simpler: take the lock
only for the duration of the pre-detach work, then detach,
then take it again for the post-detach work. Technically,
this means that the post-detach lock could lose to another
process doing pre-detach work. But in the long run this
works out.

That second process would then follow-up by doing
post-detach work. Unless it was in turn blocked by a third
process doing pre-detach work, and so on. This could in
theory go on indefinitely, as the pre-detach work does not
repack, and so need_to_gc() will continue to trigger.  But
in each round we are racing between the pre- and post-detach
locks. Eventually, one of the post-detach locks will win the
race and complete the full gc. So in the worst case, we may
racily repeat the pre-detach work, but we would never do so
simultaneously (it would happen via a sequence of serialized
race-wins).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-12 09:41:04 -07:00
René Scharfe 42c78a216e use DIV_ROUND_UP
Convert code that divides and rounds up to use DIV_ROUND_UP to make the
intent clearer and reduce the number of magic constants.

Signed-off-by: Rene Scharfe <l.s.r@web.de>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-07-10 14:24:36 -07:00
Junio C Hamano f31d23a399 Merge branch 'bw/config-h'
Fix configuration codepath to pay proper attention to commondir
that is used in multi-worktree situation, and isolate config API
into its own header file.

* bw/config-h:
  config: don't implicitly use gitdir or commondir
  config: respect commondir
  setup: teach discover_git_directory to respect the commondir
  config: don't include config.h by default
  config: remove git_config_iter
  config: create config.h
2017-06-24 14:28:41 -07:00
Brandon Williams b2141fc1d2 config: don't include config.h by default
Stop including config.h by default in cache.h.  Instead only include
config.h in those files which require use of the config system.

Signed-off-by: Brandon Williams <bmwill@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-06-15 12:56:22 -07:00
Junio C Hamano b15667bbdc Merge branch 'js/larger-timestamps'
Some platforms have ulong that is smaller than time_t, and our
historical use of ulong for timestamp would mean they cannot
represent some timestamp that the platform allows.  Invent a
separate and dedicated timestamp_t (so that we can distingiuish
timestamps and a vanilla ulongs, which along is already a good
move), and then declare uintmax_t is the type to be used as the
timestamp_t.

* js/larger-timestamps:
  archive-tar: fix a sparse 'constant too large' warning
  use uintmax_t for timestamps
  date.c: abort if the system time cannot handle one of our timestamps
  timestamp_t: a new data type for timestamps
  PRItime: introduce a new "printf format" for timestamps
  parse_timestamp(): specify explicitly where we parse timestamps
  t0006 & t5000: skip "far in the future" test when time_t is too limited
  t0006 & t5000: prepare for 64-bit timestamps
  ref-filter: avoid using `unsigned long` for catch-all data type
2017-05-16 11:51:59 +09:00
Johannes Schindelin dddbad728c timestamp_t: a new data type for timestamps
Git's source code assumes that unsigned long is at least as precise as
time_t. Which is incorrect, and causes a lot of problems, in particular
where unsigned long is only 32-bit (notably on Windows, even in 64-bit
versions).

So let's just use a more appropriate data type instead. In preparation
for this, we introduce the new `timestamp_t` data type.

By necessity, this is a very, very large patch, as it has to replace all
timestamps' data type in one go.

As we will use a data type that is not necessarily identical to `time_t`,
we need to be very careful to use `time_t` whenever we interact with the
system functions, and `timestamp_t` everywhere else.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-27 13:07:39 +09:00
Junio C Hamano 5938454cbc Merge branch 'dt/xgethostname-nul-termination'
gethostname(2) may not NUL terminate the buffer if hostname does
not fit; unfortunately there is no easy way to see if our buffer
was too small, but at least this will make sure we will not end up
using garbage past the end of the buffer.

* dt/xgethostname-nul-termination:
  xgethostname: handle long hostnames
  use HOST_NAME_MAX to size buffers for gethostname(2)
2017-04-23 22:07:57 -07:00
David Turner 5781a9a270 xgethostname: handle long hostnames
If the full hostname doesn't fit in the buffer supplied to
gethostname, POSIX does not specify whether the buffer will be
null-terminated, so to be safe, we should do it ourselves.  Introduce
new function, xgethostname, which ensures that there is always a \0
at the end of the buffer.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18 19:58:04 -07:00
René Scharfe da25bdb776 use HOST_NAME_MAX to size buffers for gethostname(2)
POSIX limits the length of host names to HOST_NAME_MAX.  Export the
fallback definition from daemon.c and use this constant to make all
buffers used with gethostname(2) big enough for any possible result
and a terminating NUL.

Inspired-by: David Turner <dturner@twosigma.com>
Signed-off-by: Rene Scharfe <l.s.r@web.de>
Signed-off-by: David Turner <dturner@twosigma.com>
Reviewed-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-04-18 19:57:41 -07:00
Jeff King 07af889136 gc: replace local buffer with git_path
We probe the "17/" loose object directory for auto-gc, and
use a local buffer to format the path. We can just use
git_path() for this. It handles paths of any length
(reducing our error handling). And because we feed the
result straight to a system call, we can just use the static
variant.

Note that git_path also knows the string "objects/" is
special, and will replace it with git_object_directory()
when necessary.

Another alternative would be to use sha1_file_name() for the
pretend object "170000...", but that ends up being more
hassle for no gain, as we have to truncate the final path
component.

Signed-off-by: Jeff King <peff@peff.net>
2017-03-30 14:59:50 -07:00
Junio C Hamano 94c9b5af70 Merge branch 'cc/split-index-config'
The experimental "split index" feature has gained a few
configuration variables to make it easier to use.

* cc/split-index-config: (22 commits)
  Documentation/git-update-index: explain splitIndex.*
  Documentation/config: add splitIndex.sharedIndexExpire
  read-cache: use freshen_shared_index() in read_index_from()
  read-cache: refactor read_index_from()
  t1700: test shared index file expiration
  read-cache: unlink old sharedindex files
  config: add git_config_get_expiry() from gc.c
  read-cache: touch shared index files when used
  sha1_file: make check_and_freshen_file() non static
  Documentation/config: add splitIndex.maxPercentChange
  t1700: add tests for splitIndex.maxPercentChange
  read-cache: regenerate shared index if necessary
  config: add git_config_get_max_percent_split_change()
  Documentation/git-update-index: talk about core.splitIndex config var
  Documentation/config: add information for core.splitIndex
  t1700: add tests for core.splitIndex
  update-index: warn in case of split-index incoherency
  read-cache: add and then use tweak_split_index()
  split-index: add {add,remove}_split_index() functions
  config: add git_config_get_split_index()
  ...
2017-03-17 13:50:23 -07:00
Christian Couder 77d67977ca config: add git_config_get_expiry() from gc.c
This function will be used in a following commit to get the expiration
time of the shared index files from the config, and it is generic
enough to be put in "config.c".

Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-03-01 13:34:54 -08:00
David Turner a831c06a2b gc: ignore old gc.log files
A server can end up in a state where there are lots of unreferenced
loose objects (say, because many users are doing a bunch of rebasing
and pushing their rebased branches).  Running "git gc --auto" in
this state would cause a gc.log file to be created, preventing
future auto gcs, causing pack files to pile up.  Since many git
operations are O(n) in the number of pack files, this would lead to
poor performance.

Git should never get itself into a state where it refuses to do any
maintenance, just because at some point some piece of the maintenance
didn't make progress.

Teach Git to ignore gc.log files which are older than (by default)
one day old, which can be tweaked via the gc.logExpiry configuration
variable.  That way, these pack files will get cleaned up, if
necessary, at least once per day.  And operators who find a need for
more-frequent gcs can adjust gc.logExpiry to meet their needs.

There is also some cleanup: a successful manual gc, or a
warning-free auto gc with an old log file, will remove any old
gc.log files.

It might still happen that manual intervention is required
(e.g. because the repo is corrupt), but at the very least it won't
be because Git is too dumb to try again.

Signed-off-by: David Turner <dturner@twosigma.com>
Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-13 15:19:11 -08:00
David Turner bdf56de896 auto gc: don't write bitmaps for incremental repacks
When git gc --auto does an incremental repack of loose objects, we do
not expect to be able to write a bitmap; it is very likely that
objects in the new pack will have references to objects outside of the
pack.  So we shouldn't try to write a bitmap, because doing so will
likely issue a warning.

This warning was making its way into gc.log.  When the gc.log was
present, future auto gc runs would refuse to run.

Patch by Jeff King.
Bug report, test, and commit message by David Turner.

Signed-off-by: David Turner <dturner@twosigma.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-12-29 13:45:35 -08:00
Junio C Hamano eb293ac8d6 Merge branch 'jk/reduce-gc-aggressive-depth' into maint
"git gc --aggressive" used to limit the delta-chain length to 250,
which is way too deep for gaining additional space savings and is
detrimental for runtime performance.  The limit has been reduced to
50.

* jk/reduce-gc-aggressive-depth:
  gc: default aggressive depth to 50
2016-09-29 16:49:42 -07:00
Junio C Hamano 0952ca8a95 Merge branch 'jk/reduce-gc-aggressive-depth'
"git gc --aggressive" used to limit the delta-chain length to 250,
which is way too deep for gaining additional space savings and is
detrimental for runtime performance.  The limit has been reduced to
50.

* jk/reduce-gc-aggressive-depth:
  gc: default aggressive depth to 50
2016-09-21 15:15:30 -07:00
Jeff King 07e7dbf0db gc: default aggressive depth to 50
This commit message is long and has lots of background and
numbers. The summary is: the current default of 250 doesn't
save much space, and costs CPU. It's not a good tradeoff.
Read on for details.

The "--aggressive" flag to git-gc does three things:

  1. use "-f" to throw out existing deltas and recompute from
     scratch

  2. use "--window=250" to look harder for deltas

  3. use "--depth=250" to make longer delta chains

Items (1) and (2) are good matches for an "aggressive"
repack. They ask the repack to do more computation work in
the hopes of getting a better pack. You pay the costs during
the repack, and other operations see only the benefit.

Item (3) is not so clear. Allowing longer chains means fewer
restrictions on the deltas, which means potentially finding
better ones and saving some space. But it also means that
operations which access the deltas have to follow longer
chains, which affects their performance. So it's a tradeoff,
and it's not clear that the tradeoff is even a good one.

The existing "250" numbers for "--aggressive" come
originally from this thread:

  http://public-inbox.org/git/alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org/

where Linus says:

  So when I said "--depth=250 --window=250", I chose those
  numbers more as an example of extremely aggressive
  packing, and I'm not at all sure that the end result is
  necessarily wonderfully usable. It's going to save disk
  space (and network bandwidth - the delta's will be re-used
  for the network protocol too!), but there are definitely
  downsides too, and using long delta chains may
  simply not be worth it in practice.

There are some numbers in that thread, but they're mostly
focused on the improved window size, and measure the
improvement from --depth=250 and --window=250 together.
E.g.:

  http://public-inbox.org/git/9e4733910712062006l651571f3w7f76ce64c6650dff@mail.gmail.com/

talks about the improved run-time of "git-blame", which
comes from the reduced pack size. But most of that reduction
is coming from --window=250, whereas most of the extra costs
come from --depth=250. There's a link in that thread showing
that increasing the depth beyond 50 doesn't seem to help
much with the size:

  https://vcscompare.blogspot.com/2008/06/git-repack-parameters.html

but again, no discussion of the timing impact.

In an earlier thread from Ted Ts'o which discussed setting
the non-aggressive default (from 10 to 50):

  http://public-inbox.org/git/20070509134958.GA21489%40thunk.org/

we have more numbers, with the conclusion that going past 50
does not help size much, and hurts the speed of normal
operations.

So from that, we might guess that 50 is actually a sweet
spot, even for aggressive, if we interpret aggressive to
"spend time now to make a better pack". It is not clear that
"--depth=250" is actually a better pack. It may be slightly
_smaller_, but it carries a run-time penalty.

Here are some more recent timings I did to verify that. They
show three things:

  - the size of the resulting pack (so disk saved to store,
    bandwidth saved on clones/fetches)

  - the cost of "rev-list --objects --all", which shows the
    effect of the delta chains on trees (commits typically
    don't delta, and the command doesn't touch the blobs at
    all)

  - the cost of "log -Sfoo", which will additionally access
    each blob

All cases were repacked with "git repack -adf --depth=$d
--window=250" (so basically, what would happen if we tweaked
the "gc --aggressive" default depth).

The timings are all wall-clock best-of-3. The machine itself
has plenty of RAM compared to the repositories (which is
probably typical of most workstations these days), so we're
really measuring CPU usage, as the whole thing will be in
disk cache after the first run.

The core.deltaBaseCacheLimit is at its default of 96MiB.
It's possible that tweaking it would have some impact on the
tests, as some of them (especially "log -S" on a large repo)
are likely to overflow that. But bumping that carries a
run-time memory cost, so for these tests, I focused on what
we could do just with the on-disk pack tradeoffs.

Each test is done for four depths: 250 (the current value),
50 (the current default that tested well previously), 100
(to show something on the larger side, which previous tests
showed was not a good tradeoff), and 10 (the very old
default, which previous tests showed was worse than 50).

Here are the numbers for linux.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  | 967MB |  n/a  | 48.159s  |   n/a  | 378.088   |   n/a
    100  | 971MB | +0.4% | 41.471s  | -13.9% | 342.060   |  -9.5%
     50  | 979MB | +1.2% | 37.778s  | -21.6% | 311.040s  | -17.7%
     10  | 1.1GB | +6.6% | 32.518s  | -32.5% | 279.890s  | -25.9%

and for git.git:

   depth |  size |  %    | rev-list |  %     | log -Sfoo |   %
  -------+-------+-------+----------+--------+-----------+-------
    250  |  48MB |  n/a  |  2.215s  |   n/a  |  20.922s  |   n/a
    100  |  49MB | +0.5% |  2.140s  |  -3.4% |  17.736s  | -15.2%
     50  |  49MB | +1.7% |  2.099s  |  -5.2% |  15.418s  | -26.3%
     10  |  53MB | +9.3% |  2.001s  |  -9.7% |  12.677s  | -39.4%

You can see that that the CPU savings for regular operations improves as we
decrease the depth. The savings are less for "rev-list" on a smaller repository
than they are for blob-accessing operations, or even rev-list on a larger
repository. This may mean that a larger delta cache would help (though setting
core.deltaBaseCacheLimit by itself doesn't).

But we can also see that the space savings are not that great as the depth goes
higher. Saving 5-10% between 10 and 50 is probably worth the CPU tradeoff.
Saving 1% to go from 50 to 100, or another 0.5% to go from 100 to 250 is
probably not.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-08-11 11:53:19 -07:00
Junio C Hamano 87be95b6f9 Merge branch 'ew/gc-auto-pack-limit-fix' into maint
"gc.autoPackLimit" when set to 1 should not trigger a repacking
when there is only one pack, but the code counted poorly and did
so.

* ew/gc-auto-pack-limit-fix:
  gc: fix off-by-one error with gc.autoPackLimit
2016-07-28 11:25:56 -07:00
Junio C Hamano 97865e83c7 Merge branch 'ew/gc-auto-pack-limit-fix'
"gc.autoPackLimit" when set to 1 should not trigger a repacking
when there is only one pack, but the code counted poorly and did
so.

* ew/gc-auto-pack-limit-fix:
  gc: fix off-by-one error with gc.autoPackLimit
2016-07-13 11:24:12 -07:00
Eric Wong 5f4e3bf536 gc: fix off-by-one error with gc.autoPackLimit
This matches the documentation and allows gc.autoPackLimit=1
to maintain a single pack without attempting a repack on every
"git gc --auto" invocation.

Signed-off-by: Eric Wong <e@80x24.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-27 08:28:47 -07:00
Jeff King 45014beac0 Merge branch 'dk/gc-idx-wo-pack'
Having a leftover .idx file without corresponding .pack file in
the repository hurts performance; "git gc" learned to prune them.

* dk/gc-idx-wo-pack:
  gc: remove garbage .idx files from pack dir
  t5304: test cleaning pack garbage
  prepare_packed_git(): refactor garbage reporting in pack directory
2015-11-20 06:55:34 -05:00
Doug Kelly 478f34d2b6 gc: remove garbage .idx files from pack dir
Add a custom report_garbage handler to collect and remove
garbage .idx files from the pack directory.

Signed-off-by: Doug Kelly <dougk.ff7@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-11-04 11:30:22 -08:00
Junio C Hamano 808d119263 Merge branch 'js/misc-fixes'
Various compilation fixes and squelching of warnings.

* js/misc-fixes:
  Correct fscanf formatting string for I64u values
  Silence GCC's "cast of pointer to integer of a different size" warning
  Squelch warning about an integer overflow
2015-10-30 13:07:00 -07:00
Junio C Hamano fa46579555 Merge branch 'jk/repository-extension'
Prepare for Git on-disk repository representation to undergo
backward incompatible changes by introducing a new repository
format version "1", with an extension mechanism.

* jk/repository-extension:
  introduce "preciousObjects" repository extension
  introduce "extensions" form of core.repositoryformatversion
2015-10-26 15:55:25 -07:00
Waldek Maleska fdcdb77855 Correct fscanf formatting string for I64u values
This fix is probably purely cosmetic because PRIuMAX is likely identical
to SCNuMAX. Nevertheless, when using a function of the scanf() family,
the correct interpolation to use is the latter, not the former.

Signed-off-by: Waldek Maleska <w.maleska@gmail.com>
Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-26 13:24:03 -07:00
Junio C Hamano 78891795df Merge branch 'jk/war-on-sprintf'
Many allocations that is manually counted (correctly) that are
followed by strcpy/sprintf have been replaced with a less error
prone constructs such as xstrfmt.

Macintosh-specific breakage was noticed and corrected in this
reroll.

* jk/war-on-sprintf: (70 commits)
  name-rev: use strip_suffix to avoid magic numbers
  use strbuf_complete to conditionally append slash
  fsck: use for_each_loose_file_in_objdir
  Makefile: drop D_INO_IN_DIRENT build knob
  fsck: drop inode-sorting code
  convert strncpy to memcpy
  notes: document length of fanout path with a constant
  color: add color_set helper for copying raw colors
  prefer memcpy to strcpy
  help: clean up kfmclient munging
  receive-pack: simplify keep_arg computation
  avoid sprintf and strcpy with flex arrays
  use alloc_ref rather than hand-allocating "struct ref"
  color: add overflow checks for parsing colors
  drop strcpy in favor of raw sha1_to_hex
  use sha1_to_hex_r() instead of strcpy
  daemon: use cld->env_array when re-spawning
  stat_tracking_info: convert to argv_array
  http-push: use an argv_array for setup_revisions
  fetch-pack: use argv_array for index-pack / unpack-objects
  ...
2015-10-20 15:24:01 -07:00
Junio C Hamano 076c827858 Merge branch 'nd/gc-auto-background-fix'
When "git gc --auto" is backgrounded, its diagnosis message is
lost.  Save it to a file in $GIT_DIR and show it next time the "gc
--auto" is run.

* nd/gc-auto-background-fix:
  gc: save log from daemonized gc --auto and print it next time
2015-10-15 15:43:33 -07:00
Jeff King 5096d4909f convert trivial sprintf / strcpy calls to xsnprintf
We sometimes sprintf into fixed-size buffers when we know
that the buffer is large enough to fit the input (either
because it's a constant, or because it's numeric input that
is bounded in size). Likewise with strcpy of constant
strings.

However, these sites make it hard to audit sprintf and
strcpy calls for buffer overflows, as a reader has to
cross-reference the size of the array with the input. Let's
use xsnprintf instead, which communicates to a reader that
we don't expect this to overflow (and catches the mistake in
case we do).

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-25 10:18:18 -07:00
Nguyễn Thái Ngọc Duy 329e6e8794 gc: save log from daemonized gc --auto and print it next time
While commit 9f673f9 (gc: config option for running --auto in
background - 2014-02-08) helps reduce some complaints about 'gc
--auto' hogging the terminal, it creates another set of problems.

The latest in this set is, as the result of daemonizing, stderr is
closed and all warnings are lost. This warning at the end of cmd_gc()
is particularly important because it tells the user how to avoid "gc
--auto" running repeatedly. Because stderr is closed, the user does
not know, naturally they complain about 'gc --auto' wasting CPU.

Daemonized gc now saves stderr to $GIT_DIR/gc.log. Following gc --auto
will not run and gc.log printed out until the user removes gc.log.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-09-21 09:43:30 -07:00
Junio C Hamano db86e61cbb Merge branch 'mh/tempfile'
The "lockfile" API has been rebuilt on top of a new "tempfile" API.

* mh/tempfile:
  credential-cache--daemon: use tempfile module
  credential-cache--daemon: delete socket from main()
  gc: use tempfile module to handle gc.pid file
  lock_repo_for_gc(): compute the path to "gc.pid" only once
  diff: use tempfile module
  setup_temporary_shallow(): use tempfile module
  write_shared_index(): use tempfile module
  register_tempfile(): new function to handle an existing temporary file
  tempfile: add several functions for creating temporary files
  prepare_tempfile_object(): new function, extracted from create_tempfile()
  tempfile: a new module for handling temporary files
  commit_lock_file(): use get_locked_file_path()
  lockfile: add accessor get_lock_file_path()
  lockfile: add accessors get_lock_file_fd() and get_lock_file_fp()
  create_bundle(): duplicate file descriptor to avoid closing it twice
  lockfile: move documentation to lockfile.h and lockfile.c
2015-08-25 14:57:09 -07:00
Michael Haggerty ebebeaea0a gc: use tempfile module to handle gc.pid file
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 14:50:11 -07:00
Michael Haggerty 00539cef39 lock_repo_for_gc(): compute the path to "gc.pid" only once
Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-08-12 14:50:11 -07:00
Junio C Hamano c1e5ca90db Merge branch 'es/worktree-add'
Remove remaining cruft from  "git checkout --to", which
transitioned to "git worktree add".

* es/worktree-add:
  config: rename "gc.pruneWorktreesExpire" to "gc.worktreePruneExpire"
  Documentation/git-worktree: wordsmith worktree-related manpages
  Documentation/config: fix stale "git prune --worktree" reference
  Documentation/git-worktree: fix incorrect reference to file "locked"
  Documentation/git-worktree: consistently use term "linked working tree"
2015-08-12 14:09:55 -07:00
Eric Sunshine 114ff8881a config: rename "gc.pruneWorktreesExpire" to "gc.worktreePruneExpire"
As of df0b6cf (worktree: new place for "git prune --worktrees",
2015-06-29), linked worktree pruning functionality moved from
"git prune --worktrees" to "git worktree prune". Rename the
associated configuration variable accordingly.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-07-20 11:09:06 -07:00
Junio C Hamano 7783eb2e59 Merge branch 'nd/multiple-work-trees'
"git checkout [<tree-ish>] <paths>" spent unnecessary cycles
checking if the current branch was checked out elsewhere, when we
know we are not switching the branches ourselves.

* nd/multiple-work-trees:
  worktree: new place for "git prune --worktrees"
  checkout: don't check worktrees when not necessary
2015-07-13 14:02:02 -07:00
Nguyễn Thái Ngọc Duy df0b6cfbda worktree: new place for "git prune --worktrees"
Commit 23af91d (prune: strategies for linked checkouts - 2014-11-30)
adds "--worktrees" to "git prune" without realizing that "git prune" is
for object database only. This patch moves the same functionality to a
new command "git worktree".

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
2015-06-29 08:48:44 -07:00
Jeff King 067fbd4105 introduce "preciousObjects" repository extension
If this extension is used in a repository, then no
operations should run which may drop objects from the object
storage. This can be useful if you are sharing that storage
with other repositories whose refs you cannot see.

For instance, if you do:

  $ git clone -s parent child
  $ git -C parent config extensions.preciousObjects true
  $ git -C parent config core.repositoryformatversion 1

you now have additional safety when running git in the
parent repository. Prunes and repacks will bail with an
error, and `git gc` will skip those operations (it will
continue to pack refs and do other non-object operations).
Older versions of git, when run in the repository, will
fail on every operation.

Note that we do not set the preciousObjects extension by
default when doing a "clone -s", as doing so breaks
backwards compatibility. It is a decision the user should
make explicitly.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-06-24 17:09:35 -07:00
Junio C Hamano 68a2e6a2c8 Merge branch 'nd/multiple-work-trees'
A replacement for contrib/workdir/git-new-workdir that does not
rely on symbolic links and make sharing of objects and refs safer
by making the borrowee and borrowers aware of each other.

* nd/multiple-work-trees: (41 commits)
  prune --worktrees: fix expire vs worktree existence condition
  t1501: fix test with split index
  t2026: fix broken &&-chain
  t2026 needs procondition SANITY
  git-checkout.txt: a note about multiple checkout support for submodules
  checkout: add --ignore-other-wortrees
  checkout: pass whole struct to parse_branchname_arg instead of individual flags
  git-common-dir: make "modules/" per-working-directory directory
  checkout: do not fail if target is an empty directory
  t2025: add a test to make sure grafts is working from a linked checkout
  checkout: don't require a work tree when checking out into a new one
  git_path(): keep "info/sparse-checkout" per work-tree
  count-objects: report unused files in $GIT_DIR/worktrees/...
  gc: support prune --worktrees
  gc: factor out gc.pruneexpire parsing code
  gc: style change -- no SP before closing parenthesis
  checkout: clean up half-prepared directories in --to mode
  checkout: reject if the branch is already checked out elsewhere
  prune: strategies for linked checkouts
  checkout: support checking out into a new working directory
  ...
2015-05-11 14:23:39 -07:00
Alex Henrie 9c9b4f2f8b standardize usage info string format
This patch puts the usage info strings that were not already in docopt-
like format into docopt-like format, which will be a litle easier for
end users and a lot easier for translators. Changes include:

- Placing angle brackets around fill-in-the-blank parameters
- Putting dashes in multiword parameter names
- Adding spaces to [-f|--foobar] to make [-f | --foobar]
- Replacing <foobar>* with [<foobar>...]

Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-01-14 09:32:04 -08:00
Nguyễn Thái Ngọc Duy e3df33bb1b gc: support prune --worktrees
Helped-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Marc Branchaud <marcnarc@xiplink.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:18 -08:00
Nguyễn Thái Ngọc Duy 09dbb90b09 gc: factor out gc.pruneexpire parsing code
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Nguyễn Thái Ngọc Duy 2cfe2a7878 gc: style change -- no SP before closing parenthesis
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-12-01 11:00:17 -08:00
Michael Haggerty 697cc8efd9 lockfile.h: extract new header file for the functions in lockfile.c
Move the interface declaration for the functions in lockfile.c from
cache.h to a new file, lockfile.h. Add #includes where necessary (and
remove some redundant includes of cache.h by files that already
include builtin.h).

Move the documentation of the lock_file state diagram from lockfile.c
to the new header file.

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-10-01 13:56:14 -07:00
Tanay Abhra 5801d3b416 builtin/gc.c: replace `git_config()` with `git_config_get_*()` family
Use `git_config_get_*()` family instead of `git_config()` to take advantage of
the config-set API which provides a cleaner control flow.

Signed-off-by: Tanay Abhra <tanayabh@gmail.com>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-08-07 13:33:28 -07:00
Junio C Hamano 1a81f6ceea Merge branch 'nd/daemonize-gc'
"git gc --auto" was recently changed to run in the background to
give control back early to the end-user sitting in front of the
terminal, but it forgot that housekeeping involving reflogs should
be done without other processes competing for accesses to the refs.

* nd/daemonize-gc:
  gc --auto: do not lock refs in the background
2014-06-16 12:18:12 -07:00
Nguyễn Thái Ngọc Duy 62aad1849f gc --auto: do not lock refs in the background
9f673f9 (gc: config option for running --auto in background -
2014-02-08) puts "gc --auto" in background to reduce user's wait
time. Part of the garbage collecting is pack-refs and pruning
reflogs. These require locking some refs and may abort other processes
trying to lock the same ref. If gc --auto is fired in the middle of a
script, gc's holding locks in the background could fail the script,
which could never happen before 9f673f9.

Keep running pack-refs and "reflog --prune" in foreground to stop
parallel ref updates. The remaining background operations (repack,
prune and rerere) should not impact running git processes.

Reported-by: Adam Borowski <kilobyte@angband.pl>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-05-27 12:33:53 -07:00
Junio C Hamano 8815d8aa7c Merge branch 'nd/gc-aggressive'
Allow tweaking the maximum length of the delta-chain produced by
"gc --aggressive".

* nd/gc-aggressive:
  environment.c: fix constness for odb_pack_keep()
  gc --aggressive: make --depth configurable
2014-04-03 12:38:47 -07:00
Nguyễn Thái Ngọc Duy 125f81461d gc --aggressive: make --depth configurable
When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
made --depth=250 the default value, it didn't really explain the
reason behind, especially the pros and cons of --depth=250.

An old mail from Linus below explains it at length. Long story short,
--depth=250 is a disk saver and a performance killer. Not everybody
agrees on that aggressiveness. Let the user configure it.

    From: Linus Torvalds <torvalds@linux-foundation.org>
    Subject: Re: [PATCH] gc --aggressive: make it really aggressive
    Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
    Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
    Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637

    On Thu, 6 Dec 2007, Harvey Harrison wrote:
    >
    > 7:41:25elapsed 86%CPU

    Heh. And this is why you want to do it exactly *once*, and then just
    export the end result for others ;)

    > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46...pack

    But yeah, especially if you allow longer delta chains, the end result can
    be much smaller (and what makes the one-time repack more expensive is the
    window size, not the delta chain - you could make the delta chains longer
    with no cost overhead at packing time)

    HOWEVER.

    The longer delta chains do make it potentially much more expensive to then
    use old history. So there's a trade-off. And quite frankly, a delta depth
    of 250 is likely going to cause overflows in the delta cache (which is
    only 256 entries in size *and* it's a hash, so it's going to start having
    hash conflicts long before hitting the 250 depth limit).

    So when I said "--depth=250 --window=250", I chose those numbers more as
    an example of extremely aggressive packing, and I'm not at all sure that
    the end result is necessarily wonderfully usable. It's going to save disk
    space (and network bandwidth - the delta's will be re-used for the network
    protocol too!), but there are definitely downsides too, and using long
    delta chains may simply not be worth it in practice.

    (And some of it might just want to have git tuning, ie if people think
    that long deltas are worth it, we could easily just expand on the delta
    hash, at the cost of some more memory used!)

    That said, the good news is that working with *new* history will not be
    affected negatively, and if you want to be _really_ sneaky, there are ways
    to say "create a pack that contains the history up to a version one year
    ago, and be very aggressive about those old versions that we still want to
    have around, but do a separate pack for newer stuff using less aggressive
    parameters"

    So this is something that can be tweaked, although we don't really have
    any really nice interfaces for stuff like that (ie the git delta cache
    size is hardcoded in the sources and cannot be set in the config file, and
    the "pack old history more aggressively" involves some manual scripting
    and knowing how "git pack-objects" works rather than any nice simple
    command line switch).

    So the thing to take away from this is:
     - git is certainly flexible as hell
     - .. but to get the full power you may need to tweak things
     - .. happily you really only need to have one person to do the tweaking,
       and the tweaked end results will be available to others that do not
       need to know/care.

    And whether the difference between 320MB and 500MB is worth any really
    involved tweaking (considering the potential downsides), I really don't
    know. Only testing will tell.

			    Linus

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 10:26:24 -07:00
Junio C Hamano 9abf65d23c Merge branch 'bp/commit-p-editor'
When it is not necessary to edit a commit log message (e.g. "git
commit -m" is given a message without specifying "-e"), we used to
disable the spawning of the editor by overriding GIT_EDITOR, but
this means all the uses of the editor, other than to edit the
commit log message, are also affected.

* bp/commit-p-editor:
  run-command: mark run_hook_with_custom_index as deprecated
  merge hook tests: fix and update tests
  merge: fix GIT_EDITOR override for commit hook
  commit: fix patch hunk editing with "commit -p -m"
  test patch hunk editing with "commit -p -m"
  merge hook tests: use 'test_must_fail' instead of '!'
  merge hook tests: fix missing '&&' in test
2014-03-28 13:51:11 -07:00
Benoit Pierre 15048f8a9a commit: fix patch hunk editing with "commit -p -m"
Don't change git environment: move the GIT_EDITOR=":" override to the
hook command subprocess, like it's already done for GIT_INDEX_FILE.

Signed-off-by: Benoit Pierre <benoit.pierre@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-18 11:25:12 -07:00
Junio C Hamano 4c4ac4db2c Merge branch 'nd/daemonize-gc'
Allow running "gc --auto" in the background.

* nd/daemonize-gc:
  gc: config option for running --auto in background
  daemon: move daemonize() to libgit.a
2014-03-05 15:06:39 -08:00
Junio C Hamano 043478308f Merge branch 'ep/varscope'
Shrink lifetime of variables by moving their definitions to an
inner scope where appropriate.

* ep/varscope:
  builtin/gc.c: reduce scope of variables
  builtin/fetch.c: reduce scope of variable
  builtin/commit.c: reduce scope of variables
  builtin/clean.c: reduce scope of variable
  builtin/blame.c: reduce scope of variables
  builtin/apply.c: reduce scope of variables
  bisect.c: reduce scope of variable
2014-02-27 14:01:30 -08:00
Nguyễn Thái Ngọc Duy 9f673f9477 gc: config option for running --auto in background
`gc --auto` takes time and can block the user temporarily (but not any
less annoyingly). Make it run in background on systems that support
it. The only thing lost with running in background is printouts. But
gc output is not really interesting. You can keep it in foreground by
changing gc.autodetach.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-02-10 10:46:37 -08:00
Elia Pinto 4f1c0b21e9 builtin/gc.c: reduce scope of variables
Signed-off-by: Elia Pinto <gitter.spiros@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-31 10:44:05 -08:00
Junio C Hamano 92251b1b5b Merge branch 'nd/shallow-clone'
Fetching from a shallow-cloned repository used to be forbidden,
primarily because the codepaths involved were not carefully vetted
and we did not bother supporting such usage. This attempts to allow
object transfer out of a shallow-cloned repository in a controlled
way (i.e. the receiver become a shallow repository with truncated
history).

* nd/shallow-clone: (31 commits)
  t5537: fix incorrect expectation in test case 10
  shallow: remove unused code
  send-pack.c: mark a file-local function static
  git-clone.txt: remove shallow clone limitations
  prune: clean .git/shallow after pruning objects
  clone: use git protocol for cloning shallow repo locally
  send-pack: support pushing from a shallow clone via http
  receive-pack: support pushing to a shallow clone via http
  smart-http: support shallow fetch/clone
  remote-curl: pass ref SHA-1 to fetch-pack as well
  send-pack: support pushing to a shallow clone
  receive-pack: allow pushes that update .git/shallow
  connected.c: add new variant that runs with --shallow-file
  add GIT_SHALLOW_FILE to propagate --shallow-file to subprocesses
  receive/send-pack: support pushing from a shallow clone
  receive-pack: reorder some code in unpack()
  fetch: add --update-shallow to accept refs that update .git/shallow
  upload-pack: make sure deepening preserves shallow roots
  fetch: support fetching from a shallow repository
  clone: support remote shallow repository
  ...
2014-01-17 12:21:20 -08:00
Kyle J. McKay ed7eda8b38 gc: notice gc processes run by other users
Since 64a99eb4 git gc refuses to run without the --force option if
another gc process on the same repository is already running.

However, if the repository is shared and user A runs git gc on the
repository and while that gc is still running user B runs git gc on
the same repository the gc process run by user A will not be noticed
and the gc run by user B will go ahead and run.

The problem is that the kill(pid, 0) test fails with an EPERM error
since user B is not allowed to signal processes owned by user A
(unless user B is root).

Update the test to recognize an EPERM error as meaning the process
exists and another gc should not be run (unless --force is given).

Signed-off-by: Kyle J. McKay <mackyle@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-01-02 16:15:29 -08:00
Nguyễn Thái Ngọc Duy eab3296c7e prune: clean .git/shallow after pruning objects
This patch teaches "prune" to remove shallow roots that are no longer
reachable from any refs (e.g. when the relevant refs are removed).

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-12-10 16:14:19 -08:00
Junio C Hamano 414b7033b1 Merge branch 'nd/gc-lock-against-each-other'
* nd/gc-lock-against-each-other:
  gc: remove gc.pid file at end of execution
2013-10-30 12:10:27 -07:00
Jonathan Nieder 4c5baf0273 gc: remove gc.pid file at end of execution
This file isn't really harmful, but isn't useful either, and can create
minor annoyance for the user:

* It's confusing, as the presence of a *.pid file often implies that a
  process is currently running. A user running "ls .git/" and finding
  this file may incorrectly guess that a "git gc" is currently running.

* Leaving this file means that a "git gc" in an already gc-ed repo is
  no-longer a no-op. A user running "git gc" in a set of repositories,
  and then synchronizing this set (e.g. rsync -av, unison, ...) will see
  all the gc.pid files as changed, which creates useless noise.

This patch unlinks the file after the garbage collection is done, so that
gc.pid is actually present only during execution.

Future versions of Git may want to use the information left in the gc.pid
file (e.g. for policies like "don't attempt to run a gc if one has
already been ran less than X hours ago"). If so, this patch can safely be
reverted. For now, let's not bother the users.

Explained-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Improved-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-10-18 12:45:24 -07:00
Junio C Hamano a86a8b9752 Merge branch 'sb/parseopt-boolean-removal'
Convert most uses of OPT_BOOLEAN/OPTION_BOOLEAN that can use
OPT_BOOL/OPTION_BOOLEAN which have much saner semantics, and turn
remaining ones into OPT_SET_INT, OPT_COUNTUP, etc. as necessary.

* sb/parseopt-boolean-removal:
  revert: use the OPT_CMDMODE for parsing, reducing code
  checkout-index: fix negations of even numbers of -n
  config parsing options: allow one flag multiple times
  hash-object: replace stdin parsing OPT_BOOLEAN by OPT_COUNTUP
  branch, commit, name-rev: ease up boolean conditions
  checkout: remove superfluous local variable
  log, format-patch: parsing uses OPT__QUIET
  Replace deprecated OPT_BOOLEAN by OPT_BOOL
  Remove deprecated OPTION_BOOLEAN for parsing arguments
2013-09-04 12:39:03 -07:00
Nguyễn Thái Ngọc Duy 64a99eb476 gc: reject if another gc is running, unless --force is given
This may happen when `git gc --auto` is run automatically, then the
user, to avoid wait time, switches to a new terminal, keeps working
and `git gc --auto` is started again because the first gc instance has
not clean up the repository.

This patch tries to avoid multiple gc running, especially in --auto
mode. In the worst case, gc may be delayed 12 hours if a daemon reuses
the pid stored in gc.pid.

kill(pid, 0) support is added to MinGW port so it should work on
Windows too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-09 09:10:05 -07:00
Stefan Beller d5d09d4754 Replace deprecated OPT_BOOLEAN by OPT_BOOL
This task emerged from b04ba2bb (parse-options: deprecate OPT_BOOLEAN,
2011-09-27). All occurrences of the respective variables have
been reviewed and none of them relied on the counting up mechanism,
but all of them were using the variable as a true boolean.

This patch does not change semantics of any command intentionally.

Signed-off-by: Stefan Beller <stefanbeller@googlemail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-05 11:32:19 -07:00
Tobias Ulmer df995c7dd2 silence git gc --auto --quiet output
When --quiet is requested, gc --auto should not display messages unless
there is an error.

Signed-off-by: Tobias Ulmer <tobiasu@tmux.org>
Acked-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-09-27 17:57:26 -07:00
Nguyễn Thái Ngọc Duy 6705c1629c i18n: gc: mark parseopt strings for translation
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-08-20 12:23:17 -07:00
Jeff King 234587fc87 gc: use argv-array for sub-commands
git-gc executes many sub-commands. The argument list for
some of these is constant, but for others we add more
arguments at runtime. The latter is implemented by allocating
a constant extra number of NULLs, and either using a custom
append function, or just referencing unused slots by number.

As of commit 7e52f56, which added two new arguments, it is
possible to exceed the constant number of slots for "repack"
by running "git gc --aggressive", causing "git gc" to die.

This patch converts all of the static argv lists to use
argv-array. In addition to fixing the overflow caused by
7e52f56, it has a few advantages:

  1. We can drop the custom append function (which,
     incidentally, had an off-by-one error exacerbating the
     static limit).

  2. We can drop the ugly magic numbers used when adding
     arguments to "prune".

  3. Adding further arguments will be easier; you can just
     add new "push" calls without worrying about increasing
     any static limits.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-18 16:17:42 -07:00
Jeff King 7e52f5660e gc: do not explode objects which will be immediately pruned
When we pack everything into one big pack with "git repack
-Ad", any unreferenced objects in to-be-deleted packs are
exploded into loose objects, with the intent that they will
be examined and possibly cleaned up by the next run of "git
prune".

Since the exploded objects will receive the mtime of the
pack from which they come, if the source pack is old, those
loose objects will end up pruned immediately. In that case,
it is much more efficient to skip the exploding step
entirely for these objects.

This patch teaches pack-objects to receive the expiration
information and avoid writing these objects out. It also
teaches "git gc" to pass the value of gc.pruneexpire to
repack (which in turn learns to pass it along to
pack-objects) so that this optimization happens
automatically during "git gc" and "git gc --auto".

Signed-off-by: Jeff King <peff@peff.net>
Acked-by: Nicolas Pitre <nico@fluxnic.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-11 11:09:49 -07:00
Jeff King bf0a59b387 prune: handle --progress/no-progress
And have "git gc" pass no-progress when quiet.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-11-07 22:12:19 -08:00
Andreas Schwab daab4eeafa builtin/gc.c: add missing newline in message
Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-06-19 14:46:39 -07:00
Ævar Arnfjörð Bjarmason f6908ae86e i18n: git-gc "Auto packing the repository" message
Split up the "Auto packing the repository" message into quiet and
verbose variants to make translation easier.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-09 23:52:57 -08:00
Ævar Arnfjörð Bjarmason fea6128bae i18n: git-gc basic messages
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-03-09 23:52:57 -08:00
Junio C Hamano 6758af89e4 Merge branch 'jn/git-cmd-h-bypass-setup'
* jn/git-cmd-h-bypass-setup:
  update-index -h: show usage even with corrupt index
  merge -h: show usage even with corrupt index
  ls-files -h: show usage even with corrupt index
  gc -h: show usage even with broken configuration
  commit/status -h: show usage even with broken configuration
  checkout-index -h: show usage even in an invalid repository
  branch -h: show usage even in an invalid repository

Conflicts:
	builtin/merge.c
2010-12-12 21:49:50 -08:00
Jonathan Nieder 8c83968385 Describe various forms of "be quiet" using OPT__QUIET
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-15 10:04:56 -08:00
René Scharfe d52ee6e613 add description parameter to OPT__QUIET
Allows better help text to be defined than "be quiet".  Also make use
of the macro in a place that already had a different description.  No
object code changes intended.

Signed-off-by: Rene Scharfe <rene.scharfe@lsrfire.ath.cx>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-15 09:58:13 -08:00
Nguyễn Thái Ngọc Duy 0c8151b6ff gc -h: show usage even with broken configuration
Given a request for command-line usage information rather than some
more substantial action, the only friendly thing to do is to report
the usage information as soon as possible and exit.

Without this change, as "git gc" glances over the repository, it can
be distracted by the desire to report a malformed configuration file.

Noticed while working through reports from Duy's repository access
checker.

[jn: with rewritten log message and tests]

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Jonathan Nieder <jrnieder@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-10-22 11:04:53 -07:00
Linus Torvalds 81b50f3ce4 Move 'builtin-*' into a 'builtin/' subdirectory
This shrinks the top-level directory a bit, and makes it much more
pleasant to use auto-completion on the thing. Instead of

	[torvalds@nehalem git]$ em buil<tab>
	Display all 180 possibilities? (y or n)
	[torvalds@nehalem git]$ em builtin-sh
	builtin-shortlog.c     builtin-show-branch.c  builtin-show-ref.c
	builtin-shortlog.o     builtin-show-branch.o  builtin-show-ref.o
	[torvalds@nehalem git]$ em builtin-shor<tab>
	builtin-shortlog.c  builtin-shortlog.o
	[torvalds@nehalem git]$ em builtin-shortlog.c

you get

	[torvalds@nehalem git]$ em buil<tab>		[type]
	builtin/   builtin.h
	[torvalds@nehalem git]$ em builtin		[auto-completes to]
	[torvalds@nehalem git]$ em builtin/sh<tab>	[type]
	shortlog.c     shortlog.o     show-branch.c  show-branch.o  show-ref.c     show-ref.o
	[torvalds@nehalem git]$ em builtin/sho		[auto-completes to]
	[torvalds@nehalem git]$ em builtin/shor<tab>	[type]
	shortlog.c  shortlog.o
	[torvalds@nehalem git]$ em builtin/shortlog.c

which doesn't seem all that different, but not having that annoying
break in "Display all 180 possibilities?" is quite a relief.

NOTE! If you do this in a clean tree (no object files etc), or using an
editor that has auto-completion rules that ignores '*.o' files, you
won't see that annoying 'Display all 180 possibilities?' message - it
will just show the choices instead.  I think bash has some cut-off
around 100 choices or something.

So the reason I see this is that I'm using an odd editory, and thus
don't have the rules to cut down on auto-completion.  But you can
simulate that by using 'ls' instead, or something similar.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-02-22 14:29:41 -08:00