1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-18 14:46:10 +02:00
Commit Graph

77 Commits

Author SHA1 Message Date
Taylor Blau 37dc6d8104 builtin/repack.c: implement support for `--max-cruft-size`
Cruft packs are an alternative mechanism for storing a collection of
unreachable objects whose mtimes are recent enough to avoid being
pruned out of the repository.

When cruft packs were first introduced back in b757353676
(builtin/pack-objects.c: --cruft without expiration, 2022-05-20) and
a7d493833f (builtin/pack-objects.c: --cruft with expiration,
2022-05-20), the recommended workflow consisted of:

  - Repacking periodically, either by packing anything loose in the
    repository (via `git repack -d`) or producing a geometric sequence
    of packs (via `git repack --geometric=<d> -d`).

  - Every so often, splitting the repository into two packs, one cruft
    to store the unreachable objects, and another non-cruft pack to
    store the reachable objects.

Repositories may (out of band with the above) choose periodically to
prune out some unreachable objects which have aged out of the grace
period by generating a pack with `--cruft-expiration=<approxidate>`.

This allowed repositories to maintain relatively few packs on average,
and quarantine unreachable objects together in a cruft pack, avoiding
the pitfalls of holding unreachable objects as loose while they age out
(for more, see some of the details in 3d89a8c118
(Documentation/technical: add cruft-packs.txt, 2022-05-20)).

This all works, but can be costly from an I/O-perspective when
frequently repacking a repository that has many unreachable objects.
This problem is exacerbated when those unreachable objects are rarely
(if every) pruned.

Since there is at most one cruft pack in the above scheme, each time we
update the cruft pack it must be rewritten from scratch. Because much of
the pack is reused, this is a relatively inexpensive operation from a
CPU-perspective, but is very costly in terms of I/O since we end up
rewriting basically the same pack (plus any new unreachable objects that
have entered the repository since the last time a cruft pack was
generated).

At the time, we decided against implementing more robust support for
multiple cruft packs. This patch implements that support which we were
lacking.

Introduce a new option `--max-cruft-size` which allows repositories to
accumulate cruft packs up to a given size, after which point a new
generation of cruft packs can accumulate until it reaches the maximum
size, and so on. To generate a new cruft pack, the process works like
so:

  - Sort a list of any existing cruft packs in ascending order of pack
    size.

  - Starting from the beginning of the list, group cruft packs together
    while the accumulated size is smaller than the maximum specified
    pack size.

  - Combine the objects in these cruft packs together into a new cruft
    pack, along with any other unreachable objects which have since
    entered the repository.

Once a cruft pack grows beyond the size specified via `--max-cruft-size`
the pack is effectively frozen. This limits the I/O churn up to a
quadratic function of the value specified by the `--max-cruft-size`
option, instead of behaving quadratically in the number of total
unreachable objects.

When pruning unreachable objects, we bypass the new code paths which
combine small cruft packs together, and instead start from scratch,
passing in the appropriate `--max-pack-size` down to `pack-objects`,
putting it in charge of keeping the resulting set of cruft packs sized
correctly.

This may seem like further I/O churn, but in practice it isn't so bad.
We could prune old cruft packs for whom all or most objects are removed,
and then generate a new cruft pack with just the remaining set of
objects. But this additional complexity buys us relatively little,
because most objects end up being pruned anyway, so the I/O churn is
well contained.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-10-05 13:26:11 -07:00
Taylor Blau e3e24de1bf builtin/gc.c: make `gc.cruftPacks` enabled by default
Back in 5b92477f89 (builtin/gc.c: conditionally avoid pruning objects
via loose, 2022-05-20), `git gc` learned the `--cruft` option and
`gc.cruftPacks` configuration to opt-in to writing cruft packs when
collecting or pruning unreachable objects.

Cruft packs were introduced with the merge in a50036da1a (Merge branch
'tb/cruft-packs', 2022-06-03). They address the problem of "loose object
explosions", where Git will write out many individual loose objects when
there is a large number of unreachable objects that have not yet aged
past `--prune=<date>`.

Instead of keeping track of those unreachable yet recent objects via
their loose object file's mtime, cruft packs collect all unreachable
objects into a single pack with a corresponding `*.mtimes` file that
acts as a table to store the mtimes of all unreachable objects. This
prevents the need to store unreachable objects as loose as they age out
of the repository, and avoids the problem of loose object explosions.

Beyond avoiding loose object explosions, cruft packs also act as a more
efficient mechanism to store unreachable objects as they age out of a
repository. This is because pairs of similar unreachable objects serve
as delta bases for one another.

In 5b92477f89, the feature was introduced as experimental. Since then,
GitHub has been running these patches in every repository generating
hundreds of millions of cruft packs along the way. The feature is
battle-tested, and avoids many pathological cases such as above. Users
who either run `git gc` manually, or via `git maintenance` can benefit
from having cruft packs.

As such, enable cruft pack generation to take place by default (by
making `gc.cruftPacks` have the default of "true" rather than "false).

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:48 -07:00
Taylor Blau 05b9013b71 builtin/gc.c: ignore cruft packs with `--keep-largest-pack`
When cruft packs were implemented, we never adjusted the code for `git
gc`'s `--keep-largest-pack` and `gc.bigPackThreshold` to ignore cruft
packs. This option and configuration option share a common
implementation, but including cruft packs is wrong in both cases:

  - Running `git gc --keep-largest-pack` in a repository where the
    largest pack is the cruft pack itself will make it impossible for
    `git gc` to prune objects, since the cruft pack itself is kept.

  - The same is true for `gc.bigPackThreshold`, if the size of the cruft
    pack exceeds the limit set by the caller.

In the future, it is possible that `gc.bigPackThreshold` could be used
to write a separate cruft pack containing any new unreachable objects
that entered the repository since the last time a cruft pack was
written.

There are some complexities to doing so, mainly around handling
pruning objects that are in an existing cruft pack that is above the
threshold (which would either need to be rewritten, or else delay
pruning). Rewriting a substantially similar cruft pack isn't ideal, but
it is significantly better than the status-quo.

If users have large cruft packs that they don't want to rewrite, they
can mark them as `*.keep` packs. But in general, if a repository has a
cruft pack that is so large it is slowing down GC's, it should probably
be pruned anyway.

In the meantime, ignore cruft packs in the common implementation for
both of these options, and add a pair of tests to prevent any future
regressions here.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2023-04-18 14:56:47 -07:00
Ævar Arnfjörð Bjarmason 18d89fe25c docs: add and use include template for config/* includes
In b6a8d09f6d (gc docs: include the "gc.*" section from "config" in
"gc", 2019-04-07) the "git gc" documentation was made to include the
config/gc.txt in its "CONFIGURATION" section. We do that in several
other places, but "git gc" was the only one with a blurb above the
include to orient the reader.

We don't want readers to carefully scrutinize "git-config(1)" and
"git-gc(1)" looking for discrepancies, instead we should tell them
that the latter includes a part of the former.

This change formalizes that wording in two new templates to be
included, one for the "git gc" case where the entire section is
included from "git-config(1)", and another for when the inclusion of
"git-config(1)" follows discussion unique to that documentation. In
order to use that re-arrange the order of those being discussed in the
"git-merge(1)" documentation.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Reviewed-by: Matheus Tavares <matheus.bernardino@usp.br>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-09-07 09:46:05 -07:00
René Scharfe 378b51993a gc: simplify --cruft description
Remove duplicate "loose objects".

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-06-21 08:58:04 -07:00
Taylor Blau 5b92477f89 builtin/gc.c: conditionally avoid pruning objects via loose
Expose the new `git repack --cruft` mode from `git gc` via a new opt-in
flag. When invoked like `git gc --cruft`, `git gc` will avoid exploding
unreachable objects as loose ones, and instead create a cruft pack and
`.mtimes` file.

Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2022-05-26 15:48:26 -07:00
Martin von Zweigbergk fa9ab027ba docs: clarify that refs/notes/ do not keep the attached objects alive
`git help gc` contains this snippet:

  "[...] it will keep [..] objects referenced by the index,
  remote-tracking branches, notes saved by git notes under refs/notes/"

I had interpreted that as saying that the objects that notes were
attached to are kept, but that is not the case. Let's clarify the
documentation by moving out the part about git notes to a separate
sentence.

Signed-off-by: Martin von Zweigbergk <martinvonz@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2021-02-10 23:43:55 -08:00
Elijah Newren 9df53c5de6 Recommend git-filter-repo instead of git-filter-branch
filter-branch suffers from a deluge of disguised dangers that disfigure
history rewrites (i.e. deviate from the deliberate changes).  Many of
these problems are unobtrusive and can easily go undiscovered until the
new repository is in use.  This can result in problems ranging from an
even messier history than what led folks to filter-branch in the first
place, to data loss or corruption.  These issues cannot be backward
compatibly fixed, so add a warning to both filter-branch and its manpage
recommending that another tool (such as filter-repo) be used instead.

Also, update other manpages that referenced filter-branch.  Several of
these needed updates even if we could continue recommending
filter-branch, either due to implying that something was unique to
filter-branch when it applied more generally to all history rewriting
tools (e.g. BFG, reposurgeon, fast-import, filter-repo), or because
something about filter-branch was used as an example despite other more
commonly known examples now existing.  Reword these sections to fix
these issues and to avoid recommending filter-branch.

Finally, remove the section explaining BFG Repo Cleaner as an
alternative to filter-branch.  I feel somewhat bad about this,
especially since I feel like I learned so much from BFG that I put to
good use in filter-repo (which is much more than I can say for
filter-branch), but keeping that section presented a few problems:
  * In order to recommend that people quit using filter-branch, we need
    to provide them a recomendation for something else to use that
    can handle all the same types of rewrites.  To my knowledge,
    filter-repo is the only such tool.  So it needs to be mentioned.
  * I don't want to give conflicting recommendations to users
  * If we recommend two tools, we shouldn't expect users to learn both
    and pick which one to use; we should explain which problems one
    can solve that the other can't or when one is much faster than
    the other.
  * BFG and filter-repo have similar performance
  * All filtering types that BFG can do, filter-repo can also do.  In
    fact, filter-repo comes with a reimplementation of BFG named
    bfg-ish which provides the same user-interface as BFG but with
    several bugfixes and new features that are hard to implement in
    BFG due to its technical underpinnings.
While I could still mention both tools, it seems like I would need to
provide some kind of comparison and I would ultimately just say that
filter-repo can do everything BFG can, so ultimately it seems that it
is just better to remove that section altogether.

Signed-off-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-09-05 13:01:48 -07:00
Ævar Arnfjörð Bjarmason 0044f7700f gc docs: remove incorrect reference to gc.auto=0
The chance of a repository being corrupted due to a "gc" has nothing
to do with whether or not that "gc" was invoked via "gc --auto", but
whether there's other concurrent operations happening.

This is already noted earlier in the paragraph, so there's no reason
to suggest this here. The user can infer from the rest of the
documentation that "gc" will run automatically unless gc.auto=0 is
set, and we shouldn't confuse the issue by implying that "gc --auto"
is somehow more prone to produce corruption than a normal "gc".

Well, it is in the sense that a blocking "gc" would stop you from
doing anything else in *that* particular terminal window, but users
are likely to have another window, or to be worried about how
concurrent "gc" on a server might cause corruption.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 17:01:10 +09:00
Ævar Arnfjörð Bjarmason daecbf2261 gc docs: clarify that "gc" doesn't throw away referenced objects
Amend the "NOTES" section to fix up wording that's been with us since
3ffb58be0a ("doc/git-gc: add a note about what is collected",
2008-04-23).

I can't remember when/where anymore (I think Freenode #Git), but at
some point I was having a conversation with someone who was convinced
that "gc" would prune things only referenced by e.g. refs/pull/*, and
pointed to this section as proof.

It turned out that they'd read the "branches and tags" wording here
and thought just refs/{heads,tags}/* and refs/remotes/* etc. would be
kept, which is what we enumerate explicitly.

So let's say "other refs", even though just above we say "objects that
are referenced anywhere in your repository".

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 17:01:09 +09:00
Ævar Arnfjörð Bjarmason 22d4e3bd19 gc docs: downplay the usefulness of --aggressive
The existing "gc --aggressive" docs come just short of recommending to
users that they run it regularly. I've personally talked to many users
who've taken these docs as an advice to use this option, and have,
usually it's (mostly) a waste of time.

So let's clarify what it really does, and let the user draw their own
conclusions.

Let's also clarify the "The effects [...] are persistent" to
paraphrase a brief version of Jeff King's explanation at [1].

1. https://public-inbox.org/git/20190318235356.GK29661@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 17:01:09 +09:00
Ævar Arnfjörð Bjarmason b6a8d09f6d gc docs: include the "gc.*" section from "config" in "gc"
Rather than duplicating the documentation for the various "gc" options
let's include the "gc" docs from git-config. They were mostly better
already, and now we don't have the same docs in two places with subtly
different wording.

In the cases where the git-gc(1) docs were saying something the "gc"
docs in git-config(1) didn't cover move the relevant section over to
the git-config(1) docs.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-04-08 17:01:09 +09:00
Ævar Arnfjörð Bjarmason fc559fbb54 gc docs: clean grammar for "gc.bigPackThreshold"
Clean up the grammar in the documentation for
"gc.bigPackThreshold". This documentation was added in 9806f5a7bf ("gc
--auto: exclude base pack if not enough mem to "repack -ad"",
2018-04-15).

Saying "the amount of memory estimated for" flows more smoothly than
the previous "the amount of memory is estimated not enough".

Suggested-by: Jeff King <peff@peff.net>
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 21:31:10 +09:00
Ævar Arnfjörð Bjarmason cf9cd771e1 gc docs: stop noting "repack" flags
Remove the mention of specific flags from the "gc" documentation, and
leave it at describing what we'll do instead. As seen in builtin/gc.c
we'll use various repack flags depending on what we detect we need to
do, so this isn't always accurate.

More importantly, a subsequent change is about to remove all this
documentation and replace it with an include of the gc.* docs in
git-config(1). By first changing this it's easier to reason about that
subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 21:31:10 +09:00
Ævar Arnfjörð Bjarmason b11e8560cc gc docs: modernize the advice for manually running "gc"
The docs have been recommending that users need to run this manually,
but that hasn't been needed in practice for a long time except in
exceptional circumstances.

Let's instead have this reflect reality and say that most users don't
need to run this manually at all, while briefly describing the sorts
sort of cases where "gc" does need to be run manually.

Since we're recommending that users run this most of the and usually
don't need to tweak it, let's tone down the very prominent example of
the gc.auto=0 command. It's sufficient to point to the gc.auto
documentation below.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-24 21:31:10 +09:00
Junio C Hamano 443442ec71 Merge branch 'rd/gc-prune-doc-fix'
Doxfix.

* rd/gc-prune-doc-fix:
  docs/git-gc: fix typo "--prune=all" to "--prune=now"
2019-03-11 16:16:26 +09:00
Robert P. J. Day 716a5af812 docs/git-gc: fix typo "--prune=all" to "--prune=now"
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-03-03 18:35:54 +09:00
Derrick Stolee b8b4cb27e6 git-gc.txt: fix typo about gc.writeCommitGraph
Reported-by: Stefan Haller <stefan@haller-berlin.de>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2019-01-08 11:14:04 -08:00
Ævar Arnfjörð Bjarmason c7e8ce6d1d gc doc: mention the commit-graph in the intro
Explicitly mention in the intro that we may be writing supplemental
data structures such as the commit-graph during "gc", i.e. to call out
the "optimize" part of what this command does, it doesn't just
"collect garbage" as the "gc" name might imply.

Past changes have updated the intro to reflect new commands, such as
mentioning "worktree" in b586a96a39 ("gc.txt: more details about what
gc does", 2018-03-15). So let's elaborate on what was added in
d5d5d7b641 ("gc: automatically write commit-graph files", 2018-06-27).

See also
https://public-inbox.org/git/87tvm3go42.fsf@evledraar.gmail.com/ (follow-up
replies) for an on-list discussion about what "gc" does.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-10-11 15:40:27 +09:00
Derrick Stolee d5d5d7b641 gc: automatically write commit-graph files
The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, make it possible to
write the commit-graph file during standard garbage collection
operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command. Defaults to
false while the commit-graph feature matures. We specifically do not
want to have this on by default until the commit-graph feature is fully
integrated with history-modifying features like shallow clones.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-06-27 10:29:10 -07:00
Junio C Hamano 6b0f1d9c47 Merge branch 'nd/doc-header'
Doc formatting fix.

* nd/doc-header:
  doc: keep first level section header in upper case
2018-05-23 14:38:22 +09:00
Junio C Hamano 30b015bffe Merge branch 'nd/repack-keep-pack'
"git gc" in a large repository takes a lot of time as it considers
to repack all objects into one pack by default.  The command has
been taught to pretend as if the largest existing packfile is
marked with ".keep" so that it is left untouched while objects in
other packs and loose ones are repacked.

* nd/repack-keep-pack:
  pack-objects: show some progress when counting kept objects
  gc --auto: exclude base pack if not enough mem to "repack -ad"
  gc: handle a corner case in gc.bigPackThreshold
  gc: add gc.bigPackThreshold config
  gc: add --keep-largest-pack option
  repack: add --keep-pack option
  t7700: have closing quote of a test at the beginning of line
2018-05-23 14:38:14 +09:00
Junio C Hamano 278c251147 Merge branch 'sg/doc-gc-quote-mismatch-fix'
Doc formatting fix.

* sg/doc-gc-quote-mismatch-fix:
  docs/git-gc: fix minor rendering issue
2018-05-08 15:59:26 +09:00
Nguyễn Thái Ngọc Duy 76a8788c14 doc: keep first level section header in upper case
When formatted as a man page, 1st section header is always in upper
case even if we write it otherwise. Make all 1st section headers
uppercase to keep it close to the final output.

This does affect html since case is kept there, but I still think it's
a good idea to maintain a consistent style for 1st section headers.

Some sections perhaps should become second sections instead, where
case is kept, and for better organization. I will update if anyone has
suggestions about this.

While at there I also make some header more consistent (e.g. examples
vs example) and fix a couple minor things here and there.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-05-02 17:03:33 +09:00
SZEDER Gábor bed21a8ad6 docs/git-gc: fix minor rendering issue
An unwanted single quote character in the paragraph documenting the
'gc.aggressiveWindow' config variable prevented the name of that
config variable from being rendered correctly, ever since that piece
of docs was added in 0d7566a5ba (Add --aggressive option to 'git gc',
2007-05-09).

Remove that single quote.

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-18 12:07:23 +09:00
Nguyễn Thái Ngọc Duy 9806f5a7bf gc --auto: exclude base pack if not enough mem to "repack -ad"
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy 55dfe13df9 gc: add gc.bigPackThreshold config
The --keep-largest-pack option is not very convenient to use because
you need to tell gc to do this explicitly (and probably on just a few
large repos).

Add a config key that enables this mode when packs larger than a limit
are found. Note that there's a slight behavior difference compared to
--keep-largest-pack: all packs larger than the threshold are kept, not
just the largest one.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy ae4e89e549 gc: add --keep-largest-pack option
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.

This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.

This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..

PS. The use of string_list here seems overkill, but we'll need it in
the next patch...

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-04-16 13:52:29 +09:00
Nguyễn Thái Ngọc Duy b586a96a39 gc.txt: more details about what gc does
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2018-03-15 12:37:45 -07:00
Patrick Steinhardt 7e82388024 docs/git-gc: fix default value for `--aggressiveDepth`
In commit 07e7dbf0d (gc: default aggressive depth to 50, 2016-08-11),
the default aggressive depth of git-gc has been changed to 50. While
git-config(1) has been updated to represent the new default value,
git-gc(1) still mentions the old value. This patch fixes it.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Reviewed-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2017-02-24 09:59:12 -08:00
Junio C Hamano 647a1bcf14 Merge branch 'mm/gc-safety-doc' into maint
Doc update.

* mm/gc-safety-doc:
  git-gc.txt: expand discussion of races with other processes
2017-01-17 15:19:11 -08:00
Matt McCutchen f1350d0c12 git-gc.txt: expand discussion of races with other processes
In general, "git gc" may delete objects that another concurrent process
is using but hasn't created a reference to.  Git has some mitigations,
but they fall short of a complete solution.  Document this in the
git-gc(1) man page and add a reference from the documentation of the
gc.pruneExpire config variable.

Based on a write-up by Jeff King:

http://marc.info/?l=git&m=147922960131779&w=2

Signed-off-by: Matt McCutchen <matt@mattmccutchen.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-11-16 13:42:17 -08:00
Tom Russello ae9f6311e9 doc: change configuration variables format
This change configuration variables that where in italic style
to monospace font according to the guideline. It was obtained with

	grep '[[:alpha:]]*\.[[:alpha:]]*::$' config.txt | \
	sed -e 's/::$//' -e 's/\./\\\\./' | \
	xargs -iP perl -pi -e "s/\'P\'/\`P\`/g" ./*.txt

Signed-off-by: Tom Russello <tom.russello@grenoble-inp.org>
Signed-off-by: Erwan Mathoniere <erwan.mathoniere@grenoble-inp.org>
Signed-off-by: Samuel Groot <samuel.groot@grenoble-inp.org>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Reviewed-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2016-06-08 12:04:55 -07:00
Junio C Hamano d5d1e35ace Merge branch 'jc/doc-gc-prune-now'
"git gc" is safe to run anytime only because it has the built-in
grace period to protect young objects.  In order to run with no
grace period, the user must make sure that the repository is
quiescent.

* jc/doc-gc-prune-now:
  Documentation/gc: warn against --prune=<now>
2015-10-16 14:42:50 -07:00
Junio C Hamano fae1a901ec Documentation/gc: warn against --prune=<now>
"git gc" is safe to run anytime only because it has the built-in
grace period to protect objects that are created by other processes
that are waiting for ref updates to anchor them to the history.  In
order to run with no grace period, the user must make sure that the
repository is quiescent.

Reviewed-by: Matthieu Moy <Matthieu.Moy@grenoble-inp.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-10-14 13:48:39 -07:00
Nguyễn Thái Ngọc Duy da0005b885 *config.txt: stick to camelCase naming convention
This should improve readability. Compare "thislongname" and
"thisLongName". The following keys are left in unchanged. We can
decide what to do with them later.

 - am.keepcr
 - core.autocrlf .safecrlf .trustctime
 - diff.dirstat .noprefix
 - gitcvs.usecrlfattr
 - gui.blamehistoryctx .trustmtime
 - pull.twohead
 - receive.autogc
 - sendemail.signedoffbycc .smtpsslcertpath .suppresscc

Helped-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2015-03-13 22:13:46 -07:00
Nguyễn Thái Ngọc Duy 125f81461d gc --aggressive: make --depth configurable
When 1c192f3 (gc --aggressive: make it really aggressive - 2007-12-06)
made --depth=250 the default value, it didn't really explain the
reason behind, especially the pros and cons of --depth=250.

An old mail from Linus below explains it at length. Long story short,
--depth=250 is a disk saver and a performance killer. Not everybody
agrees on that aggressiveness. Let the user configure it.

    From: Linus Torvalds <torvalds@linux-foundation.org>
    Subject: Re: [PATCH] gc --aggressive: make it really aggressive
    Date: Thu, 6 Dec 2007 08:19:24 -0800 (PST)
    Message-ID: <alpine.LFD.0.9999.0712060803430.13796@woody.linux-foundation.org>
    Gmane-URL: http://article.gmane.org/gmane.comp.gcc.devel/94637

    On Thu, 6 Dec 2007, Harvey Harrison wrote:
    >
    > 7:41:25elapsed 86%CPU

    Heh. And this is why you want to do it exactly *once*, and then just
    export the end result for others ;)

    > -r--r--r-- 1 hharrison hharrison 324094684 2007-12-06 07:26 pack-1d46...pack

    But yeah, especially if you allow longer delta chains, the end result can
    be much smaller (and what makes the one-time repack more expensive is the
    window size, not the delta chain - you could make the delta chains longer
    with no cost overhead at packing time)

    HOWEVER.

    The longer delta chains do make it potentially much more expensive to then
    use old history. So there's a trade-off. And quite frankly, a delta depth
    of 250 is likely going to cause overflows in the delta cache (which is
    only 256 entries in size *and* it's a hash, so it's going to start having
    hash conflicts long before hitting the 250 depth limit).

    So when I said "--depth=250 --window=250", I chose those numbers more as
    an example of extremely aggressive packing, and I'm not at all sure that
    the end result is necessarily wonderfully usable. It's going to save disk
    space (and network bandwidth - the delta's will be re-used for the network
    protocol too!), but there are definitely downsides too, and using long
    delta chains may simply not be worth it in practice.

    (And some of it might just want to have git tuning, ie if people think
    that long deltas are worth it, we could easily just expand on the delta
    hash, at the cost of some more memory used!)

    That said, the good news is that working with *new* history will not be
    affected negatively, and if you want to be _really_ sneaky, there are ways
    to say "create a pack that contains the history up to a version one year
    ago, and be very aggressive about those old versions that we still want to
    have around, but do a separate pack for newer stuff using less aggressive
    parameters"

    So this is something that can be tweaked, although we don't really have
    any really nice interfaces for stuff like that (ie the git delta cache
    size is hardcoded in the sources and cannot be set in the config file, and
    the "pack old history more aggressively" involves some manual scripting
    and knowing how "git pack-objects" works rather than any nice simple
    command line switch).

    So the thing to take away from this is:
     - git is certainly flexible as hell
     - .. but to get the full power you may need to tweak things
     - .. happily you really only need to have one person to do the tweaking,
       and the tweaked end results will be available to others that do not
       need to know/care.

    And whether the difference between 320MB and 500MB is worth any really
    involved tweaking (considering the potential downsides), I really don't
    know. Only testing will tell.

			    Linus

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2014-03-31 10:26:24 -07:00
Junio C Hamano 05584b2a4e Merge branch 'nd/gc-lock-against-each-other'
* nd/gc-lock-against-each-other:
  gc: reject if another gc is running, unless --force is given
2013-09-04 12:35:34 -07:00
Nguyễn Thái Ngọc Duy 64a99eb476 gc: reject if another gc is running, unless --force is given
This may happen when `git gc --auto` is run automatically, then the
user, to avoid wait time, switches to a new terminal, keeps working
and `git gc --auto` is started again because the first gc instance has
not clean up the repository.

This patch tries to avoid multiple gc running, especially in --auto
mode. In the worst case, gc may be delayed 12 hours if a daemon reuses
the pid stored in gc.pid.

kill(pid, 0) support is added to MinGW port so it should work on
Windows too.

Signed-off-by: Nguyễn Thái Ngọc Duy <pclouds@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-08-09 09:10:05 -07:00
Michael Haggerty 61929404df git-gc.txt, git-reflog.txt: document new expiry options
Document the new values that can be used for expiry values where their
use makes sense:

* git reflog expire --expire=[all|never]
* git reflog expire --expire-unreachable=[all|never]
* git gc --prune=all

Other combinations aren't useful and therefore no documentation is
added (even though they are allowed):

* git gc --prune=never

  is redundant with "git gc --no-prune"

* git prune --expire=all

  is equivalent to providing no --expire option

* git prune --expire=never

  is a NOP

Signed-off-by: Michael Haggerty <mhagger@alum.mit.edu>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2013-04-18 09:30:15 -07:00
Jeff King 6cf378f0cb docs: stop using asciidoc no-inline-literal
In asciidoc 7, backticks like `foo` produced a typographic
effect, but did not otherwise affect the syntax. In asciidoc
8, backticks introduce an "inline literal" inside which markup
is not interpreted. To keep compatibility with existing
documents, asciidoc 8 has a "no-inline-literal" attribute to
keep the old behavior. We enabled this so that the
documentation could be built on either version.

It has been several years now, and asciidoc 7 is no longer
in wide use. We can now decide whether or not we want
inline literals on their own merits, which are:

  1. The source is much easier to read when the literal
     contains punctuation. You can use `master~1` instead
     of `master{tilde}1`.

  2. They are less error-prone. Because of point (1), we
     tend to make mistakes and forget the extra layer of
     quoting.

This patch removes the no-inline-literal attribute from the
Makefile and converts every use of backticks in the
documentation to an inline literal (they must be cleaned up,
or the example above would literally show "{tilde}" in the
output).

Problematic sites were found by grepping for '`.*[{\\]' and
examined and fixed manually. The results were then verified
by comparing the output of "html2text" on the set of
generated html pages. Doing so revealed that in addition to
making the source more readable, this patch fixes several
formatting bugs:

  - HTML rendering used the ellipsis character instead of
    literal "..." in code examples (like "git log A...B")

  - some code examples used the right-arrow character
    instead of '->' because they failed to quote

  - api-config.txt did not quote tilde, and the resulting
    HTML contained a bogus snippet like:

      <tt><sub></tt> foo <tt></sub>bar</tt>

    which caused some parsers to choke and omit whole
    sections of the page.

  - git-commit.txt confused ``foo`` (backticks inside a
    literal) with ``foo'' (matched double-quotes)

  - mentions of `A U Thor <author@example.com>` used to
    erroneously auto-generate a mailto footnote for
    author@example.com

  - the description of --word-diff=plain incorrectly showed
    the output as "[-removed-] and {added}", not "{+added+}".

  - using "prime" notation like:

      commit `C` and its replacement `C'`

    confused asciidoc into thinking that everything between
    the first backtick and the final apostrophe were meant
    to be inside matched quotes

  - asciidoc got confused by the escaping of some of our
    asterisks. In particular,

      `credential.\*` and `credential.<url>.\*`

    properly escaped the asterisk in the first case, but
    literally passed through the backslash in the second
    case.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2012-04-26 13:19:06 -07:00
Martin von Zweigbergk 7791a1d9b9 Documentation: use [verse] for SYNOPSIS sections
The SYNOPSIS sections of most commands that span several lines already
use [verse] to retain line breaks. Most commands that don't span
several lines seem not to use [verse]. In the HTML output, [verse]
does not only preserve line breaks, but also makes the section
indented, which causes a slight inconsistency between commands that
use [verse] and those that don't. Use [verse] in all SYNOPSIS sections
for consistency.

Also remove the blank lines from git-fetch.txt and git-rebase.txt to
align with the other man pages. In the case of git-rebase.txt, which
already uses [verse], the blank line makes the [verse] not apply to
the last line, so removing the blank line also makes the formatting
within the document more consistent.

While at it, add single quotes to 'git cvsimport' for consistency with
other commands.

Signed-off-by: Martin von Zweigbergk <martin.von.zweigbergk@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2011-07-06 14:26:26 -07:00
Jeff King 48bb914ed6 doc: drop author/documentation sections from most pages
The point of these sections is generally to:

  1. Give credit where it is due.

  2. Give the reader an idea of where to ask questions or
     file bug reports.

But they don't do a good job of either case. For (1), they
are out of date and incomplete. A much more accurate answer
can be gotten through shortlog or blame.  For (2), the
correct contact point is generally git@vger, and even if you
wanted to cc the contact point, the out-of-date and
incomplete fields mean you're likely sending to somebody
useless.

So let's drop the fields entirely from all manpages except
git(1) itself. We already point people to the mailing list
for bug reports there, and we can update the Authors section
to give credit to the major contributors and point to
shortlog and blame for more information.

Each page has a "This is part of git" footer, so people can
follow that to the main git manpage.
2011-03-11 10:59:16 -05:00
Junio C Hamano f29db856e7 Merge branch 'maint'
* maint:
  gitweb: Include links to feeds in HTML header only for '200 OK' response
  fsck docs: remove outdated and useless diagnostic
  userdiff: fix typo in ruby and python word regexes
  trace.c: mark file-local function static
  Fix typo in git-gc document.
2010-12-19 17:49:42 -08:00
Jiang Xin 4be0c35224 Fix typo in git-gc document.
The variable gc.packrefs for git-gc can be set to true, false and
"notbare", not "nobare".

Signed-off-by: Jiang Xin <jiangxin@ossxp.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-12-17 11:53:53 -08:00
Matthieu Moy 60109d0ef5 Change remote tracking to remote-tracking in non-trivial places
To complement the straightforward perl application in previous patch,
this adds a few manual changes.

Signed-off-by: Matthieu Moy <Matthieu.Moy@imag.fr>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-11-03 09:19:08 -07:00
Junio C Hamano ad9d8e8f0f Merge branch 'maint'
* maint:
  t0006: test timezone parsing
  rerere.txt: Document forget subcommand
  Documentation/git-gc.txt: add reference to githooks
2010-07-05 11:56:53 -07:00
Chris Packham 66bd8ab899 Documentation/git-gc.txt: add reference to githooks
This advertises the existence of the 'pre-auto-gc' hook and adds a cross
reference to where the hook is documented.

Signed-off-by: Chris Packham <judge.packham@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-07-02 10:41:20 -07:00
Junio C Hamano a660534e06 Merge branch 'jc/maint-no-reflog-expire-unreach-for-head'
* jc/maint-no-reflog-expire-unreach-for-head:
  reflog --expire-unreachable: special case entries in "HEAD" reflog
  more war on "sleep" in tests
  Document gc.<pattern>.reflogexpire variables

Conflicts:
	Documentation/config.txt
2010-05-21 04:02:18 -07:00
Junio C Hamano eb523a8d79 Document gc.<pattern>.reflogexpire variables
3cb22b8 (Per-ref reflog expiry configuration, 2008-06-15) added support
for setting the expiry parameters differently for different reflog, but
it was never documented.

Signed-off-by: Junio C Hamano <gitster@pobox.com>
2010-04-14 13:14:27 -07:00