Replace GIT_CONFIG_NOSYSTEM mechanism to decline from reading the
system-wide configuration file with GIT_CONFIG_SYSTEM that lets
users specify from which file to read the system-wide configuration
(setting it to an empty file would essentially be the same as
setting NOSYSTEM), and introduce GIT_CONFIG_GLOBAL to override the
per-user configuration in $HOME/.gitconfig.
* ps/config-global-override:
config: allow overriding of global and system configuration
config: unify code paths to get global config paths
config: rename `git_etc_config()`
Handling of "promisor packs" that allows certain objects to be
missing and lazily retrievable has been optimized (a bit).
* jk/promisor-optim:
revision: avoid parsing with --exclude-promisor-objects
lookup_unknown_object(): take a repository argument
is_promisor_object(): free tree buffer after parsing
Documentation updates, with unrelated comment updates, too.
* ab/usage-error-docs:
api docs: document that BUG() emits a trace2 error event
api docs: document BUG() in api-error-handling.txt
usage.c: don't copy/paste the same comment three times
The same "do not capitalize the first word" rule is applied to both
our patch titles and error messages, but the existing description
was fuzzy in two aspects.
* For error messages, it was not said that this was only about the
first word that begins the sentence.
* For both, it was not clear when a capital letter there was not an
error. We avoid capitalizing the first word when the only reason
you would capitalize it is because it happens to be the first
word in the sentence. If a proper noun, which is usually spelled
in capital letters, happens to come at the beginning of the
sentence, it should be kept in capital letters.
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When "git pack-objects" makes a literal copy of a part of existing
packfile using the reachability bitmaps, its update to the progress
meter was broken.
* jk/pack-objects-bitmap-progress-fix:
pack-objects: update "nr_seen" progress based on pack-reused count
A bit of code clean-up and a lot of test clean-up around userdiff
area.
* ab/userdiff-tests:
blame tests: simplify userdiff driver test
blame tests: don't rely on t/t4018/ directory
userdiff: remove support for "broken" tests
userdiff tests: list builtin drivers via test-tool
userdiff tests: explicitly test "default" pattern
userdiff: add and use for_each_userdiff_driver()
userdiff style: normalize pascal regex declaration
userdiff style: declare patterns with consistent style
userdiff style: re-order drivers in alphabetical order
The command-line completion script (in contrib/) had a couple of
references that would have given a warning under the "-u" (nounset)
option.
* vs/completion-with-set-u:
completion: audit and guard $GIT_* against unset use
"gitweb" learned "e-mail privacy" feature to redact strings that
look like e-mail addresses on various pages.
* gk/gitweb-redacted-email:
gitweb: add "e-mail privacy" feature to redact e-mail addresses
Clean up the codepaths that implement the "git send-email --validate"
option and improve the message from it.
* ab/send-email-validate-errors:
git-send-email: improve --validate error output
git-send-email: refactor duplicate $? checks into a function
git-send-email: test full --validate output
Streamline the codepath to fix the UTF-8 encoding issues in the
argv[] and the prefix on macOS.
* tb/precompose-prefix-simplify:
macOS: precompose startup_info->prefix
precompose_utf8: make precompose_string_if_needed() public
A configuration variable has been added to force tips of certain
refs to be given a reachability bitmap.
* tb/pack-preferred-tips-to-give-bitmap:
builtin/pack-objects.c: respect 'pack.preferBitmapTips'
t/helper/test-bitmap.c: initial commit
pack-bitmap: add 'test_bitmap_commits()' helper
A NULL-dereference bug has been corrected in an error codepath in
"git for-each-ref", "git branch --list" etc.
* jk/ref-filter-segfault-fix:
ref-filter: fix NULL check for parse object failure
Correct documentation added in e544221d97a (trace2:
Documentation/technical/api-trace2.txt, 2019-02-22) to state that
calling BUG() also emits an "error" event. See ee4512ed481 (trace2:
create new combined trace facility, 2019-02-22) for the initial
implementation.
However, the BUG() function did not emit an event at that point. That
only changed later in 0a9dde4a04c (usage: trace2 BUG() invocations,
2021-02-05), which changed the code but did not update any of the
docs.
Let's also add a cross-reference from api-error-handling.txt.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When the BUG() function was added in d8193743e08 (usage.c: add BUG()
function, 2017-05-12) these docs added in 1f23cfe0ef5 (doc: document
error handling functions and conventions, 2014-12-03) were not
updated. Let's do that.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In ee4512ed481 (trace2: create new combined trace facility,
2019-02-22) we started with two copies of this comment,
0ee10fd1296 (usage: add trace2 entry upon warning(), 2020-11-23) added
a third. Let's instead add an earlier comment that applies to all
these mostly-the-same functions.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Finish the removal I started in 1108cea7f8e (tests: remove most uses
of test_i18ncmp, 2021-02-11). At that time the function wasn't removed
due to disruption with in-flight changes; remove the occurrences that
have landed since then.
As of writing this there are no test_i18ncmp uses between "master" and
"seen", so let's also remove the function to finally put it to rest.
Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When --exclude-promisor-objects is given, before traversing any objects
we iterate over all of the objects in any promisor packs, marking them
as UNINTERESTING and SEEN. We turn the oid we get from iterating the
pack into an object with parse_object(), but this has two problems:
- it's slow; we are zlib inflating (and reconstructing from deltas)
every byte of every object in the packfile
- it leaves the tree buffers attached to their structs, which means
our heap usage will grow to store every uncompressed tree
simultaneously. This can be gigabytes.
We can obviously fix the second by freeing the tree buffers after we've
parsed them. But we can observe that the function doesn't look at the
object contents at all! The only reason we call parse_object() is that
we need a "struct object" on which to set the flags. There are two
options here:
- we can look up just the object type via oid_object_info(), and then
call the appropriate lookup_foo() function
- we can call lookup_unknown_object(), which gives us an OBJ_NONE
struct (which will get auto-converted later by object_as_type() via
calls to lookup_commit(), etc).
The first one is closer to the current code, but we do pay the price to
look up the type for each object. The latter should be more efficient in
CPU, though it wastes a little bit of memory (the "unknown" object
structs are a union of all object types, so some of the structs are
bigger than they need to be). It also runs the risk of triggering a
latent bug in code that calls lookup_object() directly but isn't ready
to handle OBJ_NONE (such code would already be buggy, but we use
lookup_unknown_object() infrequently enough that it might be hiding).
I went with the second option here. I don't think the risk is high (and
we'd want to find and fix any such bugs anyway), and it should be more
efficient overall.
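The second option can be illustrated with a toy object table. This is only a sketch under stated assumptions: `toy_object`, `toy_lookup_unknown()`, `toy_mark_promisor()` and the flag names are hypothetical stand-ins, not git's real structures. It shows that setting flags needs only a struct to hang them on, not the object's contents or even its type.

```c
#include <stddef.h>

/* Hypothetical stand-ins for git's object types and flags. */
enum toy_type { TOY_NONE, TOY_COMMIT, TOY_TREE, TOY_BLOB };

#define TOY_SEEN          (1u << 0)
#define TOY_UNINTERESTING (1u << 1)
#define TOY_TABLE_SIZE    16

struct toy_object {
	unsigned id;
	enum toy_type type;
	unsigned flags;
	int in_use;
};

static struct toy_object toy_table[TOY_TABLE_SIZE];

/*
 * Return the existing entry for `id`, or create one of unknown type.
 * Nothing is read or inflated; the type stays TOY_NONE until some
 * later caller learns what the object really is.
 */
struct toy_object *toy_lookup_unknown(unsigned id)
{
	size_t i;

	for (i = 0; i < TOY_TABLE_SIZE; i++)
		if (toy_table[i].in_use && toy_table[i].id == id)
			return &toy_table[i];
	for (i = 0; i < TOY_TABLE_SIZE; i++) {
		if (!toy_table[i].in_use) {
			toy_table[i].in_use = 1;
			toy_table[i].id = id;
			toy_table[i].type = TOY_NONE;
			toy_table[i].flags = 0;
			return &toy_table[i];
		}
	}
	return NULL; /* toy table is full */
}

/* Mark an object from a promisor pack without parsing it. */
int toy_mark_promisor(unsigned id)
{
	struct toy_object *obj = toy_lookup_unknown(id);

	if (!obj)
		return -1;
	obj->flags |= TOY_SEEN | TOY_UNINTERESTING;
	return 0;
}
```

The trade-off described above is visible here: every entry is as large as the biggest variant would be, but marking costs a table lookup instead of a full parse.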
The new tests in p5600 show off the improvement (this is on git.git):
Test                                 HEAD^               HEAD
-------------------------------------------------------------------------------
5600.5: count commits                0.37(0.37+0.00)     0.38(0.38+0.00) +2.7%
5600.6: count non-promisor commits   11.74(11.37+0.37)   0.04(0.03+0.00) -99.7%
The improvement is particularly big in this script because _every_
object in the newly-cloned partial repo is a promisor object. So after
marking them all, there's nothing left to traverse.
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
All of the other lookup_foo() functions take a repository argument, but
lookup_unknown_object() was never converted, and it uses the_repository
internally. Let's fix that.
We could leave a wrapper that uses the_repository, but there aren't that
many calls, so we'll just convert them all. I looked briefly at each
site to see if we had a repository struct (besides the_repository) we
could pass, but none of them do (so this conversion to pass
the_repository is a pure noop in each case, though it does take us one
step closer to eventually getting rid of the_repository).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
To get the list of all promisor objects, we not only include all objects
in promisor packs, but also parse each of those objects to see which
objects they reference. After parsing a tree object, the tree->buffer
field will remain populated until we explicitly free it. So in a partial
clone of blob:none, for example, we are essentially reading every tree
in the repository (since they're all in the initial promisor pack), and
keeping all of their uncompressed contents in memory at once.
This patch frees the tree buffers after we've finished marking all of
their reachable objects. We shouldn't need to do this for any other
object type. While we are using some extra memory to store the structs,
no other object type stores the whole contents in its parsed form (we do
sometimes hold on to commit buffers, but less so these days due to
commit graphs, plus most commands which care about promisor objects turn
off the save_commit_buffer global).
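The pattern can be sketched as follows. This is an illustration only: `toy_tree`, `toy_parse()`, `toy_free_buffer()` and `toy_count_entries()` are hypothetical names, and the "entries" are just newline-terminated records rather than real tree entries. The point is that the buffer is released as soon as the walk over it is done, so only one uncompressed tree is held at a time.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a parsed tree that keeps its raw contents. */
struct toy_tree {
	char *buffer;
	size_t size;
};

/* "Parse" a tree by copying its contents into the struct. */
int toy_parse(struct toy_tree *t, const char *data, size_t len)
{
	t->buffer = malloc(len ? len : 1);
	if (!t->buffer)
		return -1;
	memcpy(t->buffer, data, len);
	t->size = len;
	return 0;
}

/* Drop the contents once nobody needs them any more. */
void toy_free_buffer(struct toy_tree *t)
{
	free(t->buffer);
	t->buffer = NULL;
	t->size = 0;
}

/*
 * Walk the entries of one tree, then free its buffer before returning,
 * so peak memory is bounded by the largest single tree rather than by
 * the sum of all trees.
 */
size_t toy_count_entries(struct toy_tree *t, const char *data, size_t len)
{
	size_t i, n = 0;

	if (toy_parse(t, data, len) < 0)
		return 0;
	for (i = 0; i < t->size; i++)
		if (t->buffer[i] == '\n')
			n++;
	toy_free_buffer(t);
	return n;
}
```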
Even for a moderate-sized repository like git.git, this patch drops the
peak heap (as measured by massif) for git-fsck from ~1.7GB to ~138MB.
Fsck is a good candidate for measuring here because it doesn't interact
with the promisor code except to call is_promisor_object(), so we can
isolate just this problem.
The added perf test shows only a tiny improvement on my machine for
git.git, since 1.7GB isn't enough to cause any real memory pressure:
Test            HEAD^               HEAD
--------------------------------------------------------------------------------
5600.4: fsck    21.26(20.90+0.35)   20.84(20.79+0.04) -2.0%
With linux.git the absolute change is a bit bigger, though still a small
percentage:
Test            HEAD^                 HEAD
-----------------------------------------------------------------------------
5600.4: fsck    262.26(259.13+3.12)   254.92(254.62+0.29) -2.8%
I didn't have the patience to run it under massif with linux.git, but
it's probably on the order of about 14GB improvement, since that's the
sum of the sizes of all of the uncompressed trees (but still isn't
enough to create memory pressure on this particular machine, which has
64GB of RAM). Smaller machines would probably see a bigger effect on
runtime (and sadly our perf suite does not measure peak heap).
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In order to have git run in a fully controlled environment without any
misconfiguration, it may be desirable for users or scripts to override
global- and system-level configuration files. We already have a way of
doing this, which is to unset both HOME and XDG_CONFIG_HOME environment
variables and to set `GIT_CONFIG_NOSYSTEM=true`. This is quite kludgy,
and unsetting the first two variables likely has an impact on other
executables spawned by such a script.
The obvious way to fix this would be to introduce `GIT_CONFIG_NOGLOBAL`
as an equivalent to `GIT_CONFIG_NOSYSTEM`. But in the past, it has
turned out that this design is inflexible: we cannot test system-level
parsing of the git configuration in our test harness because there is no
way to change its location, so all tests run with `GIT_CONFIG_NOSYSTEM`
set.
Instead of making the same mistake with `GIT_CONFIG_NOGLOBAL`, introduce
two new variables `GIT_CONFIG_GLOBAL` and `GIT_CONFIG_SYSTEM`:
- If unset, git continues to use the usual locations.
- If set to a specific path, we skip reading the normal
configuration files and instead take the path. By setting the path
to `/dev/null`, no configuration will be loaded for the respective
level.
This implements the use case where we want to execute code in a sanitized
environment without any potential misconfigurations via `/dev/null`, but
is more flexible and allows for more use cases than simply adding
`GIT_CONFIG_NOGLOBAL`.
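The two rules above amount to "use the override if it is set, else the usual location". A minimal sketch, assuming a hypothetical helper `pick_config_path()` (this is not the function added by the patch, and it models only a single fallback path):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only. `override` models the
 * value of GIT_CONFIG_SYSTEM or GIT_CONFIG_GLOBAL (NULL when unset);
 * `fallback` models the usual location for that level.
 */
char *pick_config_path(const char *override, const char *fallback)
{
	const char *chosen = override ? override : fallback;
	char *copy = malloc(strlen(chosen) + 1);

	if (copy)
		strcpy(copy, chosen);
	return copy; /* the caller owns the result and must free() it */
}
```

A caller might use it as `pick_config_path(getenv("GIT_CONFIG_SYSTEM"), "/etc/gitconfig")`, where the fallback path is only an example. Passing `/dev/null` as the override yields a path that reads as an empty file, which is how a level is effectively disabled.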
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The new method uses the update_index counter, which isn't susceptible to clock
inaccuracies.
Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When serving a clone or fetch with bitmaps, after deciding which objects
need to be sent our "pack reuse" mechanism kicks in: we try to send
more-or-less verbatim a bunch of objects from the beginning of the
bitmapped packfile without even adding them to the to_pack.objects
array.
After deciding which objects will be in the "reused" portion, we update
nr_result to account for those, and then trigger display_progress() to
show the user (who is undoubtedly dazzled that we managed to enumerate
so many objects so quickly).
But then something confusing happens: the "Enumerating objects" progress
meter jumps _backwards_, counting up from zero the number of objects we
actually add into to_pack.objects.
This worked correctly once upon a time, but was broken in 5af050437a
(pack-objects: show some progress when counting kept objects,
2018-04-15), when the latter half of that progress meter switched to
using a separate nr_seen counter, rather than nr_result. Nobody noticed
for two reasons:
- prior to the pack-reuse fixes from a14aebeac3 (Merge branch
'jk/packfile-reuse-cleanup', 2020-02-14), the reuse code almost
never kicked in anyway
- the output looks _kind of_ correct. The "backwards" moment is hard
to catch, because we overwrite the old progress number with the new
one, and the larger number is displayed only for a second. So unless
you look at that exact second, you just see the much smaller value,
counting up to the number of non-reused objects (though of course if
you catch it in stderr, or look at GIT_TRACE_PACKET from a server
with bitmaps, you can see both values).
This smaller output isn't wrong per se, but isn't counting what we ever
intended to. We should give the user the whole number of objects we
considered (which, as per 5af050437a's original purpose, is already
_not_ a count of what goes into to_pack.objects). The follow-on
"Counting objects" meter shows the actual number of objects we feed into
that array.
We can easily fix this by bumping (and showing) nr_seen for the
pack-reused objects. When the included test is run without this patch,
the second pack-objects invocation produces "Enumerating objects: 1" to
show the one loose object, even though the resulting pack has hundreds
of objects in it. With it, we jump to "Enumerating objects: 674" after
deciding on reuse, and then "675" when we add in the loose object.
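The accounting can be sketched like this. The struct and function names are hypothetical; only the counter names `nr_result` and `nr_seen` come from the description above, and the progress display itself is left out.

```c
#include <stdint.h>

/* Hypothetical container for the two counters discussed above. */
struct toy_counts {
	uint32_t nr_result; /* objects that will end up in the pack */
	uint32_t nr_seen;   /* objects shown by "Enumerating objects" */
};

/*
 * Account for objects copied verbatim from the bitmapped pack.
 * Bumping nr_seen here is the fix: without it the meter restarts
 * from zero when the non-reused objects are added.
 */
void toy_note_reused(struct toy_counts *c, uint32_t reused)
{
	c->nr_result += reused;
	c->nr_seen += reused;
}

/* Account for one object added individually (e.g. a loose object). */
void toy_note_added(struct toy_counts *c)
{
	c->nr_result++;
	c->nr_seen++;
}
```

With the numbers from the test described above, noting 674 reused objects and then one loose object moves `nr_seen` to 674 and then 675.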
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
versions could be an empty string_list. In that case, versions->items is
NULL, and we shouldn't be trying to perform pointer arithmetic with it (as
that results in undefined behaviour).
Moreover we only use the results of this calculation once when calling
QSORT. Therefore we choose to skip creating relevant_entries and call
QSORT directly with our manipulated pointers (but only if there's data
requiring sorting). This lets us avoid abusing the string_list API,
and saves us from having to explain why this abuse is OK.
Finally, an assertion is added to make sure that write_tree() is called
with a valid offset.
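A minimal sketch of the guarded call, using plain `qsort()` and a hypothetical `toy_item` array in place of git's QSORT macro and string_list. The assertion and the emptiness check together ensure that `items + offset` is never evaluated when `items` is NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical element type standing in for a string_list item. */
struct toy_item {
	const char *name;
};

static int toy_cmp(const void *a, const void *b)
{
	const struct toy_item *x = a;
	const struct toy_item *y = b;

	return strcmp(x->name, y->name);
}

/*
 * Sort items[offset..nr). An empty list has items == NULL and nr == 0,
 * in which case no pointer arithmetic is performed at all.
 */
void toy_sort_tail(struct toy_item *items, size_t nr, size_t offset)
{
	assert(offset <= nr);
	if (nr > offset)
		qsort(items + offset, nr - offset, sizeof(*items), toy_cmp);
}
```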
This issue has probably existed since:
ee4012dcf9 (merge-ort: step 2 of tree writing -- function to create tree object, 2020-12-13)
But it only started occurring during tests since tests started using
merge-ort:
f3b964a07e (Add testing with merge-ort merge strategy, 2021-03-20)
For reference, here's the original UBSAN commit that implemented this
check. It sounds like this behaviour isn't actually likely to cause any
issues (but we might as well fix it regardless):
https://reviews.llvm.org/D67122
UBSAN output from t3404 or t5601:
merge-ort.c:2669:43: runtime error: applying zero offset to null pointer
#0 0x78bb53 in write_tree merge-ort.c:2669:43
#1 0x7856c9 in process_entries merge-ort.c:3303:2
#2 0x782317 in merge_ort_nonrecursive_internal merge-ort.c:3744:2
#3 0x77feef in merge_incore_nonrecursive merge-ort.c:3853:2
#4 0x6f6a5c in do_recursive_merge sequencer.c:640:3
#5 0x6f6a5c in do_pick_commit sequencer.c:2221:9
#6 0x6ef055 in single_pick sequencer.c:4814:9
#7 0x6ef055 in sequencer_pick_revisions sequencer.c:4867:10
#8 0x4fb392 in run_sequencer revert.c:225:9
#9 0x4fa5b0 in cmd_revert revert.c:235:8
#10 0x42abd7 in run_builtin git.c:453:11
#11 0x429531 in handle_builtin git.c:704:3
#12 0x4282fb in run_argv git.c:771:4
#13 0x4282fb in cmd_main git.c:902:19
#14 0x524b63 in main common-main.c:52:11
#15 0x7fc2ca340349 in __libc_start_main (/lib64/libc.so.6+0x24349)
#16 0x4072b9 in _start start.S:120
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior merge-ort.c:2669:43 in
Signed-off-by: Andrzej Hunt <ajrhunt@google.com>
Reviewed-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
There are two callsites which assemble global config paths, once in the
config loading code and once in the git-config(1) builtin. We're about
to implement a way to override global config paths via an environment
variable which would require us to adjust both sites.
Unify both code paths into a single `git_global_config()` function which
returns both paths for `~/.gitconfig` and the XDG config file. This will
make the subsequent patch which introduces the new envvar easier to
implement.
No functional changes are expected from this patch.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The `git_etc_gitconfig()` function retrieves the system-level path of
the configuration file. We're about to introduce a way to override it
via an environment variable, at which point the name of this function
would start to become misleading.
Rename the function to `git_system_config()` as a preparatory step.
While at it, the function is also refactored to pass memory ownership to
the caller. This is done to better match semantics of
`git_global_config()`, which is going to be introduced in the next
commit.
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
"git apply" now takes "--3way" and "--cached" at the same time, and
works and records results only in the index.
* jz/apply-3way-cached:
git-apply: allow simultaneous --cached and --3way options
The command line completion (in contrib/) has learned that
CHERRY_PICK_HEAD is a possible pseudo-ref.
* ab/complete-cherry-pick-head:
bash completion: complete CHERRY_PICK_HEAD
In send_fetch_request(), "object-format" is written directly to the file
descriptor, as opposed to the other arguments, which are buffered.
Buffer "object-format" as well. "object-format" must be buffered; in
particular, it must appear after "command=fetch" in the request.
This divergence was introduced in 4b831208bb ("fetch-pack: parse and
advertise the object-format capability", 2020-05-27), perhaps as an
oversight (the surrounding code at the point of this commit was already
using a request buffer).
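Why buffering preserves ordering can be shown with a toy request buffer. Everything here is an assumption made for illustration: `req_buf` and `req_add()` are hypothetical, and lines are newline-terminated instead of using the real pkt-line framing. Because every argument goes through the same buffer, the bytes reach the wire in the order they were added.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size request buffer. */
struct req_buf {
	char data[256];
	size_t len;
};

/*
 * Append one line to the buffer. Returns -1 if it does not fit
 * (room is needed for the line, a newline and a trailing NUL).
 */
int req_add(struct req_buf *b, const char *line)
{
	size_t n = strlen(line);

	if (b->len + n + 2 > sizeof(b->data))
		return -1;
	memcpy(b->data + b->len, line, n);
	b->len += n;
	b->data[b->len++] = '\n';
	b->data[b->len] = '\0';
	return 0;
}
```

If one argument were instead written straight to the file descriptor while the others sat in the buffer, it would be sent before them, regardless of where it appears in the code.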
Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The command-line completion script (in contrib/) had a couple of
references that would have given a warning under the "-u" (nounset)
option.
* vs/completion-with-set-u:
completion: audit and guard $GIT_* against unset use
"gitweb" learned "e-mail privacy" feature to redact strings that
look like e-mail addresses on various pages.
* gk/gitweb-redacted-email:
gitweb: add "e-mail privacy" feature to redact e-mail addresses
"git apply --3way" has always been "to fall back to 3-way merge
only when straight application fails". Swap the order of falling
back so that 3-way is always attempted first (only when the option
is given, of course) and then straight patch application is used as
a fallback when it fails.
* jz/apply-run-3way-first:
git-apply: try threeway first when "--3way" is used
Plug the ort merge backend throughout the rest of the system, and
start testing it as a replacement for the recursive backend.
* en/ort-readiness:
Add testing with merge-ort merge strategy
t6423: mark remaining expected failure under merge-ort as such
Revert "merge-ort: ignore the directory rename split conflict for now"
merge-recursive: add a bunch of FIXME comments documenting known bugs
merge-ort: write $GIT_DIR/AUTO_MERGE whenever we hit a conflict
t: mark several submodule merging tests as fixed under merge-ort
merge-ort: implement CE_SKIP_WORKTREE handling with conflicted entries
t6428: new test for SKIP_WORKTREE handling and conflicts
merge-ort: support subtree shifting
merge-ort: let renormalization change modify/delete into clean delete
merge-ort: have ll_merge() use a special attr_index for renormalization
merge-ort: add a special minimal index just for renormalization
merge-ort: use STABLE_QSORT instead of QSORT where required
Gitweb extracts content from the Git log and makes it accessible
over HTTP. As a result, e-mail addresses found in commits are
exposed to web crawlers and they may not respect robots.txt.
This can result in unsolicited messages.
Introduce an 'email-privacy' feature which redacts e-mail addresses
from the generated HTML content. Specifically, obscure addresses
retrieved from the author/committer and comment sections of the
Git log. The feature is off by default.
This feature does not prevent someone from downloading the
unredacted commit log, e.g., by cloning the repository, and
extracting information from it. It aims to hinder the low-effort,
bulk collection of e-mail addresses by web crawlers.
Signed-off-by: Georgios Kontaxis <geko1702+commits@99rst.org>
Acked-by: Eric Wong <e@80x24.org>
Acked-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Add a diff driver for Scheme-like languages which recognizes top level
and local `define` forms, whether it is a function definition, binding,
syntax definition or a user-defined `define-xyzzy` form.
Also supports R6RS `library` forms, `module` forms along with class and
struct declarations used in Racket (PLT Scheme).
Alternate "def" syntax such as that in Gerbil Scheme is also
supported, like defstruct, defsyntax and so on.
The rationale for picking `define` forms for the hunk headers is that
it is usually the only significant form for defining the structure of
the program, and it is a common pattern for schemers to have local
function definitions to hide their visibility, so it is not only the top
level `define`'s that are of interest. Schemers also extend the language
with macros to provide their own define forms (for example, something
like a `define-test-suite`) which is also captured in the hunk header.
Since it is common practice to extend syntax with variants of a form
like `module+`, `class*` etc, those have been supported as well.
The word regex is a best-effort attempt to conform to R7RS[1] valid
identifiers, symbols and numbers.
[1] https://small.r7rs.org/attachment/r7rs.pdf (section 2.1)
Signed-off-by: Atharva Raykar <raykar.ath@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>