mirror/git - git - git.dotya.ml

mirror/git

mirror of https://github.com/git/git.git synced 2024-11-15 14:14:08 +01:00

Author	SHA1	Message	Date
Carlo Marcelo Arenas Belón	8991da6a38	grep: make sure NO_LIBPCRE1_JIT disable JIT in PCRE1 e87de7cab4 ("grep: un-break building with PCRE < 8.32", 2017-05-25) added a restriction for JIT support that is no longer needed after pcre_jit_exec() calls were removed. Reorganize the definitions in grep.h so that JIT support could be detected early and NO_LIBPCRE1_JIT could be used reliably to enforce JIT doesn't get used. Signed-off-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-26 11:37:01 -07:00
Beat Bolli	c581e4a749	grep: under --debug, show whether PCRE JIT is enabled This information is useful and not visible anywhere else, so show it. Signed-off-by: Beat Bolli <dev+git@drbeat.li> Suggested-by: Johannes Schindelin <johannes.schindelin@gmx.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-08-19 15:24:48 -07:00
Ævar Arnfjörð Bjarmason	870eea8166	grep: do not enter PCRE2_UTF mode on fixed matching As discussed in the last commit partially fix a bug introduced in b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01). Because PCRE v2, unlike kwset, validates its UTF-8 input we'd die on e.g.: fatal: pcre2_match failed with error code -22: UTF-8 error: isolated byte with 0x80 bit set When grepping a non-ASCII fixed string. This is a more general problem that's hard to fix, but we can at least fix the most common case of grepping for a fixed string without "-i". I can't think of a reason for why we'd turn on PCRE2_UTF when matching byte-for-byte like that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	8a5999838e	grep: stess test PCRE v2 on invalid UTF-8 data Since my b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01) we've been dying on invalid UTF-8 data when grepping for fixed strings if the following are all true: * The subject string is non-ASCII (e.g. "ævar") * We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C" * We compiled with PCRE v2 * That PCRE v2 did not have JIT support The last of those is why this wasn't caught earlier, per pcre2jit(3): "unless PCRE2_NO_UTF_CHECK is set, a UTF subject string is tested for validity. In the interests of speed, these checks do not happen on the JIT fast path, and if invalid data is passed, the result is undefined." I.e. the subject being matched against our pattern was invalid, but we were lucky and getting away with it on the JIT path, but the non-JIT one is stricter. This patch does nothing to fix that, instead we sneak in support for fixed patterns starting with "(NO_JIT)", this disables the PCRE v2 jit with implicit fixed-string matching for testing, see pcre2syntax(3) the syntax. This is technically a change in behavior, but it's so obscure that I figured it was OK. We'd previously consider this an invalid regular expression as regcomp() would die on it, now we feed it to the PCRE v2 fixed-string path. I thought this was better than introducing yet another GIT_TEST_ environment variable. We're also relying on a behavior of PCRE v2 that technically could change, but I think the test coverage is worth dipping our toe into some somewhat undefined behavior. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	09872f6418	grep: create a "is_fixed" member in "grep_pat" This change paves the way for later using this value the regex compile functions themselves. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	8a35b540a9	grep: consistently use "p->fixed" in compile_regexp() At the start of this function we do: p->fixed = opt->fixed; It's less confusing to use that variable consistently that switch back & forth between the two. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	685668faaa	grep: stop using a custom JIT stack with PCRE v1 Simplify the PCRE v1 code for the same reasons as for the PCRE v2 code in the last commit. Unlike with v2 we actually used the custom stack in v1, but let's use PCRE's built-in 32 KB one instead, since experience with v2 shows that's enough. Most distros are already using v2 as a default, and the underlying sljit code is the same. Unfortunately we can't just pass a NULL to pcre_jit_exec() as with pcre2_jit_match(). Unlike the v2 function it doesn't support that. Instead we need to use the fatter pcre_exec() if we'd like the same behavior. This will make things slightly slower than on the fast-path function, but it's OK since we care less about v1 performance these days since we have and recommend v2. Running a similar performance test as what I ran in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25) via: GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE1=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Gives us this, just the /perl/ results: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.19(0.67+0.52) 0.19(0.65+0.52) +0.0% 7820.7: perl grep '^how to' 0.19(0.78+0.44) 0.19(0.72+0.49) +0.0% 7820.11: perl grep '[how] to' 0.39(2.13+0.43) 0.40(2.10+0.46) +2.6% 7820.15: perl grep '(e.t[^ ]*\|v.ry) rare' 0.44(2.55+0.37) 0.45(2.47+0.41) +2.3% 7820.19: perl grep 'm(ú\|u)lt.b(æ\|y)te' 0.23(1.06+0.42) 0.22(1.03+0.43) -4.3% It will also implicitly re-enable UTF-8 validation for PCRE v1. As noted in [1] we now have cases as a result where PCRE v1 is more eager to error out. Subsequent patches will fix that for v2, and I think it's fair to tell v1 users "just upgrade" and not worry about that edge case for v1. 1. https://public-inbox.org/git/CAPUEsphZJ_Uv9o1-yDpjNLA_q-f7gWXz9g1gCY2pYAYN8ri40g@mail.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	34489239d0	grep: stop "using" a custom JIT stack with PCRE v2 As reported in [1] the code I added in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) to use a custom JIT stack has never worked. It was incorrectly copy/pasted from code I added in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25), which did work. Thus our intention of starting with 1 byte of stack at a maximum of 1 MB didn't happen, we'd always use the 32 KB stack provided by PCRE v2's jit_machine_stack_exec()[2]. The reason I allocated a custom stack at all was this advice in pcrejit(3) (same in pcre2jit(3)): "By default, it uses 32KiB on the machine stack. However, some large or complicated patterns need more than this" Since we've haven't had any reports of users running into PCRE2_ERROR_JIT_STACKLIMIT in the wild I think we can safely assume that we can just use the library defaults instead and drop this code. This won't change with the wider use of PCRE v2 in ed0479ce3d ("Merge branch 'ab/no-kwset' into next", 2019-07-15), a fixed string search is not a "large or complicated pattern". For good measure I ran the performance test noted in 94da9193a6, although the command is simpler now due to my 0f50c8e32c ("Makefile: remove the NO_R_TO_GCC_LINKER flag", 2019-05-17): GIT_PERF_REPEAT_COUNT=30 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 USE_LIBPCRE2=Y CFLAGS=-O3 LIBPCREDIR=/home/avar/g/pcre2/inst' ./run HEAD~ HEAD p7820-grep-engines.sh Just the /perl/ results are: Test HEAD~ HEAD --------------------------------------------------------------------------------------- 7820.3: perl grep 'how.to' 0.17(0.27+0.65) 0.17(0.24+0.68) +0.0% 7820.7: perl grep '^how to' 0.16(0.23+0.66) 0.16(0.23+0.67) +0.0% 7820.11: perl grep '[how] to' 0.18(0.35+0.62) 0.18(0.33+0.65) +0.0% 7820.15: perl grep '(e.t[^ ]*\|v.ry) rare' 0.17(0.45+0.54) 0.17(0.49+0.50) +0.0% 7820.19: perl grep 'm(ú\|u)lt.b(æ\|y)te' 0.16(0.33+0.58) 0.16(0.29+0.62) +0.0% So, as expected there's no change, and running with valgrind reveals that we have fewer allocations now. As noted in [3] there are known regexes that will fail with the lower stack limit, the way GNU grep fixed it is interesting, although I believe the implementation is overly verbose, they could make PCRE v2 handle that gradual re-allocation, that's what min/max memory is for. So we might end up bringing this back, I'm more inclined to just kick such cases upstairs to PCRE maintainers as a bug, perhaps they'll add some overall "just allocate more then" flag to make this easier. In any case there's no functional change here, we didn't have a custom stack, so let's apply this first, we can always revert it later. 1. https://public-inbox.org/git/20190721194052.15440-1-carenas@gmail.com/ 2. I didn't really intend to start with 1 byte, looking at the PCRE v2 code again what happened is that I cargo-culted some of PCRE v2's own test code which was meant to test re-allocations. It's more sane to start with say 32 KB with a max of 1 MB, as pcre2grep.c does. 3. https://public-inbox.org/git/CAPUEspjj+fG8QDmf=bZXktfpLgkgiu34HTjKLhm-cmEE04FE-A@mail.gmail.com/ Reported-by: Carlo Marcelo Arenas Belón <carenas@gmail.com> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	04bef50c01	grep: remove overly paranoid BUG(...) code Remove code that would trigger if pcre_config() or pcre2_config() was so broken that "do we have JIT?" wouldn't return a boolean. I added this code back in fbaceaac47 ("grep: add support for the PCRE v1 JIT API", 2017-05-25) and then as noted in f002532784 ("grep: print the pcre2_jit_on value", 2019-07-22) incorrectly copy/pasted some of it in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01). Let's just remove this code. Being this paranoid about the pcre2?_config() function itself being broken is crossing the line into unreasonable paranoia. Reported-by: Beat Bolli <dev+git@drbeat.li> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-26 13:56:40 -07:00
Ævar Arnfjörð Bjarmason	b65abcafc7	grep: use PCRE v2 for optimized fixed-string search Bring back optimized fixed-string search for "grep", this time with PCRE v2 as an optional backend. As noted in [1] with kwset we were slower than PCRE v1 and v2 JIT with the kwset backend, so that optimization was counterproductive. This brings back the optimization for "--fixed-strings", without changing the semantics of having a NUL-byte in patterns. As seen in previous commits in this series we could support it now, but I'd rather just leave that edge-case aside so we don't have one behavior or the other depending what "--fixed-strings" backend we're using. It makes the behavior harder to understand and document, and makes tests for the different backends more painful. This does change the behavior under non-C locales when "log"'s "--encoding" option is used and the heystack/needle in the content/command-line doesn't have a matching encoding. See the recent change in "t4210: skip more command-line encoding tests on MinGW" in this series. I think that's OK. We did nothing sensible before then (just compared raw bytes that had no hope of matching). At least now the user will get some idea why their grep/log never matches in that edge case. I could also support the PCRE v1 backend here, but that would make the code more complex. I'd rather aim for simplicity here and in future changes to the diffcore. We're not going to have someone who absolutely must have faster search, but for whom building PCRE v2 isn't acceptable. The difference between this series of commits and the current "master" is, using the same t/perf commands shown in the last commit: plain grep: Test origin/master HEAD ------------------------------------------------------------------------- 7821.1: fixed grep int 0.55(1.67+0.56) 0.41(0.98+0.60) -25.5% 7821.2: basic grep int 0.58(1.65+0.52) 0.41(0.96+0.57) -29.3% 7821.3: extended grep int 0.57(1.66+0.49) 0.42(0.93+0.60) -26.3% 7821.4: perl grep int 0.54(1.67+0.50) 0.43(0.88+0.65) -20.4% 7821.6: fixed grep uncommon 0.21(0.52+0.42) 0.16(0.24+0.51) -23.8% 7821.7: basic grep uncommon 0.20(0.49+0.45) 0.17(0.28+0.47) -15.0% 7821.8: extended grep uncommon 0.20(0.54+0.39) 0.16(0.25+0.50) -20.0% 7821.9: perl grep uncommon 0.20(0.58+0.36) 0.16(0.23+0.50) -20.0% 7821.11: fixed grep æ 0.35(1.24+0.43) 0.16(0.23+0.50) -54.3% 7821.12: basic grep æ 0.36(1.29+0.38) 0.16(0.20+0.54) -55.6% 7821.13: extended grep æ 0.35(1.23+0.44) 0.16(0.24+0.50) -54.3% 7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.28+0.46) -54.3% grep with -i: Test origin/master HEAD ---------------------------------------------------------------------------- 7821.1: fixed grep -i int 0.62(1.81+0.70) 0.47(1.11+0.64) -24.2% 7821.2: basic grep -i int 0.67(1.90+0.53) 0.46(1.07+0.62) -31.3% 7821.3: extended grep -i int 0.62(1.92+0.53) 0.53(1.12+0.58) -14.5% 7821.4: perl grep -i int 0.66(1.85+0.58) 0.45(1.10+0.59) -31.8% 7821.6: fixed grep -i uncommon 0.21(0.54+0.43) 0.17(0.20+0.55) -19.0% 7821.7: basic grep -i uncommon 0.20(0.52+0.45) 0.17(0.29+0.48) -15.0% 7821.8: extended grep -i uncommon 0.21(0.52+0.44) 0.17(0.26+0.50) -19.0% 7821.9: perl grep -i uncommon 0.21(0.53+0.44) 0.17(0.20+0.56) -19.0% 7821.11: fixed grep -i æ 0.26(0.79+0.44) 0.16(0.29+0.46) -38.5% 7821.12: basic grep -i æ 0.26(0.79+0.42) 0.16(0.20+0.54) -38.5% 7821.13: extended grep -i æ 0.26(0.84+0.39) 0.16(0.24+0.50) -38.5% 7821.14: perl grep -i æ 0.16(0.24+0.49) 0.17(0.25+0.51) +6.3% plain log: Test origin/master HEAD -------------------------------------------------------------------------------- 4221.1: fixed log --grep='int' 7.24(6.95+0.28) 7.20(6.95+0.18) -0.6% 4221.2: basic log --grep='int' 7.31(6.97+0.22) 7.20(6.93+0.21) -1.5% 4221.3: extended log --grep='int' 7.37(7.04+0.24) 7.22(6.91+0.25) -2.0% 4221.4: perl log --grep='int' 7.31(7.04+0.21) 7.19(6.89+0.21) -1.6% 4221.6: fixed log --grep='uncommon' 6.93(6.59+0.32) 7.04(6.66+0.37) +1.6% 4221.7: basic log --grep='uncommon' 6.92(6.58+0.29) 7.08(6.75+0.29) +2.3% 4221.8: extended log --grep='uncommon' 6.92(6.55+0.31) 7.00(6.68+0.31) +1.2% 4221.9: perl log --grep='uncommon' 7.03(6.59+0.33) 7.12(6.73+0.34) +1.3% 4221.11: fixed log --grep='æ' 7.41(7.08+0.28) 7.05(6.76+0.29) -4.9% 4221.12: basic log --grep='æ' 7.39(6.99+0.33) 7.00(6.68+0.25) -5.3% 4221.13: extended log --grep='æ' 7.34(7.00+0.25) 7.15(6.81+0.31) -2.6% 4221.14: perl log --grep='æ' 7.43(7.13+0.26) 7.01(6.60+0.36) -5.7% log with -i: Test origin/master HEAD ------------------------------------------------------------------------------------ 4221.1: fixed log -i --grep='int' 7.31(7.07+0.24) 7.23(7.00+0.22) -1.1% 4221.2: basic log -i --grep='int' 7.40(7.08+0.28) 7.19(6.92+0.20) -2.8% 4221.3: extended log -i --grep='int' 7.43(7.13+0.25) 7.27(6.99+0.21) -2.2% 4221.4: perl log -i --grep='int' 7.34(7.10+0.24) 7.10(6.90+0.19) -3.3% 4221.6: fixed log -i --grep='uncommon' 7.07(6.71+0.32) 7.11(6.77+0.28) +0.6% 4221.7: basic log -i --grep='uncommon' 6.99(6.64+0.28) 7.12(6.69+0.38) +1.9% 4221.8: extended log -i --grep='uncommon' 7.11(6.74+0.32) 7.10(6.77+0.27) -0.1% 4221.9: perl log -i --grep='uncommon' 6.98(6.60+0.29) 7.05(6.64+0.34) +1.0% 4221.11: fixed log -i --grep='æ' 7.85(7.45+0.34) 7.03(6.68+0.32) -10.4% 4221.12: basic log -i --grep='æ' 7.87(7.49+0.29) 7.06(6.69+0.31) -10.3% 4221.13: extended log -i --grep='æ' 7.87(7.54+0.31) 7.09(6.69+0.31) -9.9% 4221.14: perl log -i --grep='æ' 7.06(6.77+0.28) 6.91(6.57+0.31) -2.1% So as with e05b027627 ("grep: use PCRE v2 for optimized fixed-string search", 2019-06-26) there's a huge improvement in performance for "grep", but in "log" most of our time is spent elsewhere, so we don't notice it that much. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	48de2a768c	grep: remove the kwset optimization A later change will replace this optimization with optimistic use of PCRE v2. I'm completely removing it as an intermediate step, as opposed to replacing it with PCRE v2, to demonstrate that no grep semantics depend on this (or any other) optimization for the fixed backend anymore. For now this is mostly (but not entirely) a performance regression, as shown by this hacky one-liner: for opt in '' ' -i' do GIT_PERF_7821_GREP_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p7821-grep-engines-fixed.sh done && for opt in '' ' -i' do GIT_PERF_4221_LOG_OPTS=$opt GIT_PERF_REPEAT_COUNT=10 GIT_PERF_LARGE_REPO=~/g/linux GIT_PERF_MAKE_OPTS='-j8 CFLAGS=-O3 USE_LIBPCRE=YesPlease' ./run origin/master HEAD -- p4221-log-grep-engines-fixed.sh done Which produces: plain grep: Test origin/master HEAD ------------------------------------------------------------------------- 7821.1: fixed grep int 0.55(1.60+0.63) 0.82(3.11+0.51) +49.1% 7821.2: basic grep int 0.62(1.68+0.49) 0.85(3.02+0.52) +37.1% 7821.3: extended grep int 0.61(1.63+0.53) 0.91(3.09+0.44) +49.2% 7821.4: perl grep int 0.55(1.60+0.57) 0.41(0.93+0.57) -25.5% 7821.6: fixed grep uncommon 0.20(0.50+0.44) 0.35(1.27+0.42) +75.0% 7821.7: basic grep uncommon 0.20(0.49+0.45) 0.35(1.29+0.41) +75.0% 7821.8: extended grep uncommon 0.20(0.45+0.48) 0.35(1.25+0.44) +75.0% 7821.9: perl grep uncommon 0.20(0.53+0.41) 0.16(0.24+0.49) -20.0% 7821.11: fixed grep æ 0.35(1.27+0.40) 0.25(0.82+0.39) -28.6% 7821.12: basic grep æ 0.35(1.28+0.38) 0.25(0.75+0.44) -28.6% 7821.13: extended grep æ 0.36(1.21+0.46) 0.25(0.86+0.35) -30.6% 7821.14: perl grep æ 0.35(1.33+0.34) 0.16(0.26+0.47) -54.3% grep with -i: Test origin/master HEAD ----------------------------------------------------------------------------- 7821.1: fixed grep -i int 0.61(1.84+0.64) 1.11(4.12+0.64) +82.0% 7821.2: basic grep -i int 0.72(1.86+0.57) 1.15(4.48+0.49) +59.7% 7821.3: extended grep -i int 0.94(1.83+0.60) 1.53(4.12+0.58) +62.8% 7821.4: perl grep -i int 0.66(1.82+0.59) 0.55(1.08+0.58) -16.7% 7821.6: fixed grep -i uncommon 0.21(0.51+0.44) 0.44(1.74+0.34) +109.5% 7821.7: basic grep -i uncommon 0.21(0.55+0.41) 0.44(1.72+0.40) +109.5% 7821.8: extended grep -i uncommon 0.21(0.57+0.39) 0.42(1.64+0.45) +100.0% 7821.9: perl grep -i uncommon 0.21(0.48+0.48) 0.17(0.30+0.45) -19.0% 7821.11: fixed grep -i æ 0.25(0.73+0.45) 0.25(0.75+0.45) +0.0% 7821.12: basic grep -i æ 0.25(0.71+0.49) 0.26(0.77+0.44) +4.0% 7821.13: extended grep -i æ 0.25(0.75+0.44) 0.25(0.74+0.46) +0.0% 7821.14: perl grep -i æ 0.17(0.26+0.48) 0.16(0.20+0.52) -5.9% plain log: Test origin/master HEAD --------------------------------------------------------------------------------- 4221.1: fixed log --grep='int' 7.31(7.06+0.21) 8.11(7.85+0.20) +10.9% 4221.2: basic log --grep='int' 7.30(6.94+0.27) 8.16(7.89+0.19) +11.8% 4221.3: extended log --grep='int' 7.34(7.05+0.21) 8.08(7.76+0.25) +10.1% 4221.4: perl log --grep='int' 7.27(6.94+0.24) 7.05(6.76+0.25) -3.0% 4221.6: fixed log --grep='uncommon' 6.97(6.62+0.32) 7.86(7.51+0.30) +12.8% 4221.7: basic log --grep='uncommon' 7.05(6.69+0.29) 7.89(7.60+0.28) +11.9% 4221.8: extended log --grep='uncommon' 6.89(6.56+0.32) 7.99(7.66+0.24) +16.0% 4221.9: perl log --grep='uncommon' 7.02(6.66+0.33) 6.97(6.54+0.36) -0.7% 4221.11: fixed log --grep='æ' 7.37(7.03+0.33) 7.67(7.30+0.31) +4.1% 4221.12: basic log --grep='æ' 7.41(7.00+0.31) 7.60(7.28+0.26) +2.6% 4221.13: extended log --grep='æ' 7.35(6.96+0.38) 7.73(7.31+0.34) +5.2% 4221.14: perl log --grep='æ' 7.43(7.10+0.32) 6.95(6.61+0.27) -6.5% log with -i: Test origin/master HEAD ------------------------------------------------------------------------------------ 4221.1: fixed log -i --grep='int' 7.40(7.05+0.23) 8.66(8.38+0.20) +17.0% 4221.2: basic log -i --grep='int' 7.39(7.09+0.23) 8.67(8.39+0.20) +17.3% 4221.3: extended log -i --grep='int' 7.29(6.99+0.26) 8.69(8.31+0.26) +19.2% 4221.4: perl log -i --grep='int' 7.42(7.16+0.21) 7.14(6.80+0.24) -3.8% 4221.6: fixed log -i --grep='uncommon' 6.94(6.58+0.35) 8.43(8.04+0.30) +21.5% 4221.7: basic log -i --grep='uncommon' 6.95(6.62+0.31) 8.34(7.93+0.32) +20.0% 4221.8: extended log -i --grep='uncommon' 7.06(6.75+0.25) 8.32(7.98+0.31) +17.8% 4221.9: perl log -i --grep='uncommon' 6.96(6.69+0.26) 7.04(6.64+0.32) +1.1% 4221.11: fixed log -i --grep='æ' 7.92(7.55+0.33) 7.86(7.44+0.34) -0.8% 4221.12: basic log -i --grep='æ' 7.88(7.49+0.32) 7.84(7.46+0.34) -0.5% 4221.13: extended log -i --grep='æ' 7.91(7.51+0.32) 7.87(7.48+0.32) -0.5% 4221.14: perl log -i --grep='æ' 7.01(6.59+0.35) 6.99(6.64+0.28) -0.3% Some of those, as noted in [1] are because PCRE is faster at finding fixed strings. This looks bad for some engines, but in the next change we'll optimistically use PCRE v2 for all of these, so it'll look better. 1. https://public-inbox.org/git/87v9x793qi.fsf@evledraar.gmail.com/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	45d1f37ccc	grep: drop support for \0 in --fixed-strings <pattern> Change "-f <file>" to not support patterns with a NUL-byte in them under --fixed-strings. We'll now only support these under "--perl-regexp" with PCRE v2. A previous change to grep's documentation changed the description of "-f <file>" to be vague enough as to not promise that this would work. By dropping support for this we make it a whole lot easier to move away from the kwset backend, which we'll do in a subsequent change. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	25754125ce	grep: make the behavior for NUL-byte in patterns sane The behavior of "grep" when patterns contained a NUL-byte has always been haphazard, and has served the vagaries of the implementation more than anything else. A pattern containing a NUL-byte can only be provided via "-f <file>". Since pickaxe (log search) has no such flag the NUL-byte in patterns has only ever been supported by "grep" (and not "log --grep"). Since 9eceddeec6 ("Use kwset in grep", 2011-08-21) patterns containing "\0" were considered fixed. In 966be95549 ("grep: add tests to fix blind spots with \0 patterns", 2017-05-20) I added tests for this behavior. Change the behavior to do the obvious thing, i.e. don't silently discard a regex pattern and make it implicitly fixed just because they contain a NUL-byte. Instead die if the backend in question can't handle them, e.g. --basic-regexp is combined with such a pattern. This is desired because from a user's point of view it's the obvious thing to do. Whether we support BRE/ERE/Perl syntax is different from whether our implementation is limited by C-strings. These patterns are obscure enough that I think this behavior change is OK, especially since we never documented the old behavior. Doing this also makes it easier to replace the kwset backend with something else, since we'll no longer strictly need it for anything we can't easily use another fixed-string backend for. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	d316af059d	grep tests: move binary pattern tests into their own file Move the tests for "-f <file>" where "<file>" contains a NUL byte pattern into their own file. I added most of these tests in 966be95549 ("grep: add tests to fix blind spots with \0 patterns", 2017-05-20). Whether a regex engine supports matching binary content is very different from whether it matches binary patterns. Since 2f8952250a ("regex: add regexec_buf() that can work on a non NUL-terminated string", 2016-09-21) we've required REG_STARTEND of our regex engines so we can match binary content, but only the PCRE v2 engine can sensibly match binary patterns. Since 9eceddeec6 ("Use kwset in grep", 2011-08-21) we've been punting patterns containing NUL-byte and considering them fixed, except in cases where "--ignore-case" is provided and they're non-ASCII, see 5c1ebcca4d ("grep/icase: avoid kwsset on literal non-ascii strings", 2016-06-25). Subsequent commits will change this behavior. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	471dac5d2c	grep tests: move "grep binary" alongside the rest Move the "grep binary" test case added in aca20dd558 ("grep: add test script for binary file handling", 2010-05-22) so that it lives alongside the rest of the "grep" tests in t781. This would have left a gap in the t/700 namespace, so move a "filter-branch" test down, leaving the "t7010-setup.sh" test as the next one after that. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	f463beb805	grep: inline the return value of a function call used only once Since e944d9d932 ("grep: rewrite an if/else condition to avoid duplicate expression", 2016-06-25) the "ascii_only" variable has only been used once in compile_regexp(), let's just inline it there. This makes the code easier to read, and might make it marginally faster depending on compiler optimizations. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	b14cf112e2	t4210: skip more command-line encoding tests on MinGW In 5212f91deb ("t4210: skip command-line encoding tests on mingw", 2014-07-17) the positive tests in this file were skipped. That left the negative tests that don't produce a match. An upcoming change to migrate the "fixed" backend of grep to PCRE v2 will cause these "log" commands to produce an error instead on MinGW. This is because the command-line on that platform implicitly has its encoding changed before being passed to git. See [1]. 1. https://public-inbox.org/git/nycvar.QRO.7.76.6.1907011515150.44@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-07-01 14:33:14 -07:00
Ævar Arnfjörð Bjarmason	44570188a0	grep: don't use PCRE2?_UTF8 with "log --encoding=<non-utf8>" Fix a bug introduced in 18547aacf5 ("grep/pcre: support utf-8", 2016-06-25) that was missed due to a blindspot in our tests, as discussed in the previous commit. I then blindly copied the same bug in 94da9193a6 ("grep: add support for PCRE v2", 2017-06-01) when adding the PCRE v2 code. We should not tell PCRE that we're processing UTF-8 just because we're dealing with non-ASCII. In the case of e.g. "log --encoding=<...>" under is_utf8_locale() the haystack might be in ISO-8859-1, and the needle might be in a non-UTF-8 encoding. Maybe we should be more strict here and die earlier? Should we also be converting the needle to the encoding in question, and failing if it's not a string that's valid in that encoding? Maybe. But for now matching this as non-UTF8 at least has some hope of producing sensible results, since we know that our default heuristic of assuming the text to be matched is in the user locale encoding isn't true when we've explicitly encoded it to be in a different encoding. Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-28 09:11:09 -07:00
Ævar Arnfjörð Bjarmason	4e2443b181	log tests: test regex backends in "--encode=<enc>" tests Improve the tests added in 04deccda11 ("log: re-encode commit messages before grepping", 2013-02-11) to test the regex backends. Those tests never worked as advertised, due to the is_fixed() optimization in grep.c (which was in place at the time), and the needle in the tests being a fixed string. We'd thus always use the "fixed" backend during the tests, which would use the kwset() backend. This backend liberally accepts any garbage input, so invalid encodings would be silently accepted. In a follow-up commit we'll fix this bug, this test just demonstrates the existing issue. In practice this issue happened on Windows, see [1], but due to the structure of the existing tests & how liberal the kwset code is about garbage we missed this. Cover this blind spot by testing all our regex engines. The PCRE backend will spot these invalid encodings. It's possible that this test breaks the "basic" and "extended" backends on some systems that are more anal than glibc about the encoding of locale issues with POSIX functions that I can remember, but PCRE is more careful about the validation. 1. https://public-inbox.org/git/nycvar.QRO.7.76.6.1906271113090.44@tvgsbejvaqbjf.bet/ Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-28 09:11:09 -07:00
Junio C Hamano	8dca754b1e	The third batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-21 11:26:11 -07:00
Junio C Hamano	90d79d7191	Merge branch 'mo/clang-format-for-each-update' The list of for-each like macros used by clang-format has been updated. * mo/clang-format-for-each-update: clang-format: use git grep to generate the ForEachMacros list	2019-06-21 11:24:12 -07:00
Junio C Hamano	f9089e8491	Merge branch 'md/url-parse-harden' The URL decoding code has been updated to avoid going past the end of the string while parsing %-<hex>-<hex> sequence. * md/url-parse-harden: url: do not allow %00 to represent NUL in URLs url: do not read past end of buffer	2019-06-21 11:24:12 -07:00
Junio C Hamano	e694ea5e04	Merge branch 'an/ignore-doc-update' The description about slashes in gitignore patterns (used to indicate things like "anchored to this level only" and "only matches directories") has been revamped. * an/ignore-doc-update: gitignore.txt: make slash-rules more readable	2019-06-21 11:24:11 -07:00
Junio C Hamano	755793bf57	Merge branch 'ab/hash-object-doc' Doc update. * ab/hash-object-doc: hash-object doc: stop mentioning git-cvsimport	2019-06-21 11:24:11 -07:00
Junio C Hamano	88542ef306	Merge branch 'cm/send-email-document-req-modules' A doc update. * cm/send-email-document-req-modules: send-email: update documentation of required Perl modules	2019-06-21 11:24:10 -07:00
Junio C Hamano	ca02d3669f	Merge branch 'md/list-objects-filter-parse-msgfix' Make an end-user facing message localizable. * md/list-objects-filter-parse-msgfix: list-objects-filter-options: error is localizeable	2019-06-21 11:24:10 -07:00
Junio C Hamano	34032c4f8f	Merge branch 'md/list-objects-filter-memfix' The filter_data used in the list-objects-filter (which manages a lazily sparse clone repository) did not use the dynamic array API correctly---'nr' is supposed to point at one past the last element of the array in use. This has been corrected. * md/list-objects-filter-memfix: list-objects-filter: correct usage of ALLOC_GROW	2019-06-21 11:24:09 -07:00
Junio C Hamano	8867aa855e	Merge branch 'jt/partial-clone-missing-ref-delta-base' "git fetch" into a lazy clone forgot to fetch base objects that are necessary to complete delta in a thin packfile, which has been corrected. * jt/partial-clone-missing-ref-delta-base: t5616: cover case of client having delta base t5616: use correct flag to check object is missing index-pack: prefetch missing REF_DELTA bases t5616: refactor packfile replacement	2019-06-21 11:24:09 -07:00
Junio C Hamano	a41dad4330	Merge branch 'ml/userdiff-rust' The pattern "git diff/grep" use to extract funcname and words boundary for Rust has been added. * ml/userdiff-rust: userdiff: two simplifications of patterns for rust userdiff: add built-in pattern for rust	2019-06-21 11:24:08 -07:00
Junio C Hamano	a6a95cd1b4	The second batch Signed-off-by: Junio C Hamano <gitster@pobox.com>	2019-06-17 10:16:10 -07:00
Junio C Hamano	14f49b2058	Merge branch 'xl/record-partial-clone-origin' When creating a partial clone, the object filtering criteria is recorded for the origin of the clone, but this incorrectly used a hardcoded name "origin" to name that remote; it has been corrected to honor the "--origin <name>" option. * xl/record-partial-clone-origin: clone: respect user supplied origin name when setting up partial clone	2019-06-17 10:15:20 -07:00
Junio C Hamano	dedc046421	Merge branch 'pb/request-pull-verify-remote-ref' "git request-pull" learned to warn when the ref we ask them to pull from in the local repository and in the published repository are different. * pb/request-pull-verify-remote-ref: request-pull: warn if the remote object is not the same as the local one request-pull: quote regex metacharacters in local ref	2019-06-17 10:15:20 -07:00
Junio C Hamano	add59c4708	Merge branch 'mm/p4-unshelve-windows-fix' The command line to invoke a "git cat-file" command from inside "git p4" was not properly quoted to protect a caret and running a broken command on Windows, which has been corrected. * mm/p4-unshelve-windows-fix: p4 unshelve: fix "Not a valid object name HEAD0" on Windows	2019-06-17 10:15:19 -07:00
Junio C Hamano	d4fdeed006	Merge branch 'po/git-help-on-git-itself' "git help git" was hard to discover (well, at least for some people). * po/git-help-on-git-itself: Doc: git.txt: remove backticks from link and add git-scm.com/docs git.c: show usage for accessing the git(1) help page	2019-06-17 10:15:19 -07:00
Junio C Hamano	55b34f30c2	Merge branch 'es/first-contrib-tutorial' A new tutorial targetting specifically aspiring git-core developers. * es/first-contrib-tutorial: doc: add some nit fixes to MyFirstContribution documentation: add anchors to MyFirstContribution documentation: add tutorial for first contribution	2019-06-17 10:15:19 -07:00
Junio C Hamano	3dc47c4288	Merge branch 'bb/unicode-12.1-reiwa' Update to Unicode 12.1 width table. * bb/unicode-12.1-reiwa: unicode: update the width tables to Unicode 12.1	2019-06-17 10:15:18 -07:00
Junio C Hamano	e7ef93ba7a	Merge branch 'sw/git-p4-unshelve-branched-files' "git p4" update. * sw/git-p4-unshelve-branched-files: git-p4: allow unshelving of branched files	2019-06-17 10:15:18 -07:00
Junio C Hamano	a3e6b426b9	Merge branch 'js/fsmonitor-unflake' The data collected by fsmonitor was not properly written back to the on-disk index file, breaking t7519 tests occasionally, which has been corrected. * js/fsmonitor-unflake: mark_fsmonitor_valid(): mark the index as changed if needed fill_stat_cache_info(): prepare for an fsmonitor fix	2019-06-17 10:15:18 -07:00
Junio C Hamano	bdc81d15a2	Merge branch 'ds/topo-traversal-using-commit-graph' Prepare use of reachability index in topological walker that works on a range (A..B). * ds/topo-traversal-using-commit-graph: revision: keep topo-walk free of unintersting commits revision: use generation for A..B --topo-order queries	2019-06-17 10:15:17 -07:00
Junio C Hamano	2f475317f2	Merge branch 'bl/userdiff-octave' The pattern "git diff/grep" use to extract funcname and words boundary for Matlab has been extend to cover Octave, which is more or less equivalent. * bl/userdiff-octave: userdiff: fix grammar and style issues userdiff: add Octave	2019-06-17 10:15:17 -07:00
Junio C Hamano	94760948f1	Merge branch 'ba/clone-remote-submodules' "git clone --recurse-submodules" learned to set up the submodules to ignore commit object names recorded in the superproject gitlink and instead use the commits that happen to be at the tip of the remote-tracking branches from the get-go, by passing the new "--remote-submodules" option. * ba/clone-remote-submodules: clone: add `--remote-submodules` flag	2019-06-17 10:15:17 -07:00
Junio C Hamano	6e0b1c60ad	Merge branch 'vv/merge-squash-with-explicit-commit' "git merge --squash" is designed to update the working tree and the index without creating the commit, and this cannot be countermanded by adding the "--commit" option; the command now refuses to work when both options are given. * vv/merge-squash-with-explicit-commit: merge: refuse --commit with --squash	2019-06-17 10:15:17 -07:00
Junio C Hamano	3a54d80ac8	Merge branch 'js/bundle-verify-require-object-store' "git bundle verify" needs to see if prerequisite objects exist in the receiving repository, but the command did not check if we are in a repository upfront, which has been corrected. * js/bundle-verify-require-object-store: bundle verify: error out if called without an object database	2019-06-17 10:15:16 -07:00
Junio C Hamano	5b476dc2f1	Merge branch 'js/bisect-helper-check-get-oid-return-value' Code cleanup. * js/bisect-helper-check-get-oid-return-value: bisect--helper: verify HEAD could be parsed before continuing	2019-06-17 10:15:16 -07:00
Junio C Hamano	9b3897ab06	Merge branch 'jk/am-i-resolved-fix' "git am -i --resolved" segfaulted after trying to see a commit as if it were a tree, which has been corrected. * jk/am-i-resolved-fix: am: fix --interactive HEAD tree resolution am: drop tty requirement for --interactive am: read interactive input from stdin am: simplify prompt response handling	2019-06-17 10:15:15 -07:00
Junio C Hamano	86d87307c1	Merge branch 'jk/HEAD-symref-in-xfer-namespaces' The server side support for "git fetch" used to show incorrect value for the HEAD symbolic ref when the namespace feature is in use, which has been corrected. * jk/HEAD-symref-in-xfer-namespaces: upload-pack: strip namespace from symref data	2019-06-17 10:15:15 -07:00
Junio C Hamano	63b6b4b7e1	Merge branch 'ew/server-info-remove-crufts' "git update-server-info" used to leave stale packfiles in its output, which has been corrected. * ew/server-info-remove-crufts: server-info: do not list unlinked packs	2019-06-17 10:15:15 -07:00
Junio C Hamano	ac97dc4fa9	Merge branch 'es/grep-require-name-when-needed' More parameter validation. * es/grep-require-name-when-needed: grep: fail if call could output and name is null	2019-06-17 10:15:14 -07:00
Junio C Hamano	7df94cd1f6	Merge branch 'es/git-debugger-doc' Doc update. * es/git-debugger-doc: doc: hint about GIT_DEBUGGER in CodingGuidelines	2019-06-17 10:15:14 -07:00
Junio C Hamano	5d5c46b28c	Merge branch 'ds/object-info-for-prefetch-fix' Code cleanup and futureproof. * ds/object-info-for-prefetch-fix: sha1-file: split OBJECT_INFO_FOR_PREFETCH	2019-06-17 10:15:14 -07:00

1 2 3 4 5 ...

55951 Commits