1
0
Fork 0
mirror of https://github.com/git/git.git synced 2024-05-05 18:46:10 +02:00

grep/pcre2: use PCRE2_UTF even with ASCII patterns

compile_pcre2_pattern() currently uses the option PCRE2_UTF only for
patterns with non-ASCII characters.  Patterns with ASCII wildcards can
match non-ASCII strings, though.  Without that option PCRE2 mishandles
UTF-8 input, though -- it matches parts of multi-byte characters.  Fix
that by using PCRE2_UTF even for ASCII-only patterns.

This is a remake of the reverted ae39ba431a (grep/pcre2: fix an edge
case concerning ascii patterns and UTF-8 data, 2021-10-15).  The change
to the condition and the test are simplified and more targeted.

Original-patch-by: Hamza Mahfooz <someguy@effective-light.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This commit is contained in:
René Scharfe 2021-12-18 20:50:02 +01:00 committed by Junio C Hamano
parent e9d7761bb9
commit dc2c44fbb1
2 changed files with 7 additions and 1 deletions

2
grep.c
View File

@ -382,7 +382,7 @@ static void compile_pcre2_pattern(struct grep_pat *p, const struct grep_opt *opt
}
options |= PCRE2_CASELESS;
}
if (!opt->ignore_locale && is_utf8_locale() && has_non_ascii(p->pattern) &&
if (!opt->ignore_locale && is_utf8_locale() &&
!(!opt->ignore_case && (p->fixed || p->is_fixed)))
options |= (PCRE2_UTF | PCRE2_MATCH_INVALID_UTF);

View File

@ -123,4 +123,10 @@ test_expect_success GETTEXT_LOCALE,LIBPCRE2,PCRE2_MATCH_INVALID_UTF 'PCRE v2: gr
test_cmp invalid-0xe5 actual
'
test_expect_success GETTEXT_LOCALE,LIBPCRE2 'PCRE v2: grep non-literal ASCII from UTF-8' '
git grep --perl-regexp -h -o -e ll. file >actual &&
echo "lló" >expected &&
test_cmp expected actual
'
test_done