When I run
$ git grep -P "<pattern>"
I get the following error:
fatal: cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE
How can I install Git with PCRE support for macOS properly?
With homebrew, just use
brew reinstall --with-pcre2 git
This forces Homebrew to build Git from source instead of downloading the bottle, but it ensures that later updates are also built with PCRE support.
Homebrew ships Git with a pre-built version (bottle) by default. You need to compile Git from source to enable PCRE support:
$ brew install pcre
$ export USE_LIBPCRE=yes
$ brew reinstall --build-from-source git
Now it should work as expected.
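To confirm that the rebuilt binary really accepts -P, a small probe can help. The following is a minimal sketch, not part of Git or Homebrew: git_has_pcre is a hypothetical helper name, and it assumes that "git grep --no-index" may be run in a scratch directory outside any repository.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given git binary accepts
# "git grep -P", fail otherwise. Defaults to the "git" found on PATH.
git_has_pcre() {
    git_bin=${1:-git}
    command -v "$git_bin" >/dev/null 2>&1 || return 1
    probe_dir=$(mktemp -d) || return 1
    printf 'probe\n' > "$probe_dir/probe.txt"
    # --no-index lets git grep search plain files outside a repository.
    # The lookahead (?=be) is only understood by the PCRE engine; a build
    # without PCRE dies with a fatal error and a non-zero exit status.
    if (cd "$probe_dir" && "$git_bin" grep --no-index -q -P 'pro(?=be)' -- probe.txt) >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$probe_dir"
    return "$status"
}

if git_has_pcre; then
    echo "git grep -P is available"
else
    echo "git grep -P is NOT available"
fi
```

Passing a path as the first argument checks a specific binary instead of the one on PATH, which is handy when a bottled and a source-built Git are both installed.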
With Git 2.18 (Q2 2018), the build option has evolved:
Git can be built to use either v1 or v2 of the PCRE library, and so far, the build-time configuration USE_LIBPCRE=YesPlease instructed the build procedure to use v1, but now it means v2.
USE_LIBPCRE1 and USE_LIBPCRE2 can be used to explicitly choose which version to use, as before.
See commit e6c531b, commit a363f98, commit a91b113 (11 Mar 2018) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit cac5351, 09 Apr 2018)
Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1
Change the USE_LIBPCRE flag from being an alias for USE_LIBPCRE1 to being an alias for USE_LIBPCRE2.
When support for v2 was added in my 94da919 ("grep: add support for PCRE v2", 2017-06-01, Git v2.14.0-rc0) the existing USE_LIBPCRE flag was left as meaning v1, with a note that this would likely change in a future release.
That optional support for v2 first made it into Git version 2.14.0.
The PCRE v2 support has been shown to be stable, and the upstream PCRE project is highly encouraging downstream users to move to v2, so it makes sense to give packagers of Git who haven't heard the news about PCRE v2 a further nudge to move to v2.
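When building from a Git source checkout directly rather than through Homebrew, the flag is passed to make as a variable. Here is a minimal sketch that only assembles the variables: pcre_make_args is a hypothetical helper, the prefix path is an example and not something the answers above specify, and LIBPCREDIR is the Makefile knob for a PCRE installation in a non-standard prefix.

```shell
#!/bin/sh
# Hypothetical helper: print the make variables for a PCRE-enabled build.
# The optional first argument is the PCRE installation prefix.
pcre_make_args() {
    pcre_prefix=${1:-}
    args="USE_LIBPCRE=YesPlease"
    if [ -n "$pcre_prefix" ]; then
        # Only needed when the headers and library are not in a default location.
        args="$args LIBPCREDIR=$pcre_prefix"
    fi
    printf '%s\n' "$args"
}

pcre_make_args
# Example prefix only; adjust to wherever PCRE is installed.
pcre_make_args /usr/local/opt/pcre2
```

Inside the source tree this would be used as: make $(pcre_make_args /usr/local/opt/pcre2)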
PCRE v2 support improved further with Git 2.24 (Q4 2019).
See commit c581e4a (18 Aug 2019) by Beat Bolli (bbolli).
Suggested-by: Johannes Schindelin (dscho).
See commit 870eea8, commit 8a59998, commit 09872f6, commit 8a35b54, commit 685668f, commit 3448923, commit 04bef50 (26 Jul 2019), commit b65abca, commit 48de2a7, commit 45d1f37, commit 2575412, commit d316af0, commit 471dac5, commit f463beb, commit b14cf11 (01 Jul 2019), and commit 4457018, commit 4e2443b (27 Jun 2019) by Ævar Arnfjörð Bjarmason (avar).
Suggested-by: Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit a73f917, 11 Oct 2019)
grep: use PCRE v2 for optimized fixed-string search
Bring back optimized fixed-string search for "grep", this time with PCRE v2 as an optional backend. As noted before, with kwset we were slower than PCRE v1 and v2 JIT with the kwset backend, so that optimization was counterproductive.
This brings back the optimization for "--fixed-strings", without changing the semantics of having a NUL-byte in patterns.
As seen in previous commits in this series we could support it now, but I'd rather just leave that edge-case aside so we don't have one behavior or the other depending what "--fixed-strings" backend we're using.
It makes the behavior harder to understand and document, and makes tests for the different backends more painful.
This does change the behavior under non-C locales when "log"'s "--encoding" option is used and the haystack/needle in the content/command-line doesn't have a matching encoding.
See the recent change in "t4210: skip more command-line encoding tests on MinGW" in this series (following this discussion). I think that's OK. We did nothing sensible before then (just compared raw bytes that had no hope of matching).
At least now the user will get some idea why their grep/log never matches in that edge case.
Test cases have been adjusted with Git 2.25 (Q1 2020).
See commit e714b89 (30 Nov 2019) by Todd Zullinger (tmzullinger).
See commit c74b3cb (26 Nov 2019) by Andreas Schwab (andreas-schwab).
(Merged by Junio C Hamano -- gitster -- in commit dac30e7, 10 Dec 2019)
t7812: expect failure for grep -i with invalid UTF-8 data
Signed-off-by: Todd Zullinger
When the 'grep with invalid UTF-8 data' tests were added/adjusted in 8a5999838e ("grep: stess test PCRE v2 on invalid UTF-8 data", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8) and 870eea8166 ("grep: do not enter PCRE2_UTF mode on fixed matching", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8) they lacked a redirect which caused them to falsely succeed on most systems.
The 'grep -i' test failed on systems where JIT was disabled as it never reached the portion which was missing the redirect.
A recent patch added the missing redirect and exposed the fact that the 'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' test fails regardless of whether JIT is enabled.
Based on the final paragraph in 870eea8166:
When grepping a non-ASCII fixed string. This is a more general problem that's hard to fix, but we can at least fix the most common case of grepping for a fixed string without "-i". I can't think of a reason for why we'd turn on PCRE2_UTF when matching byte-for-byte like that.
it seems that we don't expect that the case-insensitive grep will succeed.
Adjust the test to reflect that expectation.
And:
See commit 7187c7b (27 Nov 2019) by Ed Maste (emaste).
(Merged by Junio C Hamano -- gitster -- in commit b089e5e, 10 Dec 2019)
t4210: skip i18n tests that don't work on FreeBSD
Signed-off-by: Ed Maste
A number of t4210-log-i18n tests added in 4e2443b181 set LC_ALL to a UTF-8 locale (is_IS.UTF-8) but then pass an invalid UTF-8 string to --grep.
FreeBSD's regcomp() fails in this case with REG_ILLSEQ, "illegal byte sequence," which git then passes to die():
fatal: command line: '�': illegal byte sequence
When these tests were added the commit message stated:
| It's possible that this
| test breaks the "basic" and "extended" backends on some systems that
| are more anal than glibc about the encoding of locale issues with
| POSIX functions that I can remember
which seems to be the case here.
Extend test-lib.sh to add a REGEX_ILLSEQ prereq, set it on FreeBSD, and add !REGEX_ILLSEQ to the two affected tests.
Since FreeBSD is not the only platform whose regexp library reports a REG_ILLSEQ error when fed invalid UTF-8, Git 2.28 (Q3 2020) adds logic to detect that automatically and skip the affected tests.
See commit c4c2a96, commit aba8187 (18 May 2020) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit f4cec40, 09 Jun 2020)
t4210: detect REG_ILLSEQ dynamically and skip affected tests
Helped-by: Eric Sunshine
Signed-off-by: Carlo Marcelo Arenas Belón
7187c7bbb8 ("t4210: skip i18n tests that don't work on FreeBSD", 2019-11-27, Git v2.25.0-rc0 -- merge listed in batch #5) adds a REG_ILLSEQ prerequisite, and to do that copies the common branch in test-lib and expands it to include it in a special case for FreeBSD.
Instead, test for it using a previously added extension to test-tool and use that, together with a function that identifies when regcomp/regexec will be called with broken patterns to avoid any test that would otherwise rely on undefined behaviour.
The description of the first test which wasn't accurate has been corrected, and the test rearranged for clarity, including a helper function that avoids overly long lines.
Only the affected engines will have their tests suppressed, also including "fixed" if the PCRE optimization that uses LIBPCRE2 since b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8) is not available.
Git 2.31 (Q1 2021) drops a debugging aid in the "grep" codepaths that may have been useful in the past but no longer is.
See commit 15c9649 (26 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit c9f94ab, 10 Feb 2021)
grep/log: remove hidden --debug and --grep-debug options
Signed-off-by: Ævar Arnfjörð Bjarmason
Remove the hidden "grep --debug" and "log --grep-debug" options added in 17bf35a ("grep: teach --debug option to dump the parse tree", 2012-09-13, Git v1.8.0-rc0 -- merge).
At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them.
Reasons to want this gone:
- They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in e01b4da ("grep: change non-ASCII -i test to stop using --debug", 2017-05-20, Git v2.14.0-rc0 -- merge listed in batch #5).
- Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine.
An exception to that is c581e4a ("grep: under --debug, show whether PCRE JIT is enabled", 2019-08-18, Git v2.24.0-rc0 -- merge listed in batch #8) where we added the ability to dump out when PCREv2 has the JIT in effect.
The combination of that and my earlier b65abca ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8) means Git prints this out in its most common in-the-wild configuration:
$ git log --grep-debug --grep=foo --grep=bar --grep=baz --all-match
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
[all-match]
(or
 pattern_body<body>foo
 (or
  pattern_body<body>bar
  pattern_body<body>baz
 )
)
$ git grep --debug \( -e foo --and -e bar \) --or -e baz
pcre2_jit_on=1
pcre2_jit_on=1
pcre2_jit_on=1
(or
 (and
  patternfoo
  patternbar
 )
 patternbaz
)
I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not.
I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not.
The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath.
That the original author of this debugging facility seemingly hasn't noticed the bad output since then is probably some indicator.
With Git 2.31 (Q1 2021), support for the deprecated PCRE1 library has been dropped.
See commit 7599730, commit 0205bb1 (24 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 0199c68, 10 Feb 2021)
7599730b7e: Remove support for v1 of the PCRE library
Signed-off-by: Ævar Arnfjörð Bjarmason
Remove support for using version 1 of the PCRE library.
Its use has been discouraged by upstream for a long time, and it's in a bugfix-only state.
Anyone who was relying on v1 in particular got a nudge to move to v2 in e6c531b ("Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1", 2018-03-11, Git v2.18.0-rc0 -- merge listed in batch #1), which was first released as part of v2.18.0.
With this the LIBPCRE2 test prerequisite is redundant to PCRE.
But I'm keeping it for self-documentation purposes, and to avoid conflict with other in-flight PCRE patches.
I'm also not changing all of our own "pcre2" names to "pcre", i.e. the inverse of 6d4b574 ("grep: change internal pcre variable & function names to be pcre1", 2017-05-25, Git v2.14.0-rc0 -- merge listed in batch #5).
I don't see the point, and it makes the history/blame harder to read.
Maybe if there's ever a PCRE v3...
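The flag history quoted so far (v2 support added in 2.14, USE_LIBPCRE switched to v2 in 2.18, v1 removed in 2.31) can be condensed into a small lookup. This is only a sketch of that timeline: pcre_for_flag is a hypothetical helper, not something Git ships, and it assumes a version of 2.14 or later for the USE_LIBPCRE1 answer, since the quoted commits do not say how that name behaved before then.

```shell
#!/bin/sh
# Hypothetical helper: given a Git version (major.minor[.patch]) and a build
# flag name, print which PCRE library version that flag selects, according
# to the release notes quoted above.
pcre_for_flag() {
    version=$1
    flag=$2
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    n=$((major * 100 + minor))   # e.g. 2.18.x becomes 218
    case $flag in
    USE_LIBPCRE)
        # Alias for v1 until Git 2.18, alias for v2 from then on.
        if [ "$n" -ge 218 ]; then echo v2; else echo v1; fi ;;
    USE_LIBPCRE1)
        # v1 support was removed in Git 2.31.
        if [ "$n" -ge 231 ]; then echo unsupported; else echo v1; fi ;;
    USE_LIBPCRE2)
        # v2 support first appeared in Git 2.14.
        if [ "$n" -ge 214 ]; then echo v2; else echo unsupported; fi ;;
    *)
        echo unknown ;;
    esac
}

pcre_for_flag 2.17.0 USE_LIBPCRE
pcre_for_flag 2.18.0 USE_LIBPCRE
pcre_for_flag 2.31.0 USE_LIBPCRE1
```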
Still with Git 2.31 (Q1 2021), the support for invalid UTF-8 in PCRE2 has been updated.
See commit 95ca1f9, commit a4fea08 (24 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 59ace28, 10 Feb 2021)
grep/pcre2: better support invalid UTF-8 haystacks
Signed-off-by: Ævar Arnfjörð Bjarmason
Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend.
This is a more complete fix for a bug I started to fix in 870eea8 ("grep: do not enter PCRE2_UTF mode on fixed matching", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it.
This fixes the sort of case described in 8a59998 ("grep: stess test PCRE v2 on invalid UTF-8 data", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8), i.e.:
- The subject string is non-ASCII (e.g. "ævar")
- We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
- We are using --ignore-case, or we're a non-fixed pattern
If those conditions were satisfied and we found non-valid UTF-8 data, PCREv2 might bark on it; in practice this only happened under the JIT backend (turned on by default on most platforms).
Ultimately this fixes a "regression" in b65abca ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8), I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways.
There was a bug related to this the PCRE2_NO_START_OPTIMIZE flag fixed in PCREv2 10.36.
It can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag.
Let's do that in those cases, and add tests for the bug.
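To try the scenario listed in that commit message locally, a haystack mixing a valid non-ASCII line with an invalid UTF-8 byte is needed. The following sketch only builds such a file: make_invalid_utf8_fixture and the file contents are illustrative assumptions, not taken from Git's test suite.

```shell
#!/bin/sh
# Hypothetical helper: write a two-line haystack to the given path.
# Line 1 is valid UTF-8 containing the non-ASCII word "ævar".
# Line 2 contains a lone 0xFF byte, which is never valid in UTF-8.
make_invalid_utf8_fixture() {
    target=$1
    # \303\246 is the octal form of the UTF-8 encoding of "æ" (C3 A6).
    printf '\303\246var\n' > "$target"
    printf 'broken \377 byte\n' >> "$target"
}

fixture=$(mktemp)
make_invalid_utf8_fixture "$fixture"
# Show the raw bytes so the invalid sequence is visible.
od -c "$fixture"
rm -f "$fixture"
```

With the file written as haystack.txt in the current directory, a command such as LC_ALL=en_US.UTF-8 git grep --no-index -i -P 'ævar' -- haystack.txt exercises the case-insensitive path described above. Whether it matches or reports an error depends on the Git and PCRE2 versions involved, and on that locale being installed.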
With Git 2.32 (Q2 2021), there are updates to the memory allocation code around the use of the pcre2 library (i.e. PCRE v2).
See commit c176035, commit cbe81e6, commit 8d12851, commit b76bf27, commit 797c359, commit a39b400, commit 588e4fb, commit 47eebd2, commit 1cfc5a8, commit 0ddf8ce (18 Feb 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 24119d9, 22 Mar 2021)
grep/pcre2: move back to thread-only PCREv2 structures
Signed-off-by: Ævar Arnfjörð Bjarmason
Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init().
This change brings it in line with how the rest of the pcre2_* members in the grep_pat structure are set up.
As noted in the preceding commit the approach 513f2b0 ("grep: make PCRE2 aware of custom allocator", 2019-10-16, Git v2.24.0-rc1 -- merge listed in batch #11) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works.
The approach of creating a global context in grep_init() is just added complexity for almost zero gain.
On my system it's 24 bytes saved per-thread.
For comparison PCREv2 will then go on to allocate at least a kilobyte for its own thread-local state.
As noted in 6d423dd ("grep: don't redundantly compile throwaway patterns under threading", 2017-05-25, Git v2.14.0-rc0 -- merge listed in batch #9) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local.
So let's remove this special case and make all of them thread-local again for simplicity.
With this change we could move the pcre2_{malloc,free} functions around to live closer to their current use.
See also the discussion in 94da919 ("grep: add support for PCRE v2", 2017-06-01, Git v2.14.0-rc0 -- merge listed in batch #9) about thread safety, and Johannes's comments to the effect that we should be doing what this patch is doing.
With Git 2.40 (Q1 2023), in an environment where dynamically generated code is prohibited to run (e.g. SELinux), failure to JIT pcre patterns is expected.
Fall back to interpreted execution in such a case.
See commit 50b6ad5 (31 Jan 2023) by Mathias Krause (mathiaskrause).
(Merged by Junio C Hamano -- gitster -- in commit 214242a, 15 Feb 2023)
grep: fall back to interpreter if JIT memory allocation fails
Cc: Carlo Marcelo Arenas Belón
Signed-off-by: Mathias Krause
Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):
[user@fedora git]$ git grep -c PCRE2_JIT
grep.c:1
[user@fedora git]$ # Enable SELinux's W^X policy
[user@fedora git]$ sudo semanage boolean -m -1 deny_execmem
[user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
[user@fedora git]$ git grep -c PCRE2_JIT
fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'
Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience.
As having a functional PCRE2 JIT compiler is a legitimate use case for performance reasons, we'll only do the fallback if the supposedly available JIT is found to be non-functional by attempting to JIT compile a very simple pattern.
If this fails, JIT is deemed to be non-functional and we do the interpreter fallback.
For all other cases, i.e. the simple pattern can be compiled but the user-provided one cannot, we fail hard as we do now as the reason for the failure must be the pattern itself.
To aid users in helping themselves, change the error message to include a hint about the '(*NO_JIT)' prefix.
Also clip the pattern at 64 characters to ensure the hint will be seen by the user and not internally truncated by the die() function.
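On a Git version that still fails hard, or when the JIT rejects one specific pattern, the '(*NO_JIT)' prefix mentioned in the hint can be added by hand. The following is a minimal sketch: no_jit_pattern is a hypothetical helper name, while (*NO_JIT) itself is PCRE2's start-of-pattern option for disabling JIT for that pattern.

```shell
#!/bin/sh
# Hypothetical helper: prepend PCRE2's (*NO_JIT) option to a pattern so that
# the pattern is run by the interpreter instead of JIT-compiled code.
no_jit_pattern() {
    printf '(*NO_JIT)%s\n' "$1"
}

# Print the pattern that would be handed to git grep -P.
no_jit_pattern 'PCRE2_JIT'
```

It would be used as: git grep -P -c "$(no_jit_pattern 'PCRE2_JIT')"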