11

When I run

$ git grep -P "<pattern>"

I get the following error:

fatal: cannot use Perl-compatible regexes when not compiled with USE_LIBPCRE

How can I install Git with PCRE support for macOS properly?

pkamb
ldiqual

3 Answers

25

With homebrew, just use

brew reinstall --with-pcre2 git

This forces Homebrew to build Git from source instead of downloading the bottle, but it ensures that later upgrades keep PCRE support.

Gaëtan Lehmann
  • Worked for me on MacOS 10.12, Git 2.12. While both valid, I prefer this more concise fix over Loïs' answer because it will survive a brew upgrade of Git. – Casey Watson Mar 20 '17 at 03:51
  • Now in March 2018: `Warning: git: --with-pcre was deprecated; using --with-pcre2 instead!` So you'll have to `brew reinstall --with-pcre2 git` – akhaku Mar 05 '18 at 22:29
  • Updated. Thanks! – Gaëtan Lehmann Mar 09 '18 at 13:53
  • Didn't work for me. It said this option is invalid. `brew upgrade git` did the trick. – Guy Jan 14 '19 at 08:33
  • `brew upgrade git` worked for me as well. To add a bit more information, I was using git version 2.19.1, which didn't support `--with-pcre2`. During the upgrade to `2.22.0_1`, pcre2 support (ver 2-10.33) was added. That said, after the upgrade, the error is still shown. I'll update this when I find a working solution. – Matthew Setter Jul 15 '19 at 08:28
6

Homebrew ships Git with a pre-built version (bottle) by default. You need to compile Git from source to enable PCRE support:

$ brew install pcre
$ export USE_LIBPCRE=yes
$ brew reinstall --build-from-source git

Now it should work as expected.
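To confirm that the rebuilt Git really has PCRE support, a small check like the following may help. It is only a sketch: the helper name `git_has_pcre` and the sample file are made up for illustration, and it relies on `git grep --no-index`, which searches plain files without needing a repository.

```shell
# Returns 0 if `git grep -P` works with the git found on PATH, 1 otherwise.
# The function name and sample file are illustrative, not part of Git.
git_has_pcre() {
    tmpdir=$(mktemp -d) || return 1
    printf 'abc123\n' > "$tmpdir/sample.txt"
    # --no-index searches plain files, so no repository is needed.
    # A Git built without PCRE aborts with a fatal error on -P.
    if (cd "$tmpdir" && git grep --no-index -q -P 'abc\d+' -- sample.txt) 2>/dev/null; then
        result=0
    else
        result=1
    fi
    rm -rf "$tmpdir"
    return "$result"
}

if git_has_pcre; then
    echo "git grep -P is available"
else
    echo "git grep -P is not available"
fi
```

If the second message is printed, the `git` on your PATH is probably still the old build; `which git` shows which binary is being picked up.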

ldiqual
  • FYI, this didn't work on my OS X 10.10.2 system—git still complains it wasn't compiled with pcre. But Gaëtan Lehmann's more direct answer (`brew reinstall --with-pcre git`) did. – jacobsa Mar 31 '15 at 21:59
  • Note that you will have to run this each time you `brew upgrade`. Go with Gaëtan's answer. – Casey Watson Mar 20 '17 at 03:53
2

With Git 2.18 (Q2 2018), the build option has evolved:

Git can be built to use either v1 or v2 of the PCRE library, and so far, the build-time configuration USE_LIBPCRE=YesPlease instructed the build procedure to use v1, but now it means v2.

USE_LIBPCRE1 and USE_LIBPCRE2 can be used to explicitly choose which version to use, as before.

See commit e6c531b, commit a363f98, commit a91b113 (11 Mar 2018) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit cac5351, 09 Apr 2018)

Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1

Change the USE_LIBPCRE flag from being an alias for USE_LIBPCRE1 to being an alias for USE_LIBPCRE2.

When support for v2 was added in my 94da919 ("grep: add support for PCRE v2", 2017-06-01, Git v2.14.0-rc0) the existing USE_LIBPCRE flag was left as meaning v1, with a note that this would likely change in a future release.
That optional support for v2 first made it into Git version 2.14.0.

The PCRE v2 support has been shown to be stable, and the upstream PCRE project is highly encouraging downstream users to move to v2, so it makes sense to give packagers of Git who haven't heard the news about PCRE v2 a further nudge to move to v2.
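The change described above can be summed up in a few lines of shell. This is a minimal sketch, not anything shipped with Git: the function name `use_libpcre_meaning` is hypothetical, and it only encodes the rule stated here (before 2.18, `USE_LIBPCRE=YesPlease` selects v1; from 2.18 on, it selects v2).

```shell
# Prints which PCRE version USE_LIBPCRE=YesPlease selects for a given
# Git version, following the rule quoted above. Hypothetical helper.
use_libpcre_meaning() {
    major=${1%%.*}      # "2.17.1" -> "2"
    rest=${1#*.}        # "2.17.1" -> "17.1"
    minor=${rest%%.*}   # "17.1"   -> "17"
    if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 18 ]; }; then
        echo "v2"
    else
        echo "v1"
    fi
}

use_libpcre_meaning 2.17.1   # prints: v1
use_libpcre_meaning 2.18.0   # prints: v2
```

When building from source on an older release, setting `USE_LIBPCRE2=YesPlease` explicitly avoids depending on what the alias means.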


That PCRE v2 support improves with Git 2.24 (Q4 2019):

See commit c581e4a (18 Aug 2019) by Beat Bolli (bbolli).
Suggested-by: Johannes Schindelin (dscho).
See commit 870eea8, commit 8a59998, commit 09872f6, commit 8a35b54, commit 685668f, commit 3448923, commit 04bef50 (26 Jul 2019), commit b65abca, commit 48de2a7, commit 45d1f37, commit 2575412, commit d316af0, commit 471dac5, commit f463beb, commit b14cf11 (01 Jul 2019), and commit 4457018, commit 4e2443b (27 Jun 2019) by Ævar Arnfjörð Bjarmason (avar).
Suggested-by: Johannes Schindelin (dscho).
(Merged by Junio C Hamano -- gitster -- in commit a73f917, 11 Oct 2019)

grep: use PCRE v2 for optimized fixed-string search

Bring back optimized fixed-string search for "grep", this time with PCRE v2 as an optional backend. As noted before, with kwset we were slower than PCRE v1 and v2 JIT with the kwset backend, so that optimization was counterproductive.

This brings back the optimization for "--fixed-strings", without changing the semantics of having a NUL-byte in patterns.
As seen in previous commits in this series we could support it now, but I'd rather just leave that edge-case aside so we don't have one behavior or the other depending what "--fixed-strings" backend we're using.
It makes the behavior harder to understand and document, and makes tests for the different backends more painful.

This does change the behavior under non-C locales when "log"'s "--encoding" option is used and the haystack/needle in the content/command-line doesn't have a matching encoding.
See the recent change in "t4210: skip more command-line encoding tests on MinGW" in this series (following this discussion). I think that's OK. We did nothing sensible before then (just compared raw bytes that had no hope of matching).
At least now the user will get some idea why their grep/log never matches in that edge case.


Test cases have been adjusted with Git 2.25 (Q1 2020).

See commit e714b89 (30 Nov 2019) by Todd Zullinger (tmzullinger).
See commit c74b3cb (26 Nov 2019) by Andreas Schwab (andreas-schwab).
(Merged by Junio C Hamano -- gitster -- in commit dac30e7, 10 Dec 2019)

t7812: expect failure for grep -i with invalid UTF-8 data

Signed-off-by: Todd Zullinger

When the 'grep with invalid UTF-8 data' tests were added/adjusted in 8a5999838e ("grep: stess test PCRE v2 on invalid UTF-8 data", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8) and 870eea8166 ("grep: do not enter PCRE2_UTF mode on fixed matching", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8) they lacked a redirect which caused them to falsely succeed on most systems.

The 'grep -i' test failed on systems where JIT was disabled as it never reached the portion which was missing the redirect.

A recent patch added the missing redirect and exposed the fact that the 'PCRE v2: grep non-ASCII from invalid UTF-8 data with -i' test fails regardless of whether JIT is enabled.

Based on the final paragraph in 870eea8166:

When grepping a non-ASCII fixed string.
This is a more general problem that's hard to fix, but we can at least fix the most common case of grepping for a fixed string without "-i". I can't think of a reason for why we'd turn on PCRE2_UTF when matching byte-for-byte like that.

it seems that we don't expect that the case-insensitive grep will succeed.

Adjust the test to reflect that expectation.

And:

See commit 7187c7b (27 Nov 2019) by Ed Maste (emaste).
(Merged by Junio C Hamano -- gitster -- in commit b089e5e, 10 Dec 2019)

t4210: skip i18n tests that don't work on FreeBSD

Signed-off-by: Ed Maste

A number of t4210-log-i18n tests added in 4e2443b181 set LC_ALL to a UTF-8 locale (is_IS.UTF-8) but then pass an invalid UTF-8 string to --grep.
FreeBSD's regcomp() fails in this case with REG_ILLSEQ, "illegal byte sequence," which git then passes to die():

fatal: command line: '�': illegal byte sequence

When these tests were added the commit message stated:

| It's possible that this  
| test breaks the "`basic`" and "`extended`" backends on some systems that  
| are more anal than `glibc` about the encoding of locale issues with  
| POSIX functions that I can remember

which seems to be the case here.

Extend test-lib.sh to add a REGEX_ILLSEQ prereq, set it on FreeBSD, and add !REGEX_ILLSEQ to the two affected tests.


As FreeBSD is not the only platform whose regexp library reports a REG_ILLSEQ error when fed invalid UTF-8, add logic to detect that automatically and skip the affected tests with Git 2.28 (Q3 2020).

See commit c4c2a96, commit aba8187 (18 May 2020) by Carlo Marcelo Arenas Belón (carenas).
(Merged by Junio C Hamano -- gitster -- in commit f4cec40, 09 Jun 2020)

t4210: detect REG_ILLSEQ dynamically and skip affected tests

Helped-by: Eric Sunshine
Signed-off-by: Carlo Marcelo Arenas Belón

7187c7bbb8 ("t4210: skip i18n tests that don't work on FreeBSD", 2019-11-27, Git v2.25.0-rc0 -- merge listed in batch #5) adds a REG_ILLSEQ prerequisite, and to do that copies the common branch in test-lib and expands it to include it in a special case for FreeBSD.

Instead, test for it using a previously added extension to test-tool and use that, together with a function that identifies when regcomp/regexec will be called with broken patterns, to avoid any test that would otherwise rely on undefined behaviour.

The description of the first test which wasn't accurate has been corrected, and the test rearranged for clarity, including a helper function that avoids overly long lines.

Only the affected engines will have their tests suppressed, also including "fixed" if the PCRE optimization that uses LIBPCRE2 since b65abcafc7 ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8) is not available.


With Git 2.31 (Q1 2021), lose the debugging aid that may have been useful in the past, but no longer is, in the "grep" codepaths.

See commit 15c9649 (26 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit c9f94ab, 10 Feb 2021)

grep/log: remove hidden --debug and --grep-debug options

Signed-off-by: Ævar Arnfjörð Bjarmason

Remove the hidden "grep --debug" and "log --grep-debug" options added in 17bf35a ("grep: teach --debug option to dump the parse tree", 2012-09-13, Git v1.8.0-rc0 -- merge).

At the time these options seem to have been intended to go along with a documentation discussion and to help the author of relevant tests to perform ad-hoc debugging on them.

Reasons to want this gone:

  1. They were never documented, and the only (rather trivial) use of them in our own codebase for testing is something I removed back in e01b4da ("grep: change non-ASCII -i test to stop using --debug", 2017-05-20, Git v2.14.0-rc0 -- merge listed in batch #5).

  2. Googling around doesn't show any in-the-wild uses I could dig up, and on the Git ML the only mentions after the original discussion seem to have been when they came up in unrelated diff contexts, or that test commit of mine.

  3. An exception to that is c581e4a ("grep: under --debug, show whether PCRE JIT is enabled", 2019-08-18, Git v2.24.0-rc0 -- merge listed in batch #8) where we added the ability to dump out when PCREv2 has the JIT in effect.
    The combination of that and my earlier b65abca ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8) means Git prints this out in its most common in-the-wild configuration:

    $ git log  --grep-debug --grep=foo --grep=bar --grep=baz --all-match
    pcre2_jit_on=1
    pcre2_jit_on=1
    pcre2_jit_on=1
    [all-match]
    (or
     pattern_body<body>foo
     (or
      pattern_body<body>bar
      pattern_body<body>baz
     )
    )
    
    $ git grep --debug \( -e foo --and -e bar \) --or -e baz
    pcre2_jit_on=1
    pcre2_jit_on=1
    pcre2_jit_on=1
    (or
     (and
      patternfoo
      patternbar
     )
     patternbaz
    )
    

I.e. for each pattern we're considering for the and/or/--all-match etc. debugging we'll now diligently spew out another identical line saying whether the PCREv2 JIT is on or not.

I think that nobody's complained about that rather glaringly obviously bad output says something about how much this is used, i.e. it's not.

The need for this debugging aid for the composed grep/log patterns seems to have passed, and the desire to dump the JIT config seems to have been another one-off around the time we had JIT-related issues on the PCREv2 codepath.
That the original author of this debugging facility seemingly hasn't noticed the bad output since then is probably some indicator.


With Git 2.31 (Q1 2021), the support for deprecated PCRE1 library has been dropped.

See commit 7599730, commit 0205bb1 (24 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 0199c68, 10 Feb 2021)

7599730b7e: Remove support for v1 of the PCRE library

Signed-off-by: Ævar Arnfjörð Bjarmason

Remove support for using version 1 of the PCRE library.
Its use has been discouraged by upstream for a long time, and it's in a bugfix-only state.

Anyone who was relying on v1 in particular got a nudge to move to v2 in e6c531b ("Makefile: make USE_LIBPCRE=YesPlease mean v2, not v1", 2018-03-11, Git v2.18.0-rc0 -- merge listed in batch #1), which was first released as part of v2.18.0.

With this the LIBPCRE2 test prerequisite is redundant to PCRE.
But I'm keeping it for self-documentation purposes, and to avoid conflict with other in-flight PCRE patches.

I'm also not changing all of our own "pcre2" names to "pcre", i.e. the inverse of 6d4b574 ("grep: change internal pcre variable & function names to be pcre1", 2017-05-25, Git v2.14.0-rc0 -- merge listed in batch #5).
I don't see the point, and it makes the history/blame harder to read.
Maybe if there's ever a PCRE v3...


Still with Git 2.31 (Q1 2021), the support for invalid UTF-8 in PCRE2 has been updated.

See commit 95ca1f9, commit a4fea08 (24 Jan 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 59ace28, 10 Feb 2021)

grep/pcre2: better support invalid UTF-8 haystacks

Signed-off-by: Ævar Arnfjörð Bjarmason

Improve the support for invalid UTF-8 haystacks given a non-ASCII needle when using the PCREv2 backend.

This is a more complete fix for a bug I started to fix in 870eea8 ("grep: do not enter PCRE2_UTF mode on fixed matching", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8), now that PCREv2 has the PCRE2_MATCH_INVALID_UTF mode we can make use of it.

This fixes the sort of case described in 8a59998 ("grep: stess test PCRE v2 on invalid UTF-8 data", 2019-07-26, Git v2.24.0-rc0 -- merge listed in batch #8), i.e.:

  • The subject string is non-ASCII (e.g. "ævar")
  • We're under a is_utf8_locale(), e.g. "en_US.UTF-8", not "C"
  • We are using --ignore-case, or we're a non-fixed pattern

If those conditions were satisfied and we found invalid UTF-8 data while matching, PCREv2 might bark on it. In practice this only happened under the JIT backend (turned on by default on most platforms).

Ultimately this fixes a "regression" in b65abca ("grep: use PCRE v2 for optimized fixed-string search", 2019-07-01, Git v2.24.0-rc0 -- merge listed in batch #8). I'm putting that in scare-quotes because before then we wouldn't properly support these complex case-folding, locale etc. cases either, it just broke in different ways.

There was a related bug in PCREv2 that was fixed in 10.36.
It can be worked around by setting the PCRE2_NO_START_OPTIMIZE flag.
Let's do that in those cases, and add tests for the bug.


With Git 2.32 (Q2 2021), there are now updates to memory allocation code around the use of pcre2 library (so V2).

See commit c176035, commit cbe81e6, commit 8d12851, commit b76bf27, commit 797c359, commit a39b400, commit 588e4fb, commit 47eebd2, commit 1cfc5a8, commit 0ddf8ce (18 Feb 2021) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 24119d9, 22 Mar 2021)

grep/pcre2: move back to thread-only PCREv2 structures

Signed-off-by: Ævar Arnfjörð Bjarmason

Change the setup of the "pcre2_general_context" to happen per-thread in compile_pcre2_pattern() instead of in grep_init().

This change brings it in line with how the rest of the pcre2_* members in the grep_pat structure are set up.

As noted in the preceding commit the approach 513f2b0 ("grep: make PCRE2 aware of custom allocator", 2019-10-16, Git v2.24.0-rc1 -- merge listed in batch #11) took to allocate the pcre2_general_context seems to have been initially based on a misunderstanding of how PCREv2 memory allocation works.

The approach of creating a global context in grep_init() is just added complexity for almost zero gain.
On my system it's 24 bytes saved per-thread.
For comparison PCREv2 will then go on to allocate at least a kilobyte for its own thread-local state.

As noted in 6d423dd ("grep: don't redundantly compile throwaway patterns under threading", 2017-05-25, Git v2.14.0-rc0 -- merge listed in batch #9) the grep code is intentionally not trying to micro-optimize allocations by e.g. sharing some PCREv2 structures globally, while making others thread-local.

So let's remove this special case and make all of them thread-local again for simplicity.
With this change we could move the pcre2_{malloc,free} functions around to live closer to their current use.

See also the discussion in 94da919 ("grep: add support for PCRE v2", 2017-06-01, Git v2.14.0-rc0 -- merge listed in batch #9) about thread safety, and Johannes's comments to the effect that we should be doing what this patch is doing.


With Git 2.40 (Q1 2023), in an environment where dynamically generated code is prohibited to run (e.g. SELinux), failure to JIT pcre patterns is expected.
Fall back to interpreted execution in such a case.

See commit 50b6ad5 (31 Jan 2023) by Mathias Krause (mathiaskrause).
(Merged by Junio C Hamano -- gitster -- in commit 214242a, 15 Feb 2023)

grep: fall back to interpreter if JIT memory allocation fails

Cc: Carlo Marcelo Arenas Belón
Signed-off-by: Mathias Krause

Under Linux systems with SELinux's 'deny_execmem' or PaX's MPROTECT enabled, the allocation of PCRE2's JIT rwx memory may be prohibited, making pcre2_jit_compile() fail with PCRE2_ERROR_NOMEMORY (-48):

[user@fedora git]$ git grep -c PCRE2_JIT
grep.c:1

[user@fedora git]$ # Enable SELinux's W^X policy
[user@fedora git]$ sudo semanage boolean -m -1 deny_execmem

[user@fedora git]$ # JIT memory allocation fails, breaking 'git grep'
[user@fedora git]$ git grep -c PCRE2_JIT
fatal: Couldn't JIT the PCRE2 pattern 'PCRE2_JIT', got '-48'

Instead of failing hard in this case and making 'git grep' unusable on such systems, simply fall back to interpreter mode, leading to a much better user experience.

As having a functional PCRE2 JIT compiler is a legitimate use case for performance reasons, we'll only do the fallback if the supposedly available JIT is found to be non-functional by attempting to JIT compile a very simple pattern.
If this fails, JIT is deemed to be non-functional and we do the interpreter fallback.
For all other cases, i.e. when the simple pattern can be compiled but the user-provided one cannot, we fail hard as we do now, as the reason for the failure must be the pattern itself.
To aid users in helping themselves, change the error message to include a hint about the '(*NO_JIT)' prefix.
Also clip the pattern at 64 characters to ensure the hint will be seen by the user and not internally truncated by the die() function.
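The '(*NO_JIT)' prefix mentioned in that hint is a PCRE2 start-of-pattern option that you can add to a pattern yourself. As a rough sketch (the helper name `no_jit_pattern` is made up for illustration):

```shell
# Prefix a pattern with PCRE2's (*NO_JIT) start-of-pattern option so that
# it is run by the interpreter instead of the JIT. Hypothetical helper.
no_jit_pattern() {
    printf '(*NO_JIT)%s\n' "$1"
}

# Possible usage, assuming a Git built with PCRE2:
#   git grep -P "$(no_jit_pattern 'PCRE2_JIT')"
no_jit_pattern 'PCRE2_JIT'   # prints: (*NO_JIT)PCRE2_JIT
```

This only affects the pattern it is attached to, so it can be a handy workaround on Git versions older than 2.40 in environments where JIT memory allocation is denied.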

VonC