I want to use a regex to pull a specific file (e.g. package-lock.json) out of a git diff. The reason for this approach is that I'm getting the whole git diff via the GitHub API (using Octokit.js), so I can't just run git diff on that specific file (as far as I'm aware). The diff for a file like package-lock.json is very large, so there's a lot of content. What I've noticed is that when I try to use a regular expression to get this content out, it fails due to catastrophic backtracking.

Essentially the file structure looks like this:

diff --git a/package-lock.json b/package-lock.json
lots of content

diff --git a/next-file b/next-file

Therefore my idea was to get everything between the two diff --git strings.

I figured I could just use /(?<=diff --git )(.+?)(?=diff)/gs. This works fine when the text matched by the lookahead is not too far ahead, but further into the file it stops working due to catastrophic backtracking.

I get why this is happening, but I don't see how to get around it. Perhaps I should split the diff some other way and use regex only for the more specific details?

Any help would be appreciated.

Wiktor Stribiżew
Ben Rauzi
    Taking @Andy's lead, this is a good application of Perl's and Ruby's rather obscure [flip-flop operator](https://en.wikipedia.org/wiki/Flip-flop_(programming)). In Ruby this would be written `s1 = "diff --git a/package-lock.json b/package-lock.json"; s2 = "diff --git a/next-file b/next-file"; arr.select { |line| true if line == s1 .. line == s2 }[1..-2]`, where `arr` is an array of strings. `[1..-2]` removes the first and last lines of the array that is returned, which are `s1` and `s2` respectively. – Cary Swoveland Apr 27 '22 at 02:30

1 Answer

You're working with lines of data, and regexes don't work well like that, as you've found out. Use a tool like awk that can find ranges of lines.

Given this file foo.txt:

Here is stuff I don't care about
diff --git a/package-lock.json b/package-lock.json
lots of content

diff --git a/next-file b/next-file
Don't care about this either.

Use awk to specify a range of lines you want to print:

$ awk '/^diff --git a\/package-lock/,/^diff --git a\/next-file/' foo.txt
diff --git a/package-lock.json b/package-lock.json
lots of content

diff --git a/next-file b/next-file
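
Since the diff in the question arrives as a string in JavaScript rather than as a file on disk, the same line-range idea can be sketched there without any regex. This is only a minimal sketch: `extractFileDiff` is a hypothetical helper name, it assumes the diff uses `\n` line endings, and it assumes the file was not renamed (so the `a/` and `b/` paths in the header are identical). Unlike the awk range above, it stops before the next `diff --git` header instead of including it.

```javascript
// Hypothetical helper: return the section of a full diff that belongs to
// one file, or null if the file does not appear in the diff.
function extractFileDiff(diffText, filePath) {
  // Assumption: the file was not renamed, so both paths in the header match.
  const header = `diff --git a/${filePath} b/${filePath}`;
  const lines = diffText.split("\n");

  // Find the header line for the requested file.
  const start = lines.indexOf(header);
  if (start === -1) {
    return null;
  }

  // The section ends just before the next file header, or at end of input.
  let end = lines.length;
  for (let i = start + 1; i < lines.length; i++) {
    if (lines[i].startsWith("diff --git ")) {
      end = i;
      break;
    }
  }

  return lines.slice(start, end).join("\n");
}

// Usage with the contents of foo.txt from above.
const sample = [
  "Here is stuff I don't care about",
  "diff --git a/package-lock.json b/package-lock.json",
  "lots of content",
  "",
  "diff --git a/next-file b/next-file",
  "Don't care about this either.",
].join("\n");

console.log(extractFileDiff(sample, "package-lock.json"));
```

Each line is inspected once, so the work grows linearly with the size of the diff, and there is nothing for a regex engine to backtrack over.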
Andy Lester