I am trying to use a backreference to match every place where an imported class is instantiated, using ripgrep with the --pcre2 option enabled.

First I check whether the class is imported, and then I backreference that match to find where it is instantiated.

  • First attempt: matches only the first occurrence of new ExifInterface(str). My regex is: (import.+(ExifInterface)).+(new\s\2\(.+\))

  • Second attempt: matches only the last occurrence of new ExifInterface(str). My regex is: (import.+(ExifInterface)).+(?:.+?(new\s\2\(.+\)))

My ripgrep command is rg --pcre2 --multiline-dotall -U "(import.+(ExifInterface)).+(new\s\2\(.+?\))" -r '$3' -o

Question: How can I match all the occurrences of new ExifInterface(str)?

Bonus question: In some cases, I am getting a "PCRE2: error matching: match limit exceeded" message on stderr from rg, but I can't figure out why. The document is only 161 lines long.

Link to regex101

Consider the following data sample:

import android.graphics.Point;
import android.media.ExifInterface;
import android.view.WindowManager;
import java.io.IOException;

public class MediaUtils {
    /* renamed from: a */
    public static float m13571a(String str) {
        if (str == null || str.isEmpty()) {
            throw new IllegalArgumentException("getRotationDegreeForImage requires a valid source uri!");
        }
        try {
            int attributeInt = new ExifInterface(str).getAttributeInt("Orientation", 1);
            if (attributeInt == 3) {
                return 180.0f;
new ExifInterface(str).getAttributeInt("Orientation", 1);
            }
            if (attributeInt == 6) {
                return 90.0f;
            }
securisec
  • What language are you using ? Are you using a utility like grep or something ? –  Jul 08 '19 at 21:23
  • OP mentions `ripgrep` specifically in the first paragraph. – Sean Bright Jul 08 '19 at 21:23
  • Oh, I thought he was using a language. Was going to tell you how to do it using the `\G` construct, but guess not now. –  Jul 08 '19 at 21:24
  • If that is something supported by PCRE, then it would apply here as well. – Sean Bright Jul 08 '19 at 21:25
  • It is supported, but the usage is in a repetitive match, not grep. Grep starts over each time. –  Jul 08 '19 at 21:26
  • Btw, you shouldn't use `.+`; change it to `.+?` and add some word boundaries. You also don't need the backreference `\2` if you already know it to be `ExifInterface`. I mean, why use it? –  Jul 08 '19 at 21:30
  • Good point, but that is not the use case here. I am trying to search through code, so it is possible that there is a variable named `ExifInterface` that might be matched. Hence the backreference to the import statement, because it gives me more assurance that it is a positive match. That should hopefully explain the `.+`, because of the line breaks between code blocks. – securisec Jul 08 '19 at 21:41

2 Answers

A strictly PCRE regex that finds successive matches after an initial
specific match is shown below. It uses the \G construct, which starts
the next search where the previous match left off.

(?:import.+\bExifInterface\b|(?!^)\G)[\S\s]+?\K\bnew\s+ExifInterface\s*\([\S\s]+?\)

https://regex101.com/r/e6L5rV/1

Don't use any flags other than the global flag (//g).

Expanded:

 (?:
      import .+ \b ExifInterface \b 
   |  
      (?! ^ )
      \G 
 )
 [\S\s]+? 
 \K 
 \b new \s+ ExifInterface \s* \( [\S\s]+? \)
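
If the tool at hand does not support \G and \K, the same idea can be approximated in two steps. The sketch below uses Python's standard `re` module, which has neither construct, so it is a swapped-in technique and not the regex above: it locates the import first and then scans for instantiations starting from the end of that import. The helper name `find_instantiations` and the sample text are assumptions made up for illustration.

```python
import re
from typing import List


def find_instantiations(source: str, class_name: str = "ExifInterface") -> List[str]:
    """Return every `new ClassName(...)` that appears after an import of ClassName."""
    # Hypothetical helper: it mimics the "continue after the previous match"
    # idea by scanning from the end of the import statement.
    name = re.escape(class_name)

    # Step 1: the class must be imported; otherwise report nothing.
    imported = re.search(rf"^import\s+[\w.]*\b{name}\s*;", source, re.MULTILINE)
    if imported is None:
        return []

    # Step 2: collect all instantiations located after the import.
    # [^()]* stops at the first closing parenthesis, so nested calls
    # inside the argument list are not handled by this sketch.
    instantiation = re.compile(rf"\bnew\s+{name}\s*\([^()]*\)")
    return [m.group(0) for m in instantiation.finditer(source, imported.end())]


sample = (
    "import android.media.ExifInterface;\n"
    'int a = new ExifInterface(str).getAttributeInt("Orientation", 1);\n'
    'new ExifInterface(str).getAttributeInt("Orientation", 1);\n'
)
print(find_instantiations(sample))
# ['new ExifInterface(str)', 'new ExifInterface(str)']
```

Splitting the work this way keeps each pattern short and avoids a single regex that has to span the whole file.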
  • This is a beautiful regex, but the one small issue is that with `ripgrep` it only shows the first match. – securisec Jul 08 '19 at 21:54
  • Well, that's what I thought. But maybe the docs have an option to do _global_ matches, which is what you need. It is PCRE and is a fully functional regex. Good luck! –  Jul 08 '19 at 21:56

An alternative: you can get what you want with two grep commands. The first returns the name of each file that contains import.*ExifInterface, and the second finds the instantiations in those files.

grep -no 'new ExifInterface(' $(grep -lr 'import.*ExifInterface' *) 

It's possible to do the same with ripgrep:

rg -noF 'new ExifInterface(' $(rg -l 'import.*ExifInterface')
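
For callers that drive the search from a script, the same two-pass filter can be sketched in Python. This is only an illustration of the approach, not a replacement for the commands above: the function name `instantiations_in_tree` and the `*.java` glob are assumptions, and the patterns are simplified.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple

# Simplified patterns, assumed for this sketch.
IMPORT_RE = re.compile(r"^import\s+[\w.]*\bExifInterface\s*;", re.MULTILINE)
NEW_RE = re.compile(r"\bnew\s+ExifInterface\s*\(")


def instantiations_in_tree(root: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file, line number, matched text) for files that import the class."""
    for path in sorted(Path(root).rglob("*.java")):
        text = path.read_text(encoding="utf-8", errors="replace")
        # Pass 1: skip files that never import ExifInterface.
        if not IMPORT_RE.search(text):
            continue
        # Pass 2: report each instantiation with its 1-based line number.
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in NEW_RE.finditer(line):
                yield str(path), lineno, match.group(0)
```

Calling `list(instantiations_in_tree("src"))`, where `src` is a hypothetical source directory, gives roughly the same information as the `-n -o` output of the commands above.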
Casimir et Hippolyte
  • My initial thought was to use grep, but when going through 100+ files, ripgrep performance is unbeatable – securisec Jul 08 '19 at 23:22
  • @securisec: for 100+ files, you don't have to worry about performance. Also, don't be too naive with benchmarks. – Casimir et Hippolyte Jul 08 '19 at 23:25
  • You are right, but the code that relies on this is already built around ripgrep, and switching from ripgrep to grep is not really feasible. Your answer will be very helpful to the next person looking for this type of solution, but in my case I am a little stuck with my requirements. – securisec Jul 08 '19 at 23:31
  • This will work for ripgrep in exactly the same way. Just replace `grep` with `rg`, and drop the `r` flag in the second invocation. – BurntSushi5 Jul 09 '19 at 11:09
  • @BurntSushi5: indeed; also, the opening parenthesis has to be escaped. – Casimir et Hippolyte Jul 09 '19 at 12:23