I'm using pcregrep to search for a multiline pattern, and I only want it to search files with a specific extension, for example:

pcregrep -Mrl --include=*.sv -e '<my_multi-line_pattern>' /path/to/search

However, this throws an error: pcregrep: Error in 'include' regex at offset 0: nothing to repeat.

I've tried escaping and double-escaping the * to no avail. The same syntax works just fine for grep:

grep -rl --include=*.sv '<my_single-line_pattern>' /path/to/search

Any help or hints greatly appreciated.

EDIT: Example multi-line pattern: '(?s)^\salu.*\.opa_i\(' should match

alu u_alu(
     ...
   .opa_i(opa),
   .opb_i(opb),
     ...
)
mgoblue92

1 Answer

Here's what pcregrep's docs say about --include (emphasis mine):

--include=pattern If any --include patterns are specified, the only files that are processed are those that match one of the patterns (and do not match an --exclude pattern). This option does not affect directories, but it applies to all files, whether listed on the command line, obtained from --file-list, or by scanning a directory. The pattern is a PCRE regular expression, and is matched against the final component of the file name, not the entire path. The -F, -w, and -x options do not apply to this pattern. The option may be given any number of times. If a file name matches both an --include and an --exclude pattern, it is excluded. There is no short form for this option.

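So unlike grep, whose --include takes a shell glob, pcregrep treats the argument as a regex. A * at the start of a regex is a quantifier with nothing before it to repeat, which is of course not valid.
Now you can make sense of the error message :)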

Solution: --include='.*\.sv$' or just --include='\.sv$'
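
If you want to sanity-check the include regex before handing it to pcregrep, here is a minimal sketch that runs it against a few bare file names with grep -E. The file names are made up for illustration, and this only works as a stand-in because a pattern as simple as \.sv$ is written the same way in ERE and PCRE; it is not how pcregrep itself does the filtering.

```shell
# Hypothetical file names, as --include would see them
# (the final path component only, never the directory part).
names='alu.sv
top.sv
alu.svh
sv_notes.txt
README'

# \.sv$ means: a literal dot, then "sv", then end of the name.
# Single quotes keep the shell from touching the backslash or the $.
included=$(printf '%s\n' "$names" | grep -E '\.sv$')

printf '%s\n' "$included"
```

Only alu.sv and top.sv get through; alu.svh is rejected by the $ anchor. Applied to the command from the question, that gives: pcregrep -Mrl --include='\.sv$' -e '<my_multi-line_pattern>' /path/to/search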

Lucas Trzesniewski
  • You're correct, but in practice it doesn't work? I tried using regexp as you explained and I got nothing. I'm not sure if you can see anything wrong with my regexp `pcregrep --include="\.js$" -Mr "import(\n|.)*from(\n|.)*('|\").*('|\")" . ` – J. Shaker Aug 30 '18 at 20:26
  • @J.Shaker your regex is certainly suboptimal because of excessive backtracking, but I can't tell what's wrong without context and without knowing the data. Try `import\s+from\s*['"].*?['"]` against a data sample on regex101.com, and then try it with pcregrep if you got the sample to work. – Lucas Trzesniewski Aug 30 '18 at 21:47
  • I ended up getting it to work. The issue was that the grep apparently only ran in the local folder and not in the subfolders beneath it. I don't fully understand what's going on, because depending on the folder I specify, it does seem to recurse into the subfolders in some scenarios – J. Shaker Aug 31 '18 at 00:06
  • @Lucas.Trzesniewski Well, you didn't really answer anything. The question is about the --include option, which you don't address; instead you took the time to mention that my RE is suboptimal, which doesn't matter because it's a POC. – J. Shaker Aug 31 '18 at 15:07
  • @J.Shaker well, I didn't mention `--include` because it looked fine the way you wrote it. As for your regex, you asked me "if you can see anything wrong with my regexp", so yeah, sure enough I told you what was wrong with your pattern, and my guess was that the pattern was causing your issue. I even took the time to rewrite your pattern into a better one. Next time if you're not ready to accept help, don't ask for it in the first place. – Lucas Trzesniewski Aug 31 '18 at 16:07
  • @Lucas.Trzesniewski we both know this question is about `--include` and not regexp so no need to play dumb. I'll gladly accept any help that's actually trying to solve the issue. Listen, I appreciate you taking the time but you see what I'm getting at right? – J. Shaker Aug 31 '18 at 17:48
  • @J.Shaker good luck trying to get help with that attitude. Have fun. – Lucas Trzesniewski Aug 31 '18 at 20:32