
I have CI pipelines with a lot of before_script sections, and I would like to match them with a multiline regexp. I export all the before_script sections to my-ci-jobs.txt with a Python script.

pcregrep -M 'before_script.*\n.*' my-ci-jobs.txt 
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"
"before_script": [
    "yarn install"

This works, but sometimes a before_script has more than one line, so I would like a regexp that catches everything between before_script and the first following ],. When I try it, though, it catches the longest possible match. This is my command (I won't paste the result here; it is the whole file up to the last ],):

pcregrep -M 'before_script.*(\n|.)*],' my-ci-jobs.txt
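The same greedy behaviour can be reproduced with Python's `re` module, which, like PCRE, does not let `.` match a newline by default. `SAMPLE` below is a hypothetical stand-in for my-ci-jobs.txt, not its real contents:

```python
import re

# Hypothetical stand-in for my-ci-jobs.txt: two before_script blocks
# with an unrelated "script" block between them.
SAMPLE = '''"before_script": [
    "yarn install"
],
"script": [
    "yarn test"
],
"before_script": [
    "yarn install",
    "yarn build"
],
'''

# The pattern from the question. (\n|.)* is greedy: it first runs to the end
# of the text, then backtracks only as far as the LAST "],".
greedy = re.compile(r'before_script.*(\n|.)*\],')

matches = list(greedy.finditer(SAMPLE))
print(len(matches))                          # 1: a single match for two blocks
print('"yarn test"' in matches[0].group(0))  # True: the script block was swallowed too
```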

How can I make the regexp stop at the first match? Is there a better way to write a multiline regexp?

dorinand
  • Thank you for the reply. This does not work. It returns nothing. – dorinand Jun 09 '20 at 09:09
  • Does [this](https://regex101.com/r/mmNTu7/4) help? I edited the regex. It'll help if you provide the expected result for the above input. Perhaps you can create a small sample of your file and paste it. –  Jun 09 '20 at 09:09
  • That works, thank you. Could you explain the regexp and answer my question? – dorinand Jun 09 '20 at 09:13
  • I think you just need `pcregrep -M 'before_script[^]]*]' file`. If you need the first match only, add `| head -1` – Wiktor Stribiżew Jun 09 '20 at 09:15
  • I needed the first match of `before_script` something `],`, but basically I need all the `before_script` sections so I can check the diffs. Your first regex works exactly as I needed. – dorinand Jun 09 '20 at 09:23

1 Answer


You almost never need (.|\n) in a regular expression; there are better ways to match any character, including line breaks.
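One such way is a lazy quantifier combined with a dot that matches line breaks: `*?` consumes as little as possible, so each match stops at the first `],` instead of the last. A minimal Python sketch, assuming a hypothetical `SAMPLE` in the shape of the exported file; `re.DOTALL` plays the role of PCRE's `(?s)` flag, so the pcregrep counterpart would be something like `pcregrep -M '(?s)before_script.*?],' my-ci-jobs.txt` (untested here):

```python
import re

# Hypothetical stand-in for my-ci-jobs.txt.
SAMPLE = '''"before_script": [
    "yarn install"
],
"script": [
    "yarn test"
],
"before_script": [
    "yarn install",
    "yarn build"
],
'''

# re.DOTALL lets "." match newlines, so (\n|.) is unnecessary, and the lazy
# quantifier .*? stops at the FIRST "]," after each before_script.
lazy = re.compile(r'before_script.*?\],', re.DOTALL)

blocks = lazy.findall(SAMPLE)
for block in blocks:
    print(block)
    print('---')
```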

To match zero or more of any character except ], you can use the [^]]* pattern:

pcregrep -M 'before_script[^]]*]' file
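Since the file is already produced by a Python script, the same negated class can be applied there directly. A sketch on a hypothetical sample; inside the class the ] is written as `\]` purely for readability:

```python
import re

# Hypothetical stand-in for my-ci-jobs.txt.
SAMPLE = '''"before_script": [
    "yarn install"
],
"script": [
    "yarn test"
],
"before_script": [
    "yarn install",
    "yarn build"
],
'''

# [^\]]* matches any run of characters other than "]", newlines included,
# so each match ends at the first "]" after before_script. No DOTALL needed.
pattern = re.compile(r'before_script[^\]]*\]')

blocks = pattern.findall(SAMPLE)
print(len(blocks))  # 2
```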

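If you only need the first result, pipe the output through head. Note that head counts output lines, not matches, so with a match spanning several lines | head -1 prints just that match's first line: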

pcregrep -M 'before_script[^]]*]' file | head -1
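To get the first complete block regardless of how many lines it spans, a Python sketch can use `re.search`, which returns only the leftmost match. The sample text is hypothetical:

```python
import re

# Hypothetical stand-in for my-ci-jobs.txt.
SAMPLE = '''"before_script": [
    "yarn install"
],
"before_script": [
    "yarn install",
    "yarn build"
],
'''

pattern = re.compile(r'before_script[^\]]*\]')

# search() returns only the leftmost match: the complete first block,
# however many lines it spans.
first = pattern.search(SAMPLE)
if first is not None:
    print(first.group(0))
```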

Pattern details

  • before_script - literal text
  • [^]]* - a negated character class that matches any char except ], 0 or more times, as many as possible (* is a greedy quantifier). A negated class matches line break chars, too; the -M option is what lets pcregrep run the pattern over several lines at once, so the class can actually reach them.
  • ] - a literal ] char (no need to escape it because ] outside a character class is not special).
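The asker mentions needing all the before_script blocks in order to check how they differ. One possible sketch, assuming a hypothetical sample and exact-text comparison, counts each distinct block with `collections.Counter` so that the variants stand out:

```python
import re
from collections import Counter

# Hypothetical stand-in for my-ci-jobs.txt: two identical blocks, one variant.
SAMPLE = '''"before_script": [
    "yarn install"
],
"before_script": [
    "yarn install"
],
"before_script": [
    "yarn install",
    "yarn build"
],
'''

pattern = re.compile(r'before_script[^\]]*\]')

# Count how often each distinct block text occurs; rare ones are worth diffing.
variants = Counter(pattern.findall(SAMPLE))
for block, count in variants.most_common():
    print(f'{count}x')
    print(block)
```

Blocks that differ only in whitespace count as distinct here; normalizing them first is outside this sketch.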
Wiktor Stribiżew