Regex to non-greedily match across multiple lines up to a line that starts with a specific string

Question

I am going to answer this myself, but this was giving me fits all day and although it is explained elsewhere, I thought I'd post it with my solution.

I came across a situation where I needed to replace some text spanning multiple lines. It wasn't hard to find threads about how to match across multiple lines. My case was a bit more difficult in that I needed to wildcard match any character across multiple lines, until stopping at the first non-indented closing bracket.

For demonstration purposes, I made a sample file that has the features that made this hard for me:

starting file:

cat << EOF > test.txt
server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab

EOF

This was my desired output. Note that the bracket I am matching up to is neither the first nor the last closing bracket. Its only distinguishing feature is that it is the first } at the beginning of a line after the start of my match:

server {
    wxyz

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
} 

zabzazab

This is what I hoped would work. But when slurping with -0777, ^ and $ no longer seemed to match at the beginning and end of each line, so nothing was replaced:

~#  perl -0777 -pe 's/(abcdefg(.*?)(^}.*$))/wxyz/gs' test.txt
server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab

Matching the line start/end while also slurping was the sticking point. Without the anchors, the match stops at the first }, indented or not:

~# perl -0777 -pe 's/(abcdefg(.*?)(}))/wxyz/gs' test.txt
server {
    wxyz
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab

So is there a way to get a regex to match from a given string up to the first } that appears at the beginning of a line? I'm open to using sed too, but I figured the non-greedy nature of my search would make perl a better choice.

You need to focus more on explaining what you are trying to do (mainly in words). I don't know and I expect I am not alone. It would be best to add a precise, unambiguous verbal statement of the problem at the beginning, but failing that, immediately before the line "desired output:" please explain how you wish to manipulate the input data that is immediately above. – Cary Swoveland Mar 02 '20 at 03:51
You're right, I was probably still too "in" the problem to explain it well. I have now edited it, hopefully it is clearer now. – Stonecraft Mar 02 '20 at 04:09
Better. I still don't understand the rule for matching `"zabzazab"`. Also, is the literal "server {" to be matched or could it be, say "don {", as long as the "d" is in the first column? – Cary Swoveland Mar 02 '20 at 04:18
That's true, my example didn't capture all the things at play in my real situation, where there were multiple occurrences of `server {` with variable content. The `zabzazab` was there for testing content after the last `}`. – Stonecraft Mar 02 '20 at 04:26

Polar Bear · Accepted Answer · 2020-03-02T05:26:28.463

4

Perhaps any of the following commands will do it

perl -0777 -pe 's/abcdefg.*?(\nserver.*?)/wxyz\n$1/s' test.txt
perl -0777 -pe 's/abcdefg.*?server/wxyz\n\nserver/s' test.txt
perl -0777 -pe 's/abcdefg.*?}.*?}.*?}.*?\n/wxyz\n/s' test.txt
perl -0777 -pe 's/abcdefg(.*?}){3}.*?\n/wxyz\n/s' test.txt
perl -0777 -pe 's/abcdefg.*?\n}.*?\n/wxyz\n/s' test.txt

Output

server {
    wxyz

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

zabzazab

edited Mar 02 '20 at 05:26

answered Mar 02 '20 at 04:54

Polar Bear

Awesome, thanks. A set of examples like this is exactly what I was hoping to get. I wouldn't be surprised if future me's search finds this exact thread after I've forgotten about it. – Stonecraft Mar 02 '20 at 08:50

Cary Swoveland · Answer 2 · 2020-03-02T05:12:47.490

As I understand the question, you wish to match the portion of the string

server {
    abcdefg blablablabla
    pizza
    #blablablabla
    blablablabla {
    zazazazazaza
    }
    turtles
    #}
    ninjas
    blablablabla

} #comments that might or might not be here

server {
... blablablabla
}
...

that begins "abcdefg" and ends at the end of the line, "} #comments that might or might not be here", provided "abcdefg" begins a line after indentation and that line is preceded by the line, "server {". You will then substitute another string for the matched text.

You can match the text to be replaced with the following regular expression:

/^server +\{\s+(abcdefg.+?\n\}.*?$)/sm

The flag s allows . to match newlines. The flag m makes the anchors ^ and $ match at the beginning and end of each line, rather than only at the beginning and end of the whole string.

We can write the regex in free-spacing mode to make it self-documenting.

/
^server[ ]+\{\s+  # match 'server', 1+ spaces, '{' and 1+
                  # whitespace chars; the space sits in a
                  # character class because free-spacing
                  # mode ignores bare spaces in the pattern
(                 # begin capture group 1
  abcdefg         # match literal
  .+?             # match 1+ chars, lazily
  \n              # match a newline
  \}              # match '}'
  .*?             # match 0+ chars, lazily
  $               # match end of line
)                 # end capture group 1
/smx              # single-line, multiline and free-
                  # spacing regex definition modes

score 0 · Answer 3 · answered Mar 02 '20 at 03:03

It seems I need both the s and the m flags and the slurping:

~# perl -0777 -pe 's/(abcdefg(.+?)(\n}))/wxyz/sm' test.txt
server {
    wxyz #comments that might or might not be here

server {
    blablablabla
    blablablabla
    blablablabla
    blablablabla
}

I still don't quite get why I needed both the m modifier AND slurping though. So if someone has a better answer, I'll mark that one instead of my own.

Stonecraft

Why do you use **()** in your regular expression? You do not use the captured **data**. – Polar Bear Mar 02 '20 at 04:58
It works without **m** as well `perl -0777 -pe 's/abcdefg.*?\n}/wxyz/s' test.txt`. – Polar Bear Mar 02 '20 at 05:39
I added the `()` when I was trying to use capture groups to get what I wanted, but I just kept them as they make regex easier to read and they didn't seem to change the results. – Stonecraft Mar 02 '20 at 05:49
Yeah your way is simpler and better. – Stonecraft Mar 02 '20 at 05:50
I would not say that adding _()_ improves readability, and if you understood what happens at the program level, you would see that _()_ is computationally quite _expensive_. – Polar Bear Mar 02 '20 at 05:51
That is good to know, thanks. I do wish I understood the deeper level stuff affecting performance, but fortunately for me, we live in a time when I can flagrantly waste CPU cycles for personal comfort. – Stonecraft Mar 02 '20 at 08:46
