How to capture repeated blocks of multiline text?

Question

I need help pulling out a repeated block of config from a FortiGate firewall config file. It contains various sections in the format below.

Each vdom config section (a 'config vdom' block) ends with two 'end's. I need to pull these blocks out as a first step before further processing.

#header info

config vdom
edit root
next
edit test
next
edit test2
next
end

config global
...
...
...
end
end

config vdom
edit root
config system
...
end
config ...
...
...
......
...
end
end

config vdom
edit test
config system
...
end
config ...
...
...
......
...
end
end

config vdom
edit test2
config system
...
end
config ...
...
...
...
end
end

I'm using regex101.com to build the regex for use in a Python script. Here's what I have so far.

(config vdom\nedit.+\nconfig[\s\S\r]*) - matches all text starting with the first vdom config, until the end of the file, includes the other vdom config too

(config vdom\nedit.+\nconfig[\s\S\r]*?) - matches only the first 3 lines until the first 'config'

(config vdom\nedit.+\nconfig[\s\S\r]*?end\n) - matches text until the first occurrence of 'end' - there are multiple 'end's throughout the config, but there are 2 of them at the end of each vdom config

(config vdom\nedit.+\nconfig[\s\S\r]*?end\nconfig) - matches text until the first occurrence of 'config', but instead if I use 'end' like below to match two of them, it fails

(config vdom\nedit.+\nconfig[\s\S\r]*?end\nend\n\n) - when trying to look for the occurrence of 2 ends followed by an empty line, it fails with 'catastrophic backtracking'

I don't know why it works when I use one end\n after the *? but fails as soon as I try adding the second one.

Any help will be greatly appreciated!

Can't reproduce - `(config vdom\nedit.+\nconfig[\s\S\r]*?end\nend\n\n)` works as expected. — Reto, Apr 05 '23 at 19:48
You're right! I tried it with just the sample config and it worked. I guess there's something in the full config that's breaking it. But even on the sample config, it doesn't match the third section because there's no `config vdom` after the last one ends. — Prithvi, Apr 05 '23 at 20:50

Cary Swoveland · Answer 1 · 2023-04-06T16:08:52.743

You can match

^config\s.*?(?=(?:\r?\n)+(?:config\s|\Z))

with multiline and single-line modes set.

Multiline mode causes ^ and $ to match the beginning and end of a line, respectively, rather than the beginning and end of the string.

Single-line (aka DOTALL) mode causes . to match all characters, not just all characters other than line terminators.
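
In Python, which the question targets, these two modes correspond to the re.MULTILINE (re.M) and re.DOTALL (re.S) flags. A small sketch of their effect, using a made-up two-line string rather than the real config:

```python
import re

text = "first\nsecond"  # made-up sample, not taken from the config

# Without re.M, ^ anchors only at the start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.M, ^ also anchors right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.M)

# Without re.S, . does not match the newline between the two words.
dot_default = re.search(r"first.second", text)
# With re.S, . matches the newline as well.
dot_dotall = re.search(r"first.second", text, re.S)

print(starts_default)          # ['first']
print(starts_multiline)        # ['first', 'second']
print(dot_default)             # None
print(dot_dotall is not None)  # True
```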

The regular expression has the following elements.

^              match the beginning of a line 
config\s       match 'config' followed by a whitespace character
.*?            match zero or more characters, reluctantly
(?=            begin a positive lookahead
  (?:          begin a non-capture group
    \r?\n      match a newline, optionally preceded by a carriage
               return (to support Windows)  
  )+           end non-capture group, execute it one or more times  
  (?:          begin a non-capture group
    config\s   match 'config' followed by a whitespace character
    |          or
    \Z         match the end of the string
  )            end non-capture group
)              end positive lookahead

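.*?, being reluctant, matches as few characters as possible, stopping before the next line that starts with 'config '. By contrast, .* is greedy and matches as many characters as possible, so a single match would run past every intermediate 'config ' to the last position where the lookahead still succeeds (before the final 'config ' line or, if the text ends with a newline, at the end of the string).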
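
A minimal Python sketch of this pattern, assuming the text is already in a string. The sample below is hypothetical and trimmed down, and it indents the nested config sections (an assumption about the real file; the sample in the question shows them flush left).

```python
import re

# Pattern from this answer. re.M lets ^ match at each line start,
# re.S lets . match newlines so one match can span several lines.
BLOCK_RE = re.compile(r"^config\s.*?(?=(?:\r?\n)+(?:config\s|\Z))", re.M | re.S)

# Hypothetical sample; nested sections are indented here (an assumption).
sample = (
    "#header info\n"
    "\n"
    "config vdom\n"
    "edit root\n"
    "next\n"
    "end\n"
    "\n"
    "config global\n"
    "    config system\n"
    "    ...\n"
    "    end\n"
    "end\n"
    "\n"
    "config vdom\n"
    "edit root\n"
    "    config system\n"
    "    ...\n"
    "    end\n"
    "end\n"
)

# Each match is one top-level block, without the blank line that follows it.
blocks = [m.group(0) for m in BLOCK_RE.finditer(sample)]

for block in blocks:
    print(block.splitlines()[0], "->", len(block.splitlines()), "lines")
```

The lookahead stops at any line that begins with 'config ' and has no leading whitespace, so if nested sections are not indented in the actual file, each nested 'config' line would start a new match and the blocks would come out split.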

score 0 · Answer 2 · answered Apr 06 '23 at 13:48

Another option, without dotall mode, uses a negative lookahead: match the start of the block, then every following line that is not the first of two consecutive end lines, then the two end lines themselves (multiline mode set):

^config vdom\r?\nedit\b.*(?:\r?\n(?!end\r?\nend$).*)*\r?\nend\r?\nend$

Explanation

^ Start of a line (multiline mode must be set, otherwise only a block at the very start of the string can match)
config vdom\r?\nedit\b.* Match config vdom, a newline, edit and the rest of the line
(?: Non-capture group to repeat as a whole
- \r?\n Match a newline
- (?!end\r?\nend$).* Negative lookahead: assert that this line is not the first of two consecutive end lines, and if so match the whole line
)* Close the non-capture group and repeat it zero or more times
\r?\nend\r?\nend Match a newline and end, twice
$ End of a line
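
A short Python sketch of this pattern, assuming re.MULTILINE and LF line endings. The sample is a shortened version of the one in the question, and the name VDOM_RE is only illustrative.

```python
import re

# Pattern from this answer; re.M makes ^ and $ anchor at line boundaries.
VDOM_RE = re.compile(
    r"^config vdom\r?\nedit\b.*(?:\r?\n(?!end\r?\nend$).*)*\r?\nend\r?\nend$",
    re.M,
)

# Shortened sample in the shape shown in the question.
sample = """\
config global
...
end
end

config vdom
edit root
config system
...
end
config ...
...
end
end

config vdom
edit test
config system
...
end
end
"""

# The pattern has no capture groups, so findall returns the full matches.
vdom_blocks = VDOM_RE.findall(sample)

for block in vdom_blocks:
    print(block.splitlines()[1])  # the 'edit <name>' line of each block
```

This prints `edit root` and `edit test`. One thing to watch for: a block only stops at two consecutive end lines, so the leading 'config vdom' list in the question's sample (which closes with a single end) would be matched together with the 'config global' block that follows it. If that is unwanted, one possible tweak is to require a config line right after the edit line, as the question's own attempts do, by adding `\r?\nconfig\b.*` after `edit\b.*`.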
