Extract content between two multi-line delimiters and check for empty value

Question

Let's say I have an input file like this:

#Backup TOC
boot.tar.gz    /boot/

#Filesystems
/boot               /dev/mapper/VolGroup-lv_root xfs

#Devices
/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0-part1 PHY /dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0

#UnhandledFS
/var/
/var/log
/var/log/audit
/var/tmp

I want to extract the content under every #header (the last one, #UnhandledFS, can be ignored). Once extracted, I have to check whether each section has any entries.

Below is the code I use to extract the content between two #headers. However, it does not repeat for every section:

import re
lines = open("./input").readlines()
re.compile(r'#\w+(.*?)#\w+', re.DOTALL | re.M).findall(''.join(lines))

score 0 · Answer 1 · answered Dec 10 '18 at 00:30

The problem with your regex is that it consumes the closing #header, so the next search starts after it: the #Filesystems header is already used up, and its section is skipped.
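To see the skipping concretely, here is a minimal sketch that runs the question's pattern on the sample input. The `text` and `consuming` names are assumptions for illustration, and the input is inlined as a string instead of being read from ./input.

```python
import re

# The question's sample input, inlined for illustration.
text = (
    "#Backup TOC\n"
    "boot.tar.gz    /boot/\n"
    "\n"
    "#Filesystems\n"
    "/boot               /dev/mapper/VolGroup-lv_root xfs\n"
    "\n"
    "#Devices\n"
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0-part1 PHY "
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\n"
    "\n"
    "#UnhandledFS\n"
    "/var/\n"
    "/var/log\n"
    "/var/log/audit\n"
    "/var/tmp\n"
)

# Original pattern: the trailing #\w+ consumes the next header,
# so that header cannot start the following match.
consuming = re.compile(r'#\w+(.*?)#\w+', re.DOTALL)
matches = consuming.findall(text)

for m in matches:
    print(repr(m))
```

Only two sections come back, and the #Filesystems content is not among them.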

What you need is a "lookahead", written (?=...): it checks that a pattern follows without consuming it.

Here is a regex that will work for you:

re.compile(r'#[^\n]*\n([^#]*)(?=#)', re.DOTALL | re.M).findall(''.join(lines))

It also fixes a second problem: when a header contains a space, as in the first header of your example (#Backup TOC), your pattern stops matching the header at the space, so the word TOC ends up in the captured content.

If you want the minimal change to your own regex, this also works (except that TOC is still captured):
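For the "check for empty value" part of the question, one possible sketch is below. It is a variant of the regex above, not the exact pattern: the extra capture group around the header text is an addition, and the `has_entries` name is an assumption. The sample is also modified so that #Filesystems is empty (and the device line is shortened to the original's elided-free first path), purely to show the empty case.

```python
import re

# Modified sample: #Filesystems is deliberately left empty here
# to demonstrate the empty-section check (not the original input).
text = (
    "#Backup TOC\n"
    "boot.tar.gz    /boot/\n"
    "\n"
    "#Filesystems\n"
    "\n"
    "#Devices\n"
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0-part1 PHY "
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\n"
    "\n"
    "#UnhandledFS\n"
    "/var/\n"
)

# Group 1: header text, group 2: section body.
# The lookahead (?=#) requires a following header without consuming it,
# so the last section (#UnhandledFS) never matches and is ignored.
pattern = re.compile(r'#([^\n]*)\n([^#]*)(?=#)')

# A section has entries if anything other than whitespace remains.
has_entries = {
    header: bool(body.strip())
    for header, body in pattern.findall(text)
}

for header, ok in has_entries.items():
    print(header, "->", "has entries" if ok else "EMPTY")
```

Note that this approach assumes the section bodies never contain a # character, since [^#]* stops at the first one.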

re.compile(r'#\w+(.*?)(?=#\w+)', re.DOTALL | re.M).findall(''.join(lines))
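As a rough check of the minimal fix, the sketch below runs it on the sample input and strips the surrounding whitespace from each section. The `text`, `minimal`, and `sections` names are assumptions for illustration.

```python
import re

# The question's sample input, inlined for illustration.
text = (
    "#Backup TOC\n"
    "boot.tar.gz    /boot/\n"
    "\n"
    "#Filesystems\n"
    "/boot               /dev/mapper/VolGroup-lv_root xfs\n"
    "\n"
    "#Devices\n"
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0-part1 PHY "
    "/dev/disk/by-path/pci-0000:03:00.0-scsi-0:0:0:0\n"
    "\n"
    "#UnhandledFS\n"
    "/var/\n"
    "/var/log\n"
    "/var/log/audit\n"
    "/var/tmp\n"
)

# Minimal fix: the closing header sits inside a lookahead,
# so it stays available as the start of the next match.
minimal = re.compile(r'#\w+(.*?)(?=#\w+)', re.DOTALL)
sections = [s.strip() for s in minimal.findall(text)]

for s in sections:
    print(repr(s))
```

All three sections are found this way, but the first one still begins with TOC, because \w+ stops at the space in "#Backup TOC".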
