Multiline python regex

Question

I have a file structured like this:

A: some text
B: more text
even more text
on several lines
A: and we start again
B: more text
more
multiline text

I'm trying to find the regex that will split my file like this:

>>>re.findall(regex,f.read())
[('some text','more text','even more text\non several lines'),
 ('and we start again','more text', 'more\nmultiline text')]

So far, I've ended up with the following:

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*?)',f.read(),re.DOTALL)
[(' some text', ' more text', ''), (' and we start again', ' more text', '')]

The multiline text is not captured. I guess this is because the lazy quantifier is really lazy and captures nothing, but if I take it out, the regex gets really greedy:

>>>re.findall('A:(.*?)\nB:(.*?)\n(.*)',f.read(),re.DOTALL)
[(' some text',
' more text',
'even more text\non several lines\nA: and we start again\nB: more text\nmore\nmultiline text')]
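Both behaviours described above can be reproduced on a smaller, made-up string. The string s and its contents are illustrative only. A lazy group at the very end of a pattern has nothing forcing it to grow, so it settles for the empty string. A greedy one runs to the end of the input. A stopping condition fixes both. The third pattern below uses a lookahead for \nA: or the end of the string, a slight variation on the accepted answer's ^A: with re.MULTILINE:

```python
import re

# Hypothetical toy input: two records, each an "A:" line plus one body line.
s = "A: one\nbody1\nA: two\nbody2"

# Lazy group with nothing after it: the engine is satisfied as soon as the
# group matches the empty string, so the body is never captured.
lazy = re.findall(r'A: (.*?)\n(.*?)', s, re.DOTALL)

# Greedy group: with DOTALL, .* runs to the end of the string and swallows
# every later record into the first match.
greedy = re.findall(r'A: (.*?)\n(.*)', s, re.DOTALL)

# Lazy group followed by a lookahead: the group must keep growing until the
# next "\nA:" (or the end of the string, \Z) is directly in front of it.
# The lookahead consumes nothing, so the next match can start at that "A:".
bounded = re.findall(r'A: (.*?)\n(.*?)(?=\nA:|\Z)', s, re.DOTALL)

print(lazy)     # [('one', ''), ('two', '')]
print(greedy)   # [('one', 'body1\nA: two\nbody2')]
print(bounded)  # [('one', 'body1'), ('two', 'body2')]
```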

Does anyone have an idea? Thanks!

Welcome to StackOverflow! This is an example of a really good question - complete specs, reproducible code, an accurate analysis of the problem - great! — Tim Pietzcker, Oct 09 '12 at 13:08

score 13 · Accepted Answer · answered Oct 09 '12 at 12:31


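You could use a lookahead to tell the regex to stop matching at the next line that starts with A: (or at the end of the string). With re.MULTILINE, ^ matches at the start of every line, and re.DOTALL lets . match newlines: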

re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', f.read(), re.DOTALL|re.MULTILINE)
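The following minimal sketch runs the pattern above on the question's sample data. The data is inlined as a string instead of being read with f.read(), and the sketch assumes the file has no trailing newline. The sketch also shows a tweaked variant, which is not part of the original answer. It aims to return exactly the tuples the question asked for, without the leading spaces and without the newline that the original pattern leaves at the end of the first record's third group:

```python
import re

# Sample input from the question, inlined instead of f.read()
# (assumption: no trailing newline at the end of the file).
TEXT = (
    "A: some text\n"
    "B: more text\n"
    "even more text\n"
    "on several lines\n"
    "A: and we start again\n"
    "B: more text\n"
    "more\n"
    "multiline text"
)

FLAGS = re.DOTALL | re.MULTILINE

# The answer's pattern: the lazy third group grows until the lookahead sees a
# line starting with "A:" (^ works per line thanks to MULTILINE) or the end
# of the string (\Z).
answer = re.findall(r'A:(.*?)\nB:(.*?)\n(.*?)(?=^A:|\Z)', TEXT, FLAGS)

# Variant (not from the answer): [ \t]* eats the blank after each label, and
# \n? consumes the line break before the next "A:" so it stays out of the
# third group. \n? also drops a single trailing newline at the end of input.
tidy = re.findall(r'^A:[ \t]*(.*?)\nB:[ \t]*(.*?)\n(.*?)\n?(?=^A:|\Z)', TEXT, FLAGS)

print(answer)
# [(' some text', ' more text', 'even more text\non several lines\n'),
#  (' and we start again', ' more text', 'more\nmultiline text')]
print(tidy)
# [('some text', 'more text', 'even more text\non several lines'),
#  ('and we start again', 'more text', 'more\nmultiline text')]
```

Python's \Z matches only at the very end of the string, not before a final newline. That is why the optional \n? is needed if the file ends with one.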


Tim Pietzcker


@user1731620 Don't forget to 'accept' the answer that helps you out. – kreativitea Oct 09 '12 at 14:10
@jmague Don't forget to 'accept' the answer that helps you out. – tommy.carstensen Nov 21 '17 at 16:19
