python multiline regex capture

Question

I have the following string:

hello
abcd
pqrs
123
123
123

My objective is to capture everything from hello up to and including the first occurrence of 123. So the expected output is:

hello
abcd
pqrs
123

I used the following:

output=re.findall('hello.*123?',input_string,re.DOTALL)

But the output is as:

['hello\nabcd\npqrs\n123\n123\n123']

Is there a way to make this lookup non-greedy using ? for 123? Or is there any other way to achieve the expected output?

`re.search('[\w\n]+(?=\n123\n{1})', input_string).group(0)`? — Abdou, Jan 05 '17 at 19:09

score 1 · Accepted Answer · answered Jan 05 '17 at 19:19

Try using a lookahead for this. You are looking for a run of word characters and newlines that is followed by \n123\n:

import re

input_string = """hello
abcd
pqrs
123
123
123"""

output_string = re.search(r'[\w\n]+(?=\n123\n)', input_string).group(0)

print(output_string)

#hello
#abcd
#pqrs
#123
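
For comparison, here is a minimal sketch of the non-greedy form the question asks about. The `?` has to follow `.*` rather than `123`, since `123?` only makes the final `3` optional. The name `first_block` is just an illustrative choice.

```python
import re

input_string = """hello
abcd
pqrs
123
123
123"""

# `.*?` is the lazy form of `.*`: it expands only until the rest of
# the pattern (here the literal 123) can match for the first time.
# re.DOTALL lets `.` match newlines as well.
match = re.search(r'hello.*?123', input_string, re.DOTALL)
first_block = match.group(0) if match else None

print(first_block)
```

Unlike the lookahead version, this anchors on the literal hello and does not need a newline after the terminating 123.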

I hope this proves useful.

answered Jan 05 '17 at 19:19

Abdou

I asked the question just for reference. The string I am working on does NOT necessarily end in digits; it can be alphanumeric. Is there some simpler way to achieve this? – fsociety Jan 05 '17 at 19:23
Can you please provide me with more details as to what you mean by "NOT necessarily ending in digits"? You can replace `123` with anything of your choosing as long as it's where you are trying to cut your string. – Abdou Jan 05 '17 at 19:27
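
One way to make the "replace `123` with anything" idea concrete is a small helper that takes the start and end markers as arguments. This is only a sketch: `capture_until`, its parameters, and the sample text are hypothetical names introduced for illustration, and it swaps the lookahead for a lazy `.*?` match.

```python
import re

def capture_until(text, start, end):
    """Return the text from the first `start` through the first `end` after it, or None."""
    # re.escape keeps characters such as '.' or '-' in the markers literal.
    pattern = re.escape(start) + r'.*?' + re.escape(end)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None

# Hypothetical sample with an alphanumeric terminator.
sample = "hello\nabcd\npqrs\nend-1.0\nend-1.0\n"
print(capture_until(sample, "hello", "end-1.0"))
```

Returning None when nothing matches avoids the AttributeError that calling `.group(0)` directly on a failed `re.search` would raise.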