
I have the following string:

hello
abcd
pqrs
123
123
123

My objective is to capture everything from hello up to and including the first occurrence of 123. The expected output is:

hello
abcd
pqrs
123

I used the following:

output=re.findall('hello.*123?',input_string,re.DOTALL)

But the output is as:

['hello\nabcd\npqrs\n123\n123\n123']

Is there a way to make this match non-greedy by using ? with 123? Or is there another way to get the expected output?

fsociety

1 Answer


Try using a lookahead for this. You are looking for a group of characters followed by \n123\n:

import re

input_string = """hello
abcd
pqrs
123
123
123"""

# Raw string so that \w reaches the regex engine without an invalid-escape warning
output_string = re.search(r'[\w\n]+(?=\n123\n)', input_string).group(0)

print(output_string)

#hello
#abcd
#pqrs
#123
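
For the question as asked, a non-greedy quantifier may be simpler. The `?` has to follow `.*` rather than `123`: in `123?` it only makes the final `3` optional, so `.*` stays greedy and runs to the last `123`. A minimal sketch, with `input_string` redefined so the snippet runs on its own:

```python
import re

input_string = """hello
abcd
pqrs
123
123
123"""

# .*? expands as little as possible, so the match stops at the first 123.
# re.DOTALL lets . match newline characters as well.
match = re.search(r'hello.*?123', input_string, re.DOTALL)
output_string = match.group(0) if match else None

print(output_string)

#hello
#abcd
#pqrs
#123
```

Unlike the lookahead version, this does not require a newline after the first `123`, and it returns `None` instead of raising when nothing matches.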

I hope this proves useful.

Abdou
  • I asked the question with a simplified example. The string I am working on does NOT necessarily end in digits; it can be alphanumeric. Is there a simpler way to achieve this? – fsociety Jan 05 '17 at 19:23
  • Can you please provide me with more details as to what you mean by "___NOT necessarily ending in digits___"? You can replace `123` with anything of your choosing as long as it's where you are trying to cut your string. – Abdou Jan 05 '17 at 19:27
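
If the end marker is not a fixed run of digits, one option is to build the pattern from variables and pass them through `re.escape`, so that any regex metacharacters in the marker are matched literally. The helper name `capture_until` and the sample text below are made up for illustration; this is a sketch, not part of the original answer:

```python
import re

def capture_until(text, start, stop):
    """Return the text from the first `start` through the first `stop` after it, or None."""
    # re.escape makes characters such as . or - in the markers match literally.
    pattern = re.escape(start) + r'.*?' + re.escape(stop)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(0) if match else None

# Hypothetical input whose end marker is alphanumeric and contains punctuation
sample = "hello\nabcd\npqrs\nend-1.0\nend-1.0"
result = capture_until(sample, "hello", "end-1.0")

print(result)
```

The non-greedy `.*?` stops at the first occurrence of the end marker, whatever it contains.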