Here's the line I'm trying to parse:

@abc def@gmail.com @ghi j@klm @nop.qrs @tuv

And here's the regex I've gotten so far:

@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]

My goal is to get ['@abc', '@ghi', '@tuv'], but no matter what I do, I can't get 'j@klm' to not match. Any help is much appreciated.

basilbub

2 Answers

Try using re.findall with the following regex pattern:

(?:(?<=^)|(?<=\s))@[A-Za-z]+(?=\s|$)

import re

inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"
# Keep only @handles delimited by whitespace or the string boundaries
matches = re.findall(r'(?:(?<=^)|(?<=\s))@[A-Za-z]+(?=\s|$)', inp)
print(matches)

This prints:

['@abc', '@ghi', '@tuv']

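The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that what precedes the @ symbol is either whitespace or the start of the string. We can't use a word boundary here: @ is not a word character, so \b in front of it would only match when a word character comes right before the @, which is exactly the def@gmail.com and j@klm cases we want to exclude. We use a similar lookahead (?=\s|$) at the end of the pattern to rule out matching things like @nop.qrs. Again, a word boundary alone would not be sufficient, because \b also matches between nop and the dot.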
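To make the word-boundary point concrete, here is a small sketch that runs three patterns over the sample line. The names `original`, `compact`, and `naive` are just illustrative. The `compact` pattern is a shorthand I'm assuming behaves the same on this input ("not preceded or followed by a non-whitespace character"), not something taken from the question.

```python
import re

inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"

# Pattern from the answer: @ must follow start-of-string or whitespace,
# and the letters must be followed by whitespace or end-of-string.
original = re.compile(r'(?:(?<=^)|(?<=\s))@[A-Za-z]+(?=\s|$)')

# Assumed shorthand: negative lookarounds on non-whitespace.
compact = re.compile(r'(?<!\S)@[A-Za-z]+(?!\S)')

# Word boundary only, shown for comparison.
naive = re.compile(r'@[A-Za-z]+\b')

original_matches = original.findall(inp)
compact_matches = compact.findall(inp)
naive_matches = naive.findall(inp)

print(original_matches)  # ['@abc', '@ghi', '@tuv']
print(compact_matches)   # ['@abc', '@ghi', '@tuv']
print(naive_matches)     # ['@abc', '@gmail', '@ghi', '@klm', '@nop', '@tuv']
```

The word-boundary version picks up @gmail, @klm, and @nop because \b is satisfied by the dot or space after the letters and nothing constrains what comes before the @.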

Tim Biegeleisen

Just add a start-of-line anchor (^) at the beginning of the first alternative:

^@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]

It should work for the j@klm case!
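A quick way to check this against the sample line is the sketch below. It assumes the spaces around the | are literal characters in the pattern (no re.VERBOSE flag), so the raw matches carry whitespace and get stripped afterwards. The variable names are only illustrative.

```python
import re

inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"

# The spaces around '|' are literal here, since re.VERBOSE is not used.
pattern = r'^@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]'

raw_matches = re.findall(pattern, inp)

# Strip the space that each alternative consumes.
stripped = [m.strip() for m in raw_matches]

print(raw_matches)
print(stripped)
```

On this input, j@klm no longer matches, but the second alternative still appears to pick up @nop from @nop.qrs. If that matters, a trailing check such as (?=\s|$) may still be needed.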

cccnrc