Python regex to parse '@####' text in description field

Question

Here's the line I'm trying to parse:

@abc def@gmail.com @ghi j@klm @nop.qrs @tuv

And here's the regex I've gotten so far:

@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]

My goal is to get ['@abc', '@ghi', '@tuv'], but no matter what I do, I can't stop 'j@klm' from matching. Any help is much appreciated.
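To see why 'j@klm' slips through, it helps to run the pattern exactly as written and print the raw matches. The spaces on either side of | are part of the pattern: without re.VERBOSE, Python's re treats them as literal spaces, so the left branch must end with a space and the right branch must start with one. The sketch below is a reproduction using the input line above; nothing in the left branch looks at the character before the @, which is why '@klm ' is matched.

```python
import re

line = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"

# The pattern from the question, unchanged. The spaces around "|" are literal:
# the left branch ends with a space and the right branch starts with one.
pattern = r'@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]'

matches = re.findall(pattern, line)
print(matches)  # ['@abc ', ' @ghi', '@klm ', ' @tuv']
```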

Tim Biegeleisen · Accepted Answer · 2019-08-03T04:37:17.707

Try using re.findall with the following regex pattern:

(?:(?<=^)|(?<=\s))@[A-Za-z]+(?=\s|$)

import re
inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"
matches = re.findall(r'(?:(?<=^)|(?<=\s))@[A-Za-z]+(?=\s|$)', inp)
print(matches)

This prints:

['@abc', '@ghi', '@tuv']

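The regex calls for an explanation. The leading lookbehind (?:(?<=^)|(?<=\s)) asserts that the @ symbol is preceded either by a whitespace character or by the start of the string. We can't use a word boundary here: @ is not a word character, so \b@ would match only when the @ follows a word character, which is exactly the j@klm case we want to exclude. The lookahead (?=\s|$) at the end of the pattern plays the same role on the right, ruling out partial matches such as @nop in @nop.qrs. Again, a word boundary alone would not be sufficient, because there is a boundary between the p and the dot.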
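An equivalent and slightly shorter formulation, offered here as an alternative rather than as part of the original answer, uses negative lookarounds. (?<!\S) means "not preceded by a non-whitespace character", which holds both at the start of the string and after whitespace, and (?!\S) is its mirror image on the right. Both are zero-width, so Python's fixed-width lookbehind restriction is not an issue.

```python
import re

inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"

# (?<!\S): the @ is at the start of the string or follows whitespace.
# (?!\S):  the handle is followed by whitespace or the end of the string.
pattern = r'(?<!\S)@[A-Za-z]+(?!\S)'

matches = re.findall(pattern, inp)
print(matches)  # ['@abc', '@ghi', '@tuv']
```

Both versions reject def@gmail.com and j@klm because of the character before the @, and reject @nop.qrs because of the dot after the letters.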

This worked like a charm, and the explanation is a great bonus. Thanks! — basilbub, Aug 03 '19 at 05:01

score 0 · Answer 2 · answered Aug 03 '19 at 04:25


Just add the start-of-line anchor (^) at the beginning:

^@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]

It should work!

cccnrc

Unfortunately when I run this, I get: ['@abc ', ' @ghi', ' @nop', ' @tuv'] – basilbub Aug 03 '19 at 05:00
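The output in that comment follows from two details of this answer's pattern. First, ^ (without re.MULTILINE) matches only at position 0, so it constrains the left branch and nothing else. Second, the spaces next to | are matched literally and end up inside the results. The sketch below prints each match with its span so the padding is visible; note that j@klm is no longer matched, but the @nop prefix of @nop.qrs now is.

```python
import re

inp = "@abc def@gmail.com @ghi j@klm @nop.qrs @tuv"

# The pattern from this answer. "^" only anchors the left branch at index 0,
# and the spaces around "|" are literal, so they appear in the matched text.
pattern = r'^@[A-Za-z]+[^0-9. ]+\b | @[A-Za-z]+[^0-9. ]'

spans = [(m.span(), m.group()) for m in re.finditer(pattern, inp)]
for span, text in spans:
    print(span, repr(text))
# (0, 5) '@abc '
# (18, 23) ' @ghi'
# (29, 34) ' @nop'
# (38, 43) ' @tuv'
```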
