I need to write a regex in Python to extract mentions from tweets.

My attempt:

import re
regex = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)")

It works fine for a mention like @mickey. However, for mentions with underscores, like @mickey_mouse, it only extracts @mickey.

How can I modify the regex for it to work in both cases?

Thank you

Mauro Gentile
  • Looks like you could use `\w` for *word character* which also contains underscore. Something like [`(?<![\w.-])@(\w+)`](https://regex101.com/r/VBJIdS/2). – bobble bubble May 13 '17 at 21:21

2 Answers

Add an underscore to the last set like this:

(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)
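
To see the difference, here is a minimal sketch that runs the pattern from the question and the fixed one over the same text. The sample tweet and the variable names are made up for illustration.

```python
import re

# Pattern from the question: the last character class has no "_".
original = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)")
# Fixed pattern: "_" added to the last character class.
fixed = re.compile(r"(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)")

# Hypothetical sample text; the e-mail address should not count as a mention.
tweet = "@mickey and @mickey_mouse say hi to user@example.com"

original_mentions = original.findall(tweet)
fixed_mentions = fixed.findall(tweet)

print(original_mentions)  # ['mickey', 'mickey']
print(fixed_mentions)     # ['mickey', 'mickey_mouse']
```

Because of the capture group, findall returns the names without the leading @. Both patterns still require at least two characters after the @.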

On a side note, Twitter handle rules also allow usernames that start with digits or underscores. So a regex to extract Twitter handles could be as simple as @\w{1,15}, which allows letters, digits, and underscores and enforces the 15-character limit. It will need additional lookaheads/lookbehinds depending on where the regex is used.
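
One possible way to add those lookarounds, as a rough sketch rather than a definitive pattern: it borrows the (?<![\w.-]) lookbehind from the comment under the question and adds a (?!\w) lookahead. The name extract_handles, the re.ASCII flag, and the choice to reject (rather than truncate) names longer than 15 characters are my assumptions.

```python
import re

# Assumption: handles are ASCII letters, digits, and underscores, 1 to 15 long.
# (?<![\w.-]) rejects an "@" glued to a preceding word, as in an e-mail address.
# (?!\w) rejects candidates longer than 15 characters instead of truncating them.
# re.ASCII keeps \w from matching non-ASCII letters in Python 3.
HANDLE = re.compile(r"(?<![\w.-])@(\w{1,15})(?!\w)", re.ASCII)


def extract_handles(text):
    """Return the handles found in text, without the leading "@"."""
    return HANDLE.findall(text)


print(extract_handles("@mickey_mouse, @_9lives and @1up wrote to bob@example.com"))
# ['mickey_mouse', '_9lives', '1up']
```

Without the lookahead, a 16-character name would be cut to its first 15 characters instead of being skipped, so drop (?!\w) if truncation is what you want.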

degant

A shorter version, including the negative cases from @degant:

(?<=@)\w+
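
A quick check of what this pattern returns, with made-up sample strings. Note that it has no guard on what comes before the @ and no length limit, so it also picks up the domain part of an e-mail address.

```python
import re

shorter = re.compile(r"(?<=@)\w+")

# Handles starting with a digit or an underscore are matched too.
handles = shorter.findall("@mickey_mouse @_9lives @1up")
# The part after "@" in an e-mail address is matched as well.
with_email = shorter.findall("mail bob@example.com or ping @mickey")

print(handles)     # ['mickey_mouse', '_9lives', '1up']
print(with_email)  # ['example', 'mickey']
```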

Alex