50

Could you provide a regex that matches Twitter usernames?

Extra bonus if a Python example is provided.

icktoofay
Juanjo Conti

11 Answers

80
(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9-_]+)

I've used this as it disregards emails.

Here is a sample tweet:

@Hello how are @you doing @my_friend, email @000 me @ whats.up@example.com @shahmirj

Matches:

  • @Hello
  • @you
  • @my_friend
  • @shahmirj

It also works for hashtags: I use the same expression with the @ changed to #.
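
As a rough Python sketch of the same idea (not the exact expression above): the lookbehind alternation is written here as a single negative lookbehind, and the hyphen is left out of the name class on the assumption that handles contain only letters, digits and underscores. The names `MENTION_RE` and `find_mentions` are made up for illustration.

```python
import re

# Assumption: a mention starts with a letter and continues with letters,
# digits or underscores. The negative lookbehind rejects an @ that follows
# a letter, digit, '_', '.' or '-', which keeps e-mail addresses out.
MENTION_RE = re.compile(r'(?<![A-Za-z0-9_.-])@([A-Za-z][A-Za-z0-9_]*)')


def find_mentions(text):
    """Return the handles (without the @) mentioned in text."""
    return MENTION_RE.findall(text)


tweet = ("@Hello how are @you doing @my_friend, email @000 me @ "
         "whats.up@example.com @shahmirj")
print(find_mentions(tweet))  # ['Hello', 'you', 'my_friend', 'shahmirj']
```

Dropping the leading-letter requirement (as some comments suggest) would only need the first character class widened to `[A-Za-z0-9_]`.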

DaveyDaveDave
Angel.King.47
  • 9
    Very good! Only one correction: hashtags and screen names can both have underscores. I'd add it, so the resulting regex is: (?<=^|(?<=[^a-zA-Z0-9-_\.]))#([A-Za-z]+[A-Za-z0-9-_]+) – backslash17 Jun 22 '12 at 02:31
  • 11
    Well also, the underscore can be at the beginning of the username: (?<=^|(?<=[^a-zA-Z0-9-\.]))#([A-Za-z_]+[A-Za-z0-9_]+) – NZal Jul 09 '13 at 08:04
  • 1
    Applying the answer to `'RT @daddy_san: RIGHT IN THE FEELS BRUH` gives only `@daddy` as the answer. – fixxxer Jun 05 '15 at 10:34
  • 2
    try `(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9_]+)` – Angel.King.47 Jun 05 '15 at 13:01
  • 1
    @000 should be caught as well, as should screen names with underscores (as mentioned by @backslash17 and @fixxxer). And it will not catch hashtags simply by replacing @ with #, since hashtags can contain Unicode as well. So the expression for mentions should be `(?<=^|(?<=[^a-zA-Z0-9-\.]))@([A-Za-z0-9_]+)` – rokh Aug 16 '15 at 08:11
  • Could someone suggest a modification to this pattern so that a username containing a . (dot) is also matched, but without picking up a . (dot) at the end? – skepticNeophyte Jan 16 '19 at 19:25
  • The number of characters must be limited; usernames are from 1 to 15 chars. Also, the username can start with an underscore; check out the Twitter username rules [here](https://help.twitter.com/en/managing-your-account/twitter-username-rules) – Rodrigo Laguna Apr 21 '21 at 11:47
  • The expression in the answer doesn't catch usernames that start with or contain an underscore. @rokh's version accounts for such cases. – Naveen Reddy Marthala Sep 24 '21 at 13:34
21

If you're talking about the @username thing they use on twitter, then you can use this:

import re
twitter_username_re = re.compile(r'@([A-Za-z0-9_]+)')

To make every instance an HTML link, you could do something like this:

my_html_str = twitter_username_re.sub(lambda m: '<a href="http://twitter.com/%s">%s</a>' % (m.group(1), m.group(0)), my_tweet)
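
For a self-contained run of the snippet above, something like the following could work. The sample tweet and the wrapper name `link_mentions` are made up for illustration.

```python
import re

twitter_username_re = re.compile(r'@([A-Za-z0-9_]+)')


def link_mentions(tweet_text):
    """Wrap every @username in an anchor pointing at its profile page."""
    return twitter_username_re.sub(
        # group(1) is the bare name, group(0) is the name with its leading @
        lambda m: '<a href="http://twitter.com/%s">%s</a>' % (m.group(1), m.group(0)),
        tweet_text,
    )


my_tweet = 'Thanks @jack and @my_friend!'
print(link_mentions(my_tweet))
# Thanks <a href="http://twitter.com/jack">@jack</a> and <a href="http://twitter.com/my_friend">@my_friend</a>!
```

Note that this pattern does not look at the character before the @, so the domain part of an e-mail address such as someone@example.com would be linked too.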
icktoofay
19

The regex I use, which has been tested in multiple contexts:

/(^|[^@\w])@(\w{1,15})\b/

This is the cleanest way I've found to test for and replace Twitter usernames in strings.

#!/usr/bin/env python3

import re

text = "@RayFranco is answering to @jjconti, this is a real '@username83' but this is an@email.com, and this is a @probablyfaketwitterusername"

# \1 keeps the character before the @; \2 is the username (at most 15 word characters)
ftext = re.sub(r'(^|[^@\w])@(\w{1,15})\b', r'\1<a href="http://twitter.com/\2">\2</a>', text)

print(ftext)

This returns, as expected:

<a href="http://twitter.com/RayFranco">RayFranco</a> is answering to <a href="http://twitter.com/jjconti">jjconti</a>, this is a real '<a href="http://twitter.com/username83">username83</a>' but this is an@email.com, and this is a @probablyfaketwitterusername

Based on the Twitter specs:

Your username cannot be longer than 15 characters. Your real name can be longer (20 characters), but usernames are kept shorter for the sake of ease. A username can only contain alphanumeric characters (letters A-Z, numbers 0-9) with the exception of underscores, as noted above. Check to make sure your desired username doesn't contain any symbols, dashes, or spaces.

rayfranco
  • 2
    The cleanest. Nice posting of spec. – scharfmn Feb 25 '15 at 10:13
  • 2
    thanks, this is great! ...except it incorrectly matches usernames inside medium URLs, e.g. https://medium.com/@p5d12000/xyz. here's a modified version that fixes that: `(^|[^\w@/\!?=&])@(\w{1,15})\b`. (twitter itself is still better - it correctly auto-links just the @-mention in `/@abc`, and the full URL in `https://medium.com/@abc` - but oh well.) – ryan Nov 22 '17 at 16:33
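
To see the difference between the answer's pattern and the URL-aware variant from the last comment, a small comparison could look like this. The sample text and the helper name `usernames` are made up, and the `\!` escape is written as a plain `!` inside the character class, which means the same thing.

```python
import re

ORIGINAL_RE = re.compile(r'(^|[^@\w])@(\w{1,15})\b')
# Variant from the comment: also refuse an @ that follows /, !, ?, = or &
URL_SAFE_RE = re.compile(r'(^|[^\w@/!?=&])@(\w{1,15})\b')


def usernames(pattern, text):
    """Collect the second group (the bare username) of every match."""
    return [m.group(2) for m in pattern.finditer(text)]


text = 'Read https://medium.com/@p5d12000/xyz then ping @ryan'
print(usernames(ORIGINAL_RE, text))  # ['p5d12000', 'ryan']
print(usernames(URL_SAFE_RE, text))  # ['ryan']
```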
13

Twitter recently open-sourced implementations, in various languages including Java, Ruby (gem) and JavaScript, of the code they use for finding usernames, hashtags, lists and URLs.

It is heavily regular-expression oriented.

Evan
2

This is a method I have used in a project. It takes the text attribute of a tweet object and returns the text with both the hashtags and user_mentions linked to their appropriate pages on Twitter, complying with the most recent Twitter display guidelines.

import re

def link_tweet(tweet):
    """
    Take the text attribute of a tweet object and return it with
    user_mentions and hashtags linked.
    """
    # Link @mentions that start the text or follow whitespace
    tweet = re.sub(r'(\A|\s)@(\w+)', r'\1@<a href="http://www.twitter.com/\2">\2</a>', str(tweet))
    # Link #hashtags the same way, pointing at the search page
    return re.sub(r'(\A|\s)#(\w+)', r'\1#<a href="http://search.twitter.com/search?q=%23\2">\2</a>', str(tweet))

When you call this method, pass in my_tweet[x].text as the parameter. Hope this is helpful.

Chris Clouten
2

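The only characters accepted in the form are A-Z, 0-9, and underscore. Usernames are not case-sensitive, though, so you could use r'(?i)@[a-z0-9_]+' to match everything correctly and also discern between users. The (?i) flag goes at the start of the pattern because recent Python versions reject global inline flags placed anywhere else.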
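
A small sketch of how matching and then comparing handles might look, assuming lowercase is an acceptable canonical form. `HANDLE_RE` and `unique_handles` are illustrative names. The flag only makes the pattern case-insensitive, so treating @ABC and @Abc as the same user takes an explicit normalisation step.

```python
import re

# re.IGNORECASE is the flag form of the inline (?i)
HANDLE_RE = re.compile(r'@([a-z0-9_]+)', re.IGNORECASE)


def unique_handles(text):
    """Return the distinct handles in text, lowercased so case variants collapse."""
    return sorted({name.lower() for name in HANDLE_RE.findall(text)})


print(unique_handles('@ABC meets @Abc and @xyz'))  # ['abc', 'xyz']
```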

wersimmon
  • 1
    It doesn't make much of a difference that they are not case-sensitive. `(?i)` refers to your pattern, not the value you capture. It's still up to the program to deal with ABC and Abc as the same value. – Kobi Feb 21 '10 at 05:32
1

This regex seems to match Twitter usernames:

^@[A-Za-z0-9_]{1,15}$

It allows at most 15 characters, allows an underscore directly after the @ (which Twitter does), and allows names made up entirely of underscores (which, after a quick search, I found that Twitter apparently also allows). It excludes email addresses.

RubyNoob
1

Shorter, /@([\w]+)/ works fine.

casraf
  • you're missing '_' and characters with accents on that one. add the equivalent of \p{L} in Python and '_' – Gubatron May 17 '12 at 18:07
  • Are they normally included in usernames on Twitter? I don't think he needs to be watching for them. Of course, it would add flexibility I guess – casraf May 28 '12 at 20:48
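
On the question raised in these comments: in Python 3, `\w` in a str pattern already includes the underscore and, by default, Unicode letters, while the `re.ASCII` flag narrows it to `[a-zA-Z0-9_]`. The sketch below only illustrates that difference. The pattern names and sample text are made up, and which behaviour is wanted depends on the use case.

```python
import re

UNICODE_RE = re.compile(r'@(\w+)')          # default: \w is Unicode-aware
ASCII_RE = re.compile(r'@(\w+)', re.ASCII)  # \w limited to [a-zA-Z0-9_]

text = '@my_handle and @jos\u00e9'  # the second name ends in an accented e

print(UNICODE_RE.findall(text))  # ['my_handle', 'josé']
print(ASCII_RE.findall(text))    # ['my_handle', 'jos']
```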
1

I have used the existing answers and modified them for my use case (the username must be longer than 4 characters).

^[A-Za-z0-9_]{5,15}$

Rules:

  • Your username must be longer than 4 characters.
  • Your username must be no longer than 15 characters.
  • Your username can only contain letters, numbers and '_'.

Source: https://help.twitter.com/en/managing-your-account/twitter-username-rules
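
A point worth checking with patterns like this one is the letter range: in ASCII, a range written as A-z also covers the six characters between Z and a ([, \, ], ^, _ and `), so spelling out A-Za-z is the safer form. A quick sketch, with illustrative names:

```python
import re

LOOSE_RE = re.compile(r'[A-z0-9_]{5,15}')      # A-z also lets [ \ ] ^ ` through
STRICT_RE = re.compile(r'[A-Za-z0-9_]{5,15}')  # letters, digits, underscore only


def is_valid(pattern, username):
    """True when the whole username matches the pattern."""
    return pattern.fullmatch(username) is not None


print(is_valid(LOOSE_RE, 'bad^name'))    # True
print(is_valid(STRICT_RE, 'bad^name'))   # False
print(is_valid(STRICT_RE, 'good_name'))  # True
print(is_valid(STRICT_RE, 'abcd'))       # False
```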

tokyodrift
0

In case you need to match all of the handle, @handle and twitter.com/handle formats, this is a variation:

import re

match = re.search(r'^(?:.*twitter\.com/|@?)(\w{1,15})(?:$|/.*$)', text)
handle = match.group(1)

Explanation, examples and working regex here: https://regex101.com/r/7KbhqA/3

Matched

myhandle
@myhandle
@my_handle_2
twitter.com/myhandle
https://twitter.com/myhandle
https://twitter.com/myhandle/randomstuff

Not matched

mysuperhandleistoolong
@mysuperhandleistoolong
https://twitter.com/mysuperhandleistoolong
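
The snippet above assumes a match is always found. A slightly more defensive sketch, wrapping the same pattern in a hypothetical helper `extract_handle` that returns None when nothing matches, could be:

```python
import re

HANDLE_RE = re.compile(r'^(?:.*twitter\.com/|@?)(\w{1,15})(?:$|/.*$)')


def extract_handle(text):
    """Return the handle found in text, or None when nothing matches."""
    match = HANDLE_RE.search(text)
    return match.group(1) if match else None


samples = [
    'myhandle',
    '@my_handle_2',
    'https://twitter.com/myhandle/randomstuff',
    '@mysuperhandleistoolong',
]
for sample in samples:
    print(sample, '->', extract_handle(sample))
```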
Ludo
0

You can use the following regex: ^@[A-Za-z0-9_]{1,15}$

In Python:

import re
pattern = re.compile('^@[A-Za-z0-9_]{1,15}$')
pattern.match('@Your_handle')

This will check if the string exactly matches the regex.

In a 'practical' setting, you could use it as follows:

pattern = re.compile('^@[A-Za-z0-9_]{1,15}$')
if pattern.match('@Your_handle'):
    print('Match')
else:
    print('No Match')
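
One detail of `^...$` with `pattern.match` that may matter for validation: in Python, `$` also matches just before a trailing newline, so a handle followed by '\n' still passes. A sketch using `fullmatch` instead, where `is_valid_handle` is a name made up here:

```python
import re

ANCHORED_RE = re.compile(r'^@[A-Za-z0-9_]{1,15}$')
HANDLE_RE = re.compile(r'@[A-Za-z0-9_]{1,15}')


def is_valid_handle(candidate):
    """True only when the entire string is an @ followed by 1 to 15 allowed characters."""
    return HANDLE_RE.fullmatch(candidate) is not None


print(ANCHORED_RE.match('@Your_handle\n') is not None)  # True: $ tolerates the newline
print(is_valid_handle('@Your_handle\n'))                # False
print(is_valid_handle('@Your_handle'))                  # True
print(is_valid_handle('@mysuperhandleistoolong'))       # False
```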
ATH