How can I find Twitter profile links with a regex?

Question

I want to parse HTML documents for links to Twitter profiles using a regex and preg_match_all() in PHP. The Twitter links are in this form:

http(s)://twitter.com/#!/twitter_name

I only want to grab links that point purely to the profile page (e.g., nothing after twitter_name).

I would like to handle both http and https (because both are common in these links).

I would also like to handle //www.twitter.com and //twitter.com (also common).

How should I structure my regex?

score 2 · Answer 1 · edited Dec 15 '11 at 16:08

How about something like:

(https?:)*\/\/(www\.)*twitter\.com\/#!\/([A-Za-z0-9_]*)

I'm not sure which characters are valid in a Twitter handle, but I'm assuming digits, letters and underscores.

It's probably best to run it in case-insensitive mode (the i modifier) and drop the A-Z as well.
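A minimal PHP sketch of the case-insensitive variant suggested above. It assumes `?` rather than `*` on the optional prefixes (each should appear at most once), escaped dots so `.` only matches a literal dot, and `+` so an empty handle is rejected. `findTwitterHandles` is a hypothetical helper name, not part of any library.

```php
<?php
// Hypothetical helper: collect Twitter handles from an HTML string.
// '?' makes the scheme and "www." optional (at most once each),
// '\.' matches a literal dot, and the 'i' modifier lets [a-z] cover A-Z.
// The '~' delimiter means the slashes need no escaping.
function findTwitterHandles(string $html): array
{
    $pattern = '~(?:https?:)?//(?:www\.)?twitter\.com/#!/([a-z0-9_]+)~i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1]; // capture group 1 holds each handle
}

$html = '<a href="https://twitter.com/#!/Some_User">a</a> '
      . '<a href="//www.twitter.com/#!/other1">b</a>';

print_r(findTwitterHandles($html)); // Some_User, other1
```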

edited Dec 15 '11 at 16:08

answered Dec 12 '11 at 22:40

Mike Christensen

I'm pretty sure that `[(http:|https:)]*` doesn't match what you think it should. It matches `hhhhhhh` or `))::::hpph:|||` for example. – Toto Dec 15 '11 at 15:14
Why the Kleene star? That would overmatch! – clyfe Dec 17 '11 at 07:17

score 2 · Answer 2 · answered Dec 12 '11 at 22:44


Most general regex (that stops at "/" or space):

(https?:)?\/\/(www\.)?twitter\.com\/(#!\/)?([^\/ ]+)
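A small PHP sketch comparing two ways of writing the tail of this pattern, run on a hypothetical status link. A tail written as `([^\/ ].)+` consumes characters in pairs, so the `.` half of a pair can be a `/` and the match runs past the handle; a repeated capture group also keeps only its last iteration. A plain character class such as `([^\/" ]+)` stops at the first `/`, double quote or space.

```php
<?php
// Pairwise tail: each iteration matches one non-slash char plus ANY char,
// so the second char of a pair may be '/' or ' ' and the match runs on.
$pairwise = '~(https?:)?//(www\.)?twitter\.com/(#!/)?([^/ ].)+~';
// Character-class tail: one or more chars that are not '/', '"' or space.
$classTail = '~(https?:)?//(www\.)?twitter\.com/(#!/)?([^/" ]+)~';

$link = 'https://twitter.com/#!/abc/status';

preg_match($pairwise, $link, $m1);
preg_match($classTail, $link, $m2);

echo $m1[0], "\n"; // whole match runs past the handle to the end
echo $m1[4], "\n"; // repeated group keeps only its last pair, "us"
echo $m2[4], "\n"; // the handle alone, "abc"
```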

answered Dec 12 '11 at 22:44

clyfe

How would I modify that to also stop at a double quote (")? – T. Brian Jones Dec 12 '11 at 22:46
(https?:)?\/\/(www\.)?twitter\.com\/(#!\/)?([^\/" ]+) – clyfe Dec 12 '11 at 22:47

score 1 · Answer 3 · answered Dec 12 '11 at 22:41


Try

preg_match_all('|https?://(?:www\.)?twitter\.com/#!/([a-z0-9_]+)|im', $text, $matched);

I don't know exactly which characters can appear in a Twitter username, so I assumed [a-z0-9_]+. $matched[1] should contain the usernames.

answered Dec 12 '11 at 22:41

piotrekkr

score 1 · Answer 4 · edited May 23 '17 at 10:34


Try the following:

preg_match_all('~https?://(?:www\.)?twitter\.com/#!/([a-z0-9_]+)~im', $html, $matches);

$matches[1] contains the matching user names.

EDIT: For more information on which characters can appear in a user name, see this answer; for more general information, see this Twitter Engineering page.
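The patterns in these answers stop at the end of the handle but still match links that continue past it (for example `/status/...`), while the question asks for profile-only links. One way to enforce that, sketched here in PHP under the assumption that a bare profile link in HTML is followed by a quote, whitespace, `<` or the end of the input, is a lookahead after the handle. `profileHandles` and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper: accept only bare profile links (nothing after the handle).
// Assumption: the handle is followed by a quote, whitespace, '<' or end of input;
// the lookahead (?=...) checks that without consuming the character.
function profileHandles(string $html): array
{
    $pattern = '~(?:https?:)?//(?:www\.)?twitter\.com/#!/([a-z0-9_]+)(?=["\'\s<]|$)~i';
    preg_match_all($pattern, $html, $matches);
    return $matches[1];
}

$html = <<<'HTML'
<a href="http://twitter.com/#!/alice">profile</a>
<a href="https://www.twitter.com/#!/bob/status/123">tweet</a>
<a href="//twitter.com/#!/carol_99">protocol-relative</a>
HTML;

print_r(profileHandles($html)); // alice and carol_99, not bob
```

Backtracking cannot sneak a shorter handle through, because a shortened handle is followed by another handle character, which the lookahead also rejects.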

edited May 23 '17 at 10:34

Community

answered Dec 12 '11 at 22:42

cmbuckley
