You can use an id_dispatcher function:
from itertools import count
def id_dispatcher():
    # The returned function yields the next integer on every call: 1, 2, 3, ...
    return lambda c=count(1): next(c)
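To see what the dispatcher does on its own, here is a small sketch. The names next_id, other and other_first are only for illustration. It relies on the default argument c being evaluated once, when the lambda is created, so each dispatcher keeps its own counter.

```python
from itertools import count

def id_dispatcher():
    # count(1) is evaluated once, when the lambda is defined, and kept as its default argument
    return lambda c=count(1): next(c)

next_id = id_dispatcher()
first, second, third = next_id(), next_id(), next_id()
print(first, second, third)  # 1 2 3

# A second dispatcher starts again from 1, independently of the first
other = id_dispatcher()
other_first = other()
print(other_first)  # 1
```

Because the function takes no required arguments, it can be passed directly as a defaultdict factory.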
Then we can set up a defaultdict from the collections module:
from collections import defaultdict
dc = defaultdict(id_dispatcher())
and then use a regex replacement on the input string tweet (see link for the construction of a Twitter username regex):
import re
re_user = re.compile(r'(?<=^|(?<=[^a-zA-Z0-9-_\.]))@([A-Za-z]+[A-Za-z0-9]+)')
# Each distinct @username gets an ID the first time it is seen; repeats reuse that ID
outp = re_user.sub(lambda x: 'USERNAME_TWITTER_%s' % dc[x.group(0)], tweet)
This produces:
>>> re_user.sub(lambda x: 'USERNAME_TWITTER_%s' % dc[x.group(0)], tweet)
"thank you guys, for coming my birthday USERNAME_TWITTER_1 USERNAME_TWITTER_2 USERNAME_TWITTER_3 , and USERNAME_TWITTER_1 don't forget your promises"
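For reference, the pieces can be combined into one self-contained function. This is a minimal sketch, not a definitive implementation: the function name anonymize_usernames, the sample text and the usernames @alice and @bob are made up for illustration. The pattern is also a simplified stand-in that swaps the lookbehind above for a negative lookbehind, so its matches may differ from the original regex in edge cases.

```python
import re
from collections import defaultdict
from itertools import count

def anonymize_usernames(text):
    """Replace each @username with a numbered placeholder, reusing IDs for repeats."""
    counter = count(1)
    # A missing key triggers the factory, which hands out the next free ID
    ids = defaultdict(lambda: next(counter))
    # Assumption: simplified username pattern, not the exact regex from the answer
    pattern = re.compile(r'(?<![A-Za-z0-9_.-])@([A-Za-z]+[A-Za-z0-9]+)')
    result = pattern.sub(lambda m: 'USERNAME_TWITTER_%s' % ids[m.group(0)], text)
    return result, dict(ids)

# Hypothetical input, not the tweet from the question
sample = "thanks @alice @bob , and @alice see you"
anonymized, mapping = anonymize_usernames(sample)
print(anonymized)  # thanks USERNAME_TWITTER_1 USERNAME_TWITTER_2 , and USERNAME_TWITTER_1 see you
print(mapping)     # {'@alice': 1, '@bob': 2}
```

Creating the counter and the dictionary inside the function gives every call a fresh numbering. Keep a module-level dc, as above, if IDs should stay consistent across several tweets.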