I created a regular expression to find URLs like /places/:state/:city/whatever:

```python
p = re.compile(r'^/places/(?P<state>[^/]+)/(?P<city>[^/]+).*$')
```
This works just fine:
```python
import re

p = re.compile(r'^/places/(?P<state>[^/]+)/(?P<city>[^/]+).*$')
path = '/places/NY/NY/other/stuff'
match = p.match(path)
print(match.groupdict())
```
This prints {'city': 'NY', 'state': 'NY'} (the key order can differ between Python versions).
How can I process a logfile to replace /places/NY/NY/other/stuff with the string "/places/:state/:city/other/stuff"? I'd like to get a sense of how many URLs are of the "cities type" without caring which specific place (NY, NY) each one refers to.
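A minimal sketch of the counting goal, assuming each log line holds exactly one request path and nothing else (real log formats will need the path extracted first). The names PLACES_PREFIX, normalize and count_patterns are hypothetical. The pattern matches only the /places/<state>/<city> prefix, so the rest of the path is never touched:

```python
import re
from collections import Counter

# Matches only the /places/<state>/<city> prefix; anything after it is kept as-is.
# Assumption: each log line contains a single request path.
PLACES_PREFIX = re.compile(r'^/places/[^/]+/[^/]+')


def normalize(path):
    """Replace the concrete state and city with :state and :city placeholders."""
    return PLACES_PREFIX.sub('/places/:state/:city', path)


def count_patterns(lines):
    """Count how often each normalized URL pattern occurs."""
    return Counter(normalize(line.strip()) for line in lines)


log_lines = [
    '/places/NY/NY/other/stuff\n',
    '/places/CA/LA/other/stuff\n',
    '/about\n',
]
counts = count_patterns(log_lines)
print(counts['/places/:state/:city/other/stuff'])  # prints 2
```

For a real file, an open file object can be passed directly, e.g. `with open('access.log') as f: counts = count_patterns(f)`, where access.log is a placeholder name.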
The simple approach can fail:
```python
import re

p = re.compile(r'^/places/(?P<state>[^/]+)/(?P<city>[^/]+).*$')
path = '/places/NY/NY/other/stuff'
match = p.match(path)
if match:
    groupdict = match.groupdict()
    # Replace each captured value with ':' + its group name
    for k, v in sorted(groupdict.items()):
        path = path.replace(v, ':' + k, 1)
print(path)
```
This prints /places/:city/:state/other/stuff, which is backwards!
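The failure comes from str.replace searching for the first occurrence of the captured text ('NY') anywhere in the string, not at the position where that particular group matched; since city sorts first, it grabs the first 'NY'. One way around this, sketched here as an alternative rather than the canonical fix, is to splice by each group's offsets from match.span(). The helper name replace_groups_by_position is hypothetical:

```python
import re

p = re.compile(r'^/places/(?P<state>[^/]+)/(?P<city>[^/]+).*$')


def replace_groups_by_position(pattern, path):
    """Swap each named group's matched text for ':' + its name.

    Uses the group's span (start/end offsets) instead of searching for
    the captured text, so identical values like NY/NY cannot be confused.
    """
    match = pattern.match(path)
    if not match:
        return path
    # Handle the rightmost group first: splicing text there never shifts
    # the offsets of groups that start further to the left.
    names = sorted(match.groupdict(), key=match.start, reverse=True)
    for name in names:
        start, end = match.span(name)
        if start == -1:  # optional group that did not participate
            continue
        path = path[:start] + ':' + name + path[end:]
    return path


print(replace_groups_by_position(p, '/places/NY/NY/other/stuff'))
# prints /places/:state/:city/other/stuff
```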
It feels like there should be some way to use re.sub, but I can't see it.
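re.sub can do this directly if the tail of the path is captured too. A minimal sketch, assuming it is acceptable to replace the bare .* with a third named group, rest (not part of the original pattern): the replacement template writes the placeholders literally and copies the tail back with \g<rest>. re.sub also accepts a function as the replacement, which receives the match object:

```python
import re

# Original pattern, except the trailing ".*" is now captured as a named
# group "rest" (an assumption of this sketch) so it can be copied back.
p = re.compile(r'^/places/(?P<state>[^/]+)/(?P<city>[^/]+)(?P<rest>.*)$')

path = '/places/NY/NY/other/stuff'

# Template form: literal placeholders, then the captured tail via \g<rest>.
result = p.sub(r'/places/:state/:city\g<rest>', path)
print(result)  # prints /places/:state/:city/other/stuff

# Function form: the callable receives the match object and returns the replacement.
via_function = p.sub(lambda m: '/places/:state/:city' + m.group('rest'), path)
print(via_function)  # prints /places/:state/:city/other/stuff
```

Paths that don't start with /places/<state>/<city> don't match, so re.sub returns them unchanged, which makes it safe to run over every line of a log.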