I am trying to remove the IDs from URLs so that they can be counted correctly in reporting. With the IDs included, every URL is counted as unique when it is not, i.e. there are thousands of distinct URLs instead of tens.
So I would like to take a URL like this:
https://www.website.co.uk/page/home-page/id93847562
and cut off the ID so it looks like this:
https://www.website.co.uk/page/home-page/
The length of the URL varies, so I cannot cut a fixed number of characters from the start or end, or rely on a fixed number of slashes.
I am trying to change the URLs in a column of a pandas DataFrame.
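To make the problem concrete, here is a small made-up example of what the column looks like. Only the first URL is the one from above; the other rows, the IDs and the variable name `unique_urls` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data: only the first URL comes from the question,
# the other paths and IDs are made up.
df = pd.DataFrame({
    "URL": [
        "https://www.website.co.uk/page/home-page/id93847562",
        "https://www.website.co.uk/page/home-page/id11111111",
        "https://www.website.co.uk/page/about-us/team/id22222222",
    ]
})

# Because of the trailing ID, every row counts as a different URL,
# even though the first two rows point at the same page.
unique_urls = df["URL"].nunique()
print(unique_urls)
```

With the IDs stripped, the first two rows should collapse into a single URL.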
The closest thing to an answer I could find here was this: extract id from the URL using Python
but I haven't been able to translate it to my scenario.
Here's my code:
df.loc[df['URL'].str.contains('id'),'URL' = 'URL'[:id]
What I've tried to write is: if the URL string contains 'id', replace it with the URL from the start up to 'id'.
The error I get is:
File "<ipython-input-18-42dc8b2df1ff>", line 3
df.loc[df['URL'].str.contains('id'),'URL' = 'URL'[:id]
^
SyntaxError: invalid syntax
Any ideas what I can do to make it work?
Thank you in advance for any help and advice.
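One possible way to do this, as a sketch rather than a definitive fix: the SyntaxError comes from the missing closing `]` after `'URL'`, and even with that fixed, `'URL'[:id]` would slice the literal string `'URL'` rather than the column values. A regex replacement with pandas' `Series.str.replace` avoids both problems. The sketch below assumes the ID is always the letters `id` followed by digits at the very end of the URL; the sample rows other than the first are made up.

```python
import pandas as pd

# Hypothetical sample data: only the first URL comes from the question.
df = pd.DataFrame({
    "URL": [
        "https://www.website.co.uk/page/home-page/id93847562",
        "https://www.website.co.uk/page/video/guide/id12",
        "https://www.website.co.uk/page/contact/",
    ]
})

# Assumption: the ID is "id" followed by one or more digits at the end of the URL.
# The "$" anchor keeps an "id" elsewhere in the path (e.g. in "video" or "guide") untouched.
df["URL"] = df["URL"].str.replace(r"id\d+$", "", regex=True)

print(df["URL"].tolist())
```

URLs without a trailing ID are left unchanged, so no separate `str.contains` filter is needed. If the IDs can contain letters or appear in the middle of the URL, the pattern would need adjusting.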