I applied the following command to my DataFrame:
df['date_article'] = df.pagePath.str.extract_regex(pattern=r'(?P<digit>/\d{4}/\d{2}/\d{2}/)')
This created the column 'date_article':
pagePath | date_article |
---|---|
'/empresas/2021/10/22/tiendas-no-participan-buen' | {'digit': '/2021/10/22/'} |
'/finanzas-personales/2021/10/22/pueden-cobrar-c' | {'digit': '/2021/10/22/'} |
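For comparison, here is a minimal pandas sketch (assuming pandas; the sample rows are shortened from the table and the variable names are illustrative). With a named group, pandas' `Series.str.extract` labels the result column with the group name, which is presumably why 'digit' appears as the dict key above.

```python
import pandas as pd

# Hypothetical sample rows, shortened from the table above
df = pd.DataFrame({"pagePath": [
    "/empresas/2021/10/22/tiendas-no-participan-buen",
    "/finanzas-personales/2021/10/22/pueden-cobrar-c",
]})

# With expand=True (the default), each named group becomes a column,
# so the group name "digit" is used as the column label.
groups = df["pagePath"].str.extract(r"(?P<digit>/\d{4}/\d{2}/\d{2}/)")
print(groups.columns.tolist())   # ['digit']
print(groups["digit"].tolist())  # ['/2021/10/22/', '/2021/10/22/']
```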
Now I want to keep only the date string in 'date_article'.
Expected output:
pagePath | date_article |
---|---|
'/empresas/2021/10/22/tiendas-no-participan-buen' | '/2021/10/22/' |
'/finanzas-personales/2021/10/22/pueden-cobrar-c' | '/2021/10/22/' |
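A hedged sketch of two ways to reach the expected output, assuming the data can be handled with pandas. Note that extract_regex is not a pandas method, so the dict-valued column is only simulated here. Option A unwraps each dict by its 'digit' key; option B skips the dicts entirely, because `str.extract(..., expand=False)` returns a single Series of strings when the pattern has exactly one capture group.

```python
import pandas as pd

# Hypothetical sample rows, shortened from the tables above
df = pd.DataFrame({"pagePath": [
    "/empresas/2021/10/22/tiendas-no-participan-buen",
    "/finanzas-personales/2021/10/22/pueden-cobrar-c",
]})

# Option A: unwrap an existing column of dicts like {'digit': '/2021/10/22/'}.
# The dicts are simulated here; any non-dict value becomes None.
df["date_article"] = pd.Series(
    [{"digit": "/2021/10/22/"}, {"digit": "/2021/10/22/"}], index=df.index
)
df["date_article"] = df["date_article"].map(
    lambda d: d.get("digit") if isinstance(d, dict) else None
)

# Option B: extract the plain string directly (NaN where nothing matches).
df["date_direct"] = df["pagePath"].str.extract(
    r"(?P<digit>/\d{4}/\d{2}/\d{2}/)", expand=False
)

print(df[["pagePath", "date_article", "date_direct"]])
```

Option B is usually simpler, since it never creates the intermediate dicts.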
I have tried several approaches, but nothing seems to work.
Thanks in advance for any help.