parse url in pandas df column and grab value of specific index

Question

I have a pandas df with the column url. The data looks like this:

row               url
1      'https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese'
2      'https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/'
3      'https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/'

I only need the value at one position in each URL: the first path segment after the domain, which is cooking, holiday-recipes, etc.
Desired output:

row               url
1               cooking
2               holiday-recipes
3               kitchen-tools

I wanted to parse urls into different columns and then drop the columns that I don't need. Here is the code:

df['protocol'],df['domain'],df['path']=zip(*df['url'].map(urlparse(df['url']).urlsplit))

The error message is: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all(). Is there a better way to solve the issue? How can I grab the specific index?

score 1 · Accepted Answer · answered Sep 10 '20 at 20:21

Is this what you're looking for?

df['url'] = df['url'].str.split('/').str[3]
print(df)

   row              url
0    1          cooking
1    2  holiday-recipes
2    3    kitchen-tools
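
As a side note on the error in the question: it most likely arises because `urlparse` expects a single string, while the original line passes it the whole Series. A minimal sketch of the per-element approach follows, assuming the question's `urlparse` is the standard library's `urllib.parse.urlparse`. The helper name `first_path_segment` and the new column name `section` are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlparse

import pandas as pd

# Sample data copied from the question (without the surrounding quotes).
df = pd.DataFrame({
    "row": [1, 2, 3],
    "url": [
        "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese",
        "https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/",
        "https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/",
    ],
})


def first_path_segment(url):
    """Hypothetical helper: return the first non-empty path segment, or None."""
    path = urlparse(url).path  # e.g. "/cooking/recipe-ideas/recipes/four-cheese"
    segments = [part for part in path.split("/") if part]
    return segments[0] if segments else None


# urlparse works on one string at a time, so apply it element by element.
df["section"] = df["url"].map(first_path_segment)
print(df[["row", "section"]])
```

Compared with `str.split('/').str[3]`, this does not depend on the position of the domain within the string, at the cost of a Python-level function call per row.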

NYC Coder

Precisely! Thank you very much. I have accepted the answer. – Chique_Code Sep 11 '20 at 14:19

score 1 · Answer 2 · answered Sep 10 '20 at 20:59

Another way is to use a regex that matches the run of lowercase letters and hyphens immediately after com/:

df['url'] = df['url'].str.extract(r'((?<=com/)[a-z-]+)', expand=False)



               url
0          cooking
1  holiday-recipes
2    kitchen-tools
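
The lookbehind on `com/` ties the pattern to `.com` domains. If other domains may appear, one possible variation is to anchor on the scheme and host instead. This is only a sketch, assuming every URL starts with `scheme://host/` and has no surrounding quotes; the column name `section` is hypothetical.

```python
import pandas as pd

# Sample data copied from the question (without the surrounding quotes).
df = pd.DataFrame({
    "url": [
        "https://www.delish.com/cooking/recipe-ideas/recipes/four-cheese",
        "https://www.delish.com/holiday-recipes/thanksgiving/thanksgiving-cabbage/",
        "https://www.delish.com/kitchen-tools/cookware-reviews/advice/kitchen-tools-gadgets/",
    ],
})

# Skip "scheme://host/", then capture everything up to the next slash.
# expand=False returns a Series rather than a one-column DataFrame.
pattern = r"^[a-z]+://[^/]+/([^/]+)"
df["section"] = df["url"].str.extract(pattern, expand=False)
print(df["section"])
```

Rows whose URL has no path segment after the host would come back as NaN rather than raising an error.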

wwnde
