Grab the values from url that is in between specific characters BigQuery

Question

I need to parse URLs in order to grab the value that comes after .com/ and before the next / character. My data looks like this:

url
https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas
https://www.delish.com/food-news/news/100-years-of-christmas

The desired output is:

new_string
food-news
food-news

I have tried the following:

SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 4)] AS new_string

But the URLs do not all have the same number of path segments, so counting back from the end sometimes grabs food-news and sometimes grabs www.delish.com. That is why an offset from the end does not work in this particular case.

score 2 · Accepted Answer · answered Jul 09 '21 at 13:19


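Use the expression below. net.host(url) returns the host part of the URL (www.delish.com here), and the pattern then captures everything after the following / up to the next /.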

regexp_extract(url, net.host(url) || r'/([^/]+)')
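For context, here is a minimal sketch of how that expression could be run against the two URLs from the question. The CTE name sample is a placeholder for illustration, not something from the original post; swap in your own table.

```sql
-- Sketch only: `sample` is a hypothetical inline table holding the question's URLs.
WITH sample AS (
  SELECT 'https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas' AS url
  UNION ALL
  SELECT 'https://www.delish.com/food-news/news/100-years-of-christmas'
)
SELECT
  url,
  -- NET.HOST(url) yields 'www.delish.com'; the capture group takes the
  -- first path segment that follows it.
  REGEXP_EXTRACT(url, NET.HOST(url) || r'/([^/]+)') AS new_string
FROM sample;
-- new_string is 'food-news' for both rows
```

Because the pattern is built from the host of each row, it does not depend on the domain ending in .com or on how many path segments follow.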


Mikhail Berlyant


score 0 · Answer 2 · answered Jul 09 '21 at 13:14


SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '.com/')) + 1)] AS new_string
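A rough sketch of why this works on the sample data, assuming each URL starts with https:// and contains '.com/' exactly once: splitting on '.com/' then gives two parts, so the computed offset is 3, which is the position of the first path segment when the URL is split on '/'. The CTE name sample and the column computed_offset are placeholders added for illustration.

```sql
-- Sketch only: `sample` is a hypothetical inline table holding the question's URLs.
WITH sample AS (
  SELECT 'https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas' AS url
  UNION ALL
  SELECT 'https://www.delish.com/food-news/news/100-years-of-christmas'
)
SELECT
  url,
  -- Two parts around the single '.com/', plus 1, gives 3.
  ARRAY_LENGTH(SPLIT(url, '.com/')) + 1 AS computed_offset,
  -- SPLIT(url, '/') is ['https:', '', 'www.delish.com', 'food-news', ...],
  -- so offset 3 counted from the start is the first path segment.
  SPLIT(url, '/')[SAFE_OFFSET(3)] AS new_string
FROM sample;
-- computed_offset is 3 and new_string is 'food-news' for both rows
```

Counting from the start of the URL rather than from the end is what makes the result independent of how many segments follow. If a URL had no scheme or contained '.com/' more than once, the offset would shift, so this holds only under the assumptions stated above.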


Chique_Code

