I need to parse URLs in order to grab the value that comes after .com/ and before the next / character. My data looks like this:

url
https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas
https://www.delish.com/food-news/news/100-years-of-christmas

The desired output is:

new_string
food-news
food-news

I have tried the following:

SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '/')) - 4)] AS new_string

But because the URLs do not all have the same number of path segments, counting back from the end of the array sometimes grabs food-news and sometimes grabs www.delish.com, so an offset relative to the end does not work in this case.

Mikhail Berlyant
Chique_Code

2 Answers

Use the expression below:

regexp_extract(url, net.host(url) || r'/([^/]+)')
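A minimal sketch of how this expression might sit in a complete query, assuming BigQuery Standard SQL. The `sample` CTE and the `new_string` alias are illustrative stand-ins built from the URLs in the question, not part of the original answer. `NET.HOST` returns the host part of the URL (here `www.delish.com`), and the capture group `([^/]+)` takes everything after the following `/` up to the next `/`.

```sql
-- Hypothetical sample data built from the URLs in the question.
WITH sample AS (
  SELECT 'https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas' AS url
  UNION ALL
  SELECT 'https://www.delish.com/food-news/news/100-years-of-christmas'
)
SELECT
  url,
  -- Build the pattern '<host>/([^/]+)' per row and return the captured segment.
  REGEXP_EXTRACT(url, NET.HOST(url) || r'/([^/]+)') AS new_string
FROM sample;
```

Both sample rows should yield food-news. Note that the host is inserted into the pattern unescaped, so its dots act as regex wildcards; that is harmless for this data but worth keeping in mind for stricter matching.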
Mikhail Berlyant

SPLIT(url, '/')[SAFE_OFFSET(ARRAY_LENGTH(SPLIT(url, '.com/')) + 1)] AS new_string
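To see why this works on the sample data: splitting either URL on '.com/' gives two elements, so the computed offset is 3, and element 3 of the '/'-split array is the first path segment. A rough sketch under the assumption that every URL starts with a scheme such as https:// and contains '.com/' exactly once (the `sample` CTE and the `parts` column are illustrative only):

```sql
-- Hypothetical sample data built from the URLs in the question.
WITH sample AS (
  SELECT 'https://www.delish.com/food-news/news/jdhgkjdf/100-years-of-christmas' AS url
  UNION ALL
  SELECT 'https://www.delish.com/food-news/news/100-years-of-christmas'
)
SELECT
  url,
  -- parts is ['https:', '', 'www.delish.com', 'food-news', ...]
  SPLIT(url, '/') AS parts,
  -- Counting from the start of the array is stable even when the
  -- number of trailing path segments varies.
  SPLIT(url, '/')[SAFE_OFFSET(3)] AS new_string
FROM sample;
```

Counting from the front avoids the problem described in the question, where the offset was taken relative to the end of the array. URLs without a scheme would shift the segment to a different position, so this fixed offset is only a sketch for data shaped like the examples.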

Chique_Code