apply regexp_substr to each row separately

Question

I have a table with a column "TAGS". Each row in this column has a bunch of dictionaries separated by commas. It looks like this:

Row 1:

{
   "id":"334",
   "name":"A"
},
{
   "id":"8999",
   "name":"B"
}

Row 2:

{
   "id":"334",
   "name":"C"
},
{
   "id":"8999",
   "name":"D"
}

I want to extract the "names" from the dictionaries and group them for each row. I want to create a new column "NAMES" that contains a list of all names. For example this:

NAMES
["A", "B"]
["C", "D"]

Select * from TAGS_TABLE

I was trying some thing like this:

SELECT regexp_substr(regexp_substr(regexp_substr(TAGS, '"name\\W+\\w+"'),'"\\w+',2),'\\w+') FROM TAGS_TABLE

SELECT listagg(regexp_substr(regexp_substr(regexp_substr(TAGS, '"name\\W+\\w+"'),'"\\w+',2),'\\w+'), ', ') FROM TAGS_TABLE

and although it works correctly in extracting the names, it only gives the first row's "name" values. so regexp_substr is being applied to just the first row, instead of all rows separately.

How can I ensure that the regexp function is applied to each row separately?

if you write [...] around your row data you can use json functions — nbk, Sep 26 '22 at 12:17

score 1 · Answer 1 · answered Sep 26 '22 at 14:41

1

Instead of REGEXP_SUBSTR, you can use REGEXP_SUBSTR_ALL:

https://docs.snowflake.com/en/sql-reference/functions/regexp_substr_all.html

Instead of returning the first result, it returns "an ARRAY that contains all substrings that match a regular expression within a string".

answered Sep 26 '22 at 14:41

Felipe Hoffa

54,922
16
151
325

oh, nice new function.. – Simeon Pilgrim Sep 27 '22 at 03:31

apply regexp_substr to each row separately

1 Answers1

Linked