2

I have a column like this:

String_to_Extract
A~S1_B~S2_C~S11
A~S1_B~S3_C~S12
C~S13_A~S11_B~S4

The part before the "~" should be the column name. The part after the "~" should be the row value. This is separated by a "_" . Therefore, the result should look like this:

String_to_Extract A B C
A~S1_B~S2_C~S11 S1 S2 S11
A~S1_B~S3_C~S12 S1 S3 S12
C~S13_A~S11_B~S4 S11 S4 S13

Here is my approach:

SELECT
String_to_Extract,
SUBSTRING(String_to_Extract, INSTR(Advertiser, "A~")+2, ?) AS A,
SUBSTRING(String_to_Extract, INSTR(Advertiser, "B~")+2, ?) AS B,
SUBSTRING(String_to_Extract, INSTR(Advertiser, "C~")+2, ?) AS C,
From Table

How do I get the part between the ~ and next _ for each column?

Would be glad about help!

Micha Hein
  • 33
  • 4

3 Answers3

1

One approach uses REGEXP_EXTRACT:

SELECT
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)A~([^_]+)") AS A,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)B~([^_]+)") AS B,
    REGEXP_EXTRACT(String_to_Extract, r"(?:^|_)C~([^~]+)") AS C
FROM yourTable;
Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Thank you very much. However, the order of the parameters could change or more could be added. Therefore, it would be useful to first search for the respective identifiers (A~, B~, C~, D~, .....) I guess. – Micha Hein Dec 04 '21 at 11:05
  • I get it. I have updated my answer. – Tim Biegeleisen Dec 04 '21 at 11:07
  • One more question on this. My delimiter now changed from ~ to |. So the string looks like: A|S1_B|S2_C|S11. The problem is that it will be identified as an OR operator. How can I handle this? – Micha Hein Dec 12 '21 at 20:08
  • 1
    @MichaHein You need to _escape_ the pipe, e.g. for the `A` value use: `(?:^|_)A\|([^_]+)` – Tim Biegeleisen Dec 13 '21 at 00:07
0

Consider below approach (BigQuery)

select * from (
  select String_to_Extract, col_val[offset(0)] as col, col_val[offset(1)] as val
  from your_table, unnest(split(String_to_Extract, '_')) kv,
  unnest([struct(split(kv, '~') as col_val)])
)
pivot (any_value(val) for col in ('A', 'B', 'C'))   

If applied to sample data in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230
0

You can also use this approach which orders the splitted item first and then picks the values:


select 
   split(ordered[safe_offset(0)], '~')[safe_offset(1)] as A,
   split(ordered[safe_offset(1)], '~')[safe_offset(1)] as B,
   split(ordered[safe_offset(2)], '~')[safe_offset(1)] as C
 from (
    select 
        array(select _ from unnest(split(Advertiser, '_') ) as _ order by 1) as ordered
    from dataset.table
)
khan
  • 7,005
  • 15
  • 48
  • 70