1

Given the table of items below and a list of predefined patterns (modern, rustic, contemporary, classic, vintage) how can I create another table which contains regex matches organized by source for each item (source_1, source_2 etc),

the struct for each match is in key value format, i.e. <pattern STRING , source ARRAY<STRING>> , and each row would contain an array of these structs, i.e. ARRAY <<pattern STRING , source ARRAY <STRING>>>

items table :

with items_for_sale AS (
    select 1 as item_id, 'modern chair' as source_1, ['contemporary chair', 'modernist chair'] as source_2,  
    union all 
    select 2 as item_id, 'classic lamp' as source_1, ['modern vintage lamp', 'blah'] as source_2,  
    union all 
    select 3 as item_id, 'rustic bed' as source_1, ['cottage bed', 'vintage country bed'] as source_2, 
) 
select* from items_for_sale 

enter image description here

List of predefined patterns to search for e.g. modern, rustic, contemporary, classic, vintage (the actual list has ~1000 items), the regex is expected to find if the pattern is contained in the string

Expected output table with regex matches by source for each item:

enter image description here

This is pretty straightforward to do with python or any other language by creating a key-value dictionary for each item_id, but is it possible to do it in BQ SQL

alex
  • 1,757
  • 4
  • 21
  • 32

1 Answers1

2

Consider below simple approach

with patterns as (
  select pattern
  from unnest(['modern', 'rustic', 'contemporary', 'classic', 'vintage']) pattern
)
select item_id, 
  array_agg(struct(pattern, source) order by pattern, source) regexp_matches_by_source
from (
  select item_id, source_1 as value, 'source_1' as source from items_for_sale union all
  select item_id, source_2, 'source_2' from items_for_sale t, t.source_2 as source_2
)
join patterns 
on regexp_contains(value, pattern)
group by item_id    

If applied to sample data in your question - output is

enter image description here

Mikhail Berlyant
  • 165,386
  • 8
  • 154
  • 230