given these two tables (documents
and labels
) how can I find match counts for all the label's pattern
s from the labels
table, found in the document
field of documents
table (count exact matches, using regex is optional)
WITH documents AS (
SELECT 1 AS id, "foo bar, foo baz" AS document UNION ALL
SELECT 2, "foo bar bar qux" UNION ALL
SELECT 3, "etc blah blah"
),
labels as (
select 'FOO_LABEL' as label, 'foo' as pattern UNION ALL
select 'FOO_LABEL', 'qux' UNION ALL
select 'BAR_LABEL', 'bar' UNION ALL
select 'ETC_LABEL', 'etc'
)
the expected matches counts by document:
id, label, cnt
1, FOO_LABEL, 2
1, BAR_LABEL, 1
2, FOO_LABEL, 2
2, BAR_LABEL, 2
3, ETC_LABEL, 1
the difference from this question is that I need actual match counts
and unlike this question my patterns are coming from a separate table
there are ~100M documents, and ~1000 rows in labels table