I wrote a JavaScript UDF that computes the exact Jaccard index (size of the intersection divided by size of the union) of two arrays:
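create or replace function jaccard_sim(A array, B array)
returns float
language javascript
as $$
// Snowflake exposes the ARRAY arguments to JavaScript as arrays named A and B (uppercase)
var setA = new Set(A);
var setB = new Set(B);
// Number of distinct elements present in either array
var union = new Set([...setA, ...setB]).size;
// Number of distinct elements of A that also appear in B
var intersection = [...setA].filter(x => setB.has(x)).length;
// Both arrays empty: the Jaccard index is undefined, so return NULL
if (union === 0) return null;
return intersection / union;
$$;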
With this UDF in place, running select jaccard_sim(a, b) from data returns the exact Jaccard index for each row.
The JavaScript set operations are adapted from https://exploringjs.com/impatient-js/ch_sets.html#union-a-b.
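The set logic can be sanity-checked outside Snowflake before deploying the UDF. The sketch below assumes Node.js and wraps the same computation in a hypothetical plain function, jaccardSim, fed with the two rows from the sample data used further down.

```javascript
// Hypothetical local helper mirroring the jaccard_sim set logic,
// for testing in Node.js (not part of Snowflake).
function jaccardSim(A, B) {
  const setA = new Set(A);
  const setB = new Set(B);
  // Distinct elements present in either array.
  const union = new Set([...setA, ...setB]).size;
  // Distinct elements of A that also appear in B.
  const intersection = [...setA].filter((x) => setB.has(x)).length;
  // Undefined for two empty arrays; this sketch returns null in that case.
  if (union === 0) return null;
  return intersection / union;
}

console.log(jaccardSim([1, 2, 3, 4], [1, 2, 3, 5])); // 0.6 (3 shared out of 5 distinct)
console.log(jaccardSim([20, 30, 90], [20, 40, 90])); // 0.5 (2 shared out of 4 distinct)
```

Duplicates are collapsed by the Set, so jaccardSim([1, 1, 2], [1]) treats the first array as {1, 2} and returns 0.5.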
The UDF above solves the problem. As a bonus, this is how to estimate the same value with Snowflake's native approximate_similarity (also available as approximate_jaccard_index), which compares MinHash states instead of the full sets:
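with data as (
    -- explicit row id, so the MinHash states for a and b of the same row can be paired
    select 1 id, [1,2,3,4] a, [1,2,3,5] b
    union all select 2, [20,30,90], [20,40,90]
)
-- estimate the Jaccard index from the two MinHash states built for each row
select approximate_similarity(mh), id, array_agg(arr)
from (
    -- MinHash state (1023 hash functions) over the elements of column a, per row
    select minhash(1023, value) mh, id, any_value(a) arr
    from data, table(flatten(a))
    group by id
    union all
    -- MinHash state over the elements of column b, per row
    select minhash(1023, value) mh, id, any_value(b) arr
    from data, table(flatten(b))
    group by id
)
group by id;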
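To see why MinHash states are enough to estimate the index, here is a generic MinHash illustration, not Snowflake's internal algorithm. For each of k hash functions, a set's signature keeps the minimum hash over its elements; the fraction of positions where two signatures agree estimates the Jaccard index. The hash family h(x) = (a*x + b) mod P, the helper names makeHashes, minhashSignature and estimateJaccard, and the restriction to non-negative integer elements are all assumptions of this sketch; k = 1023 just matches the minhash(1023, ...) call in the query.

```javascript
// Generic MinHash illustration (NOT Snowflake's implementation).
// Assumes elements are non-negative integers smaller than P.
const P = 2147483647n; // prime modulus 2^31 - 1, as a BigInt

// Hypothetical helper: k random hash functions h(x) = (a*x + b) mod P.
// With a != 0 and x < P, each h is a bijection, so distinct elements never collide.
function makeHashes(k) {
  const hashes = [];
  for (let i = 0; i < k; i++) {
    const a = BigInt(1 + Math.floor(Math.random() * (Number(P) - 1)));
    const b = BigInt(Math.floor(Math.random() * Number(P)));
    // BigInt avoids precision loss: a*x can exceed 2^53.
    hashes.push((x) => Number((a * BigInt(x) + b) % P));
  }
  return hashes;
}

// Signature: for each hash function, the minimum hash over the set's elements.
function minhashSignature(values, hashes) {
  return hashes.map((h) => Math.min(...values.map(h)));
}

// Estimated Jaccard index: fraction of positions where the signatures agree.
function estimateJaccard(sigA, sigB) {
  let same = 0;
  for (let i = 0; i < sigA.length; i++) {
    if (sigA[i] === sigB[i]) same++;
  }
  return same / sigA.length;
}

const hashes = makeHashes(1023);
const est = estimateJaccard(
  minhashSignature([1, 2, 3, 4], hashes),
  minhashSignature([1, 2, 3, 5], hashes)
);
// Close to the exact value 0.6, but varies from run to run.
console.log(est);
```

A signature position agrees only when the element with the smallest hash in the union of both sets belongs to both sets, which is why the agreement rate tracks |A ∩ B| / |A ∪ B|. More hash functions lower the variance of the estimate at the cost of larger states.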
