We use 'hits.product.v2ProductCategory' grouping by 'date' and 'fullVisitorId' to get the categories visited for each date*fullVisitorId combination. This includes duplicated categories.
Our goal is getting an aggregated variable over time range (grouping by 'fullVisitorId', it means just one row for each), with the number of unique categories (in one step, if possible).
Felipe Hoffa's solution of creating as many variables as categories (BigQuery: SPLIT() returns only one value) is not viable because of the high amount and dynamic nature of them.
The other option in how to do nested SQL select count, forces us to group by 'hits.product.v2ProductCategory' besides 'fullVisitorId' and 'date' in a subquery, giving an intermediate table with as many rows as categories for each date*fullVisitorId.
We've tried it using GROUP_CONCAT(UNIQUE) to get all the unique categories in a string and then LENGTH(REGEXP_REPLACE) to obtain the number of elements except the separator character.
Is there another more efficient approach?
Thanks in advance,
Lourdes Hernández