Here is what I am trying to perform with Presto/AWS Athena. I'd be glad to get your remarks/explanations on the following.
I have a table called activity_table
| member_id | arrays_of_distinct_days_of_activity |
where
- member_id is VARCHAR and
- arrays_of_distinct_days_of_activity is an ARRAY of DATE(s)
This table includes some 150 million distinct member_id, and each member_id has it's own unique row.
I am trying to perform:
SELECT member_id,arrays_of_distinct_days_of_activity,dt
FROM activity_table
CROSS JOIN UNNEST(sequence(date '2019-07-01',date '2019-07-15',interval '1' day)) AS T(dt)
but it results in a very long time.
Two questions:
- Is it true that the very long calculation time results from Presto/Athena's checking for all combinations of the couple (member_id,arrays_of_distinct_days_of_activity) in order to CROSS JOIN with the UNNEST() array?
- If yes, and since each member_id has a row of its own, is there a way to specify to Presto/Athena that the CROSS JOIN should not be performed according to the combinations of (member_id,arrays_of_distinct_days_of_activity), but only by with the column member_id?
Thanks in advance for your answers!