I have a table that logs one row of data per user per day with a user_id column. I would like to select all days from n random users, but the table is incredibly large, so I'd like to avoid doing a self join.
Right now, I'm doing:
SELECT
a.user_id, b.col1, b.col2, b.col3
FROM
(
SELECT DISTINCT
user_id
FROM
pipeline.user_daily
ORDER BY
RANDOM() LIMIT 100) a
LEFT JOIN
pipeline.user_daily b
ON
a.user_id = b.user_id
Is there any way to do this with a window function or any other non self join?
On an unrelated note, the above inner table a doesn't actually work in vertica, so I use
SELECT
user_id
FROM
pipeline.user_daily
GROUP BY
user_id
ORDER BY
RANDOM() LIMIT 100
Any thoughts on why select distinct order by random doesn't work in Vertica?