I have a table in Druid, something like
Timestamp || UserId || Action
And I need to get the latest Action for each UserId. In MySQL I would do something like
Select * from users u1 inner join (
select UserId, max(Timestamp) as maxt from users group by UserId
) u2
on u1.UserId = u2.UserId and u1.Timestamp = u2.maxt
But Druid can't do joins and only very basic sub-selects.
I know the "right" answer is probably to denormalize the data at ingestion time, but unfortunately that's not an option as I don't "own" the ingestion part.
The only solution I have come up with so far is to retrieve all the results for both queries in Java code and do the join manually, but I will run into memory constraints when the dataset grows I would imagine.
I tried to look at materialized views, but that looks like it's still incubating and would require a hadoop cluster, so isn't really viable.
I tried to do something like
Select * from users u1 where concat(Timestamp, UserId) in (
select concat(UserId, max(Timestamp)) from users group by UserId
)
But it didn't like that either.
Any suggestions?