Druid query to get "latest" value from third column

Question

I have a table in Druid, something like

Timestamp || UserId || Action

And I need to get the latest Action for each UserId. In MySQL I would do something like

Select * from users u1 inner join (
    select UserId, max(Timestamp) as maxt from users group by UserId
) u2
on u1.UserId = u2.UserId and u1.Timestamp = u2.maxt

But Druid can't do joins and only very basic sub-selects.

I know the "right" answer is probably to denormalize the data at ingestion time, but unfortunately that's not an option as I don't "own" the ingestion part.

The only solution I have come up with so far is to retrieve all the results for both queries in Java code and do the join manually, but I will run into memory constraints when the dataset grows I would imagine.

I tried to look at materialized views, but that looks like it's still incubating and would require a hadoop cluster, so isn't really viable.

I tried to do something like

Select * from users u1 where concat(Timestamp, UserId) in (
    select concat(UserId, max(Timestamp)) from users group by UserId
)

But it didn't like that either.

Any suggestions?

score 2 · Answer 1 · edited Jan 08 '21 at 12:42

LATEST(expr)

Returns the latest value of expr, which must be numeric. If expr comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If expr does not come from a relation with a timestamp, then it is simply the last value encountered.

https://druid.apache.org/docs/0.20.0/querying/sql.html

Druid query to get "latest" value from third column

1 Answers1