3

I have a table in Druid, something like

Timestamp || UserId || Action

And I need to get the latest Action for each UserId. In MySQL I would do something like

Select * from users u1 inner join (
    select UserId, max(Timestamp) as maxt from users group by UserId
) u2
on u1.UserId = u2.UserId and u1.Timestamp = u2.maxt

But Druid can't do joins and only very basic sub-selects.

I know the "right" answer is probably to denormalize the data at ingestion time, but unfortunately that's not an option as I don't "own" the ingestion part.

The only solution I have come up with so far is to retrieve all the results for both queries in Java code and do the join manually, but I will run into memory constraints when the dataset grows I would imagine.

I tried to look at materialized views, but that looks like it's still incubating and would require a hadoop cluster, so isn't really viable.

I tried to do something like

Select * from users u1 where concat(Timestamp, UserId) in (
    select concat(UserId, max(Timestamp)) from users group by UserId
)

But it didn't like that either.

Any suggestions?

Matt
  • 3,303
  • 5
  • 31
  • 53

1 Answers1

2

LATEST(expr)

Returns the latest value of expr, which must be numeric. If expr comes from a relation with a timestamp column (like a Druid datasource) then "latest" is the value last encountered with the maximum overall timestamp of all values being aggregated. If expr does not come from a relation with a timestamp, then it is simply the last value encountered.

https://druid.apache.org/docs/0.20.0/querying/sql.html

Omkar76
  • 1,317
  • 1
  • 8
  • 22
orient.lu
  • 21
  • 3