2

I'm trying to use three projections in same query like below in a Druid environment:

select
  __time,
  count(distinct col1),
  count(distinct case when (condition1 and condition2 then (concat(col2,TIME_FORMAT(__time))) else 0 end )
from table
where condition3
GROUP BY __time

But instead I get an error saying - Unknown exception / Cannot build plan for query

It seems to work perfectly fine when I put just one count(distinct) in the query.

How can this be resolved?

markusk
  • 6,477
  • 34
  • 39
N S
  • 41
  • 1
  • 5

2 Answers2

1

As a workaround, you can do multiple subqueries and join them. Something like:

SELECT x.__time, x.delete_cnt, y.added_cnt 
FROM
(
SELECT FLOOR(__time to HOUR) __time, count(distinct deleted) delete_cnt
FROM wikipedia
GROUP BY 1
)x
JOIN
(
SELECT FLOOR(__time to HOUR) __time, count( distinct added) added_cnt
FROM wikipedia
GROUP BY 1
)y ON x.__time = y.__time
0

As the Druid documentation points out:

COUNT(DISTINCT expr) Counts distinct values of expr, which can be string, numeric, or hyperUnique. By default this is approximate, using a variant of HyperLogLog. To get exact counts set "useApproximateCountDistinct" to "false". If you do this, expr must be string or numeric, since exact counts are not possible using hyperUnique columns. See also APPROX_COUNT_DISTINCT(expr). In exact mode, only one distinct count per query is permitted.

So this is a Druid limitation: you either need to disable exact mode, or else limit yourself to one distinct count per query.

On a side note, other databases typically do not have this limitation. Apache Druid is designed for high performance real-time analytics, and as a result, its implementation of SQL has some restrictions. Internally, Druid uses a JSON-based query language. The SQL interface is powered by a parser and planner based on Apache Calcitea, which translates SQL into native Druid queries.

markusk
  • 6,477
  • 34
  • 39
  • The issue is I'm unable to use a non-distinct column like '__time' in this case with the other two expressions. The same problem persists with APPROX_COUNT_DISTINCT(expr) – N S Sep 19 '21 at 08:17
  • Have you enabled exact mode by setting `useApproximateCountDistinct` to `false`? If so, could you try disabling exact mode by setting it to `true` (or not setting it, since `true` is the default)? – markusk Sep 19 '21 at 08:19
  • 1
    Oh....I think I got what I was looking for by replacing both of them with APPROX_COUNT_DISTINCT(expr). Thanks! – N S Sep 19 '21 at 08:20