7

In clickhouse table I have multiple rows for one _id. What I want is to get only one row for each _id where column _status_set_at has its max value. That's what I am currently at:

SELECT _id, max(_status_set_at), count(_id)
FROM pikta.candidates_states
GROUP BY _id

output

Because I can't use max() function at WHERE clause, how to workaround this problem? count(_id) shows how many rows are there for each _id, if the query is correct, it should show 1. Also, as far as I concerned, there is no ON clause in Clickhouse database.

UPD: there is ON clause in Clickhouse

Graygood
  • 363
  • 1
  • 3
  • 18

5 Answers5

2

Solution - 1 :

SELECT Z._id,
       Z._status_set_at
  FROM 
(
SELECT _id, 
       _status_set_at, 
       max(_status_set_at) OVER ( PARTITION BY _id ORDER BY _status_set_at DESC ) AS rnk
FROM pikta.candidates_states
) Z
WHERE Z.rnk = 1;

Solution - 2 :

SELECT A._id,
       A._status_set_at  
  FROM pikta.candidates_states A
CROSS JOIN
       ( 
         SELECT _id, 
                MAX(_status_set_at) AS max_status_set_dt         
           FROM pikta.candidates_states
         GROUP BY _id
        ) B
WHERE A._id = B._id
  AND A._status_set_at = B.max_status_set_dt; 
Teja
  • 13,214
  • 36
  • 93
  • 155
2

If you want to you your max clause on where statement maybe this will work

SELECT * from (SELECT _id, max(_status_set_at) as [MaxDate], count(_id) as [RepeatCount]
    FROM pikta.candidates_states
    GROUP BY _id) t WHERE t.MaxDate = '@parameter'
Harun KARATAŞ
  • 116
  • 2
  • 11
2

Your query returns what you need - only one row for each _id where column _status_set_at has its max value. You do not need to change anything in your original query.

count(_id) shows how many rows for each _id in the original table, but not in a query result. Query result has only one row for each _id because you group by _id.

This query shows that in your query result there is only one row for each _id

SELECT _id, max_status_set_at, count(_id) FROM (
SELECT _id, max(_status_set_at) max_status_set_at
FROM pikta.candidates_states
GROUP BY _id) t
GROUP BY _id

If you need apply a condition on max(_status_set_at) you can use HAVING

Mikhail
  • 356
  • 2
  • 6
1

in clickhouse,this will work

select _id, argMax(<col you need>, _status_set_at) from pikta.candidates_states group by _id;

argMax

0

Replace <other columns> with list of other columns you need to select.

SELECT _id, _status_set_at, <other columns>
FROM pikta.candidates_states
WHERE (_id, _status_set_at) in (
    SELECT _id, max(_status_set_at)
    FROM pikta.candidates_states
    GROUP BY _id
)

Internal select returns pairs of _id with max _status_set_at for that _id. External select returns rows with additional colums from table, but only those of them, where _id and _status_set_at are in result of internal select, i.e. with max _status_set_at for each _id.

I posted my reply, because none of previous answers was good for initial question, as I understood it. Request in question should be able to return other colums, not only _id and _status_set_at, otherwise it's useless, you can use basic select for it, that already was mentiond in question. Mike's answer can't be modified that way. Solutions of Teja won't work in Clickhouse.

sdrenn00
  • 41
  • 6