At my Crate.io Database I have a table with currently 50 mio rowsand a size of 16GB. If I try to get the amount of entries per day with the following statement all works fine (except the performance, but this should not be the problem at the moment):
SELECT
date_format('%Y-%m-%d', date_trunc('day', "days")) AS "Day",
count(*) AS "Count"
FROM "doc"."mytable"
WHERE
date_format('%Y-%m-%d', date_trunc('day', "days")) BETWEEN date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -2592000000))
AND date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -86400000))
GROUP BY date_format('%Y-%m-%d', date_trunc('day', "days"))
ORDER BY date_format('%Y-%m-%d', date_trunc('day', "days")) ASC limit 100;
But if I try to make a distinct at another column like this:
SELECT
date_format('%Y-%m-%d', date_trunc('day', "days")) AS "Day",
count(DISTINCT customerid) AS "Count"
FROM "doc"."mytable"
WHERE
date_format('%Y-%m-%d', date_trunc('day', "days")) BETWEEN date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -2592000000))
AND date_format('%Y-%m-%d', date_trunc('day', current_timestamp + -86400000))
GROUP BY date_format('%Y-%m-%d', date_trunc('day', "days"))
ORDER BY date_format('%Y-%m-%d', date_trunc('day', "days")) ASC limit 100;
the statement will fail with
SQLActionException[CircuitBreakingException: [query] Data too large, data for [collect: 0] would be larger than limit of [1267571097/1.1gb]]
Has anybody an idea why a COUNT(DISTINCT col) has a problem with too much data, but a COUNT(*) not? And how can I solve this issue?