I have a table with the following columns:
column1 | column2 | timestamp | event_id |
---|---|---|---|
c1v1 | c2v1 | 2021-03-11 00:00:00 | 1 |
c1v2 | c2v2 | 2021-03-11 01:03:00 | 1 |
c1v3 | c2v3 | 2021-03-12 10:00:00 | 2 |
c1v4 | c2v4 | 2021-03-13 20:00:00 | 1 |
c1v5 | c2v5 | 2021-03-13 11:00:00 | 2 |
c1v6 | c2v6 | 2021-03-13 00:00:00 | 3 |
c1v7 | c2v7 | 2021-03-14 00:00:00 | 2 |
I have start_time = 2021-03-10 05:14:00
and end_time = 2021-03-15 15:12:19
I am doing an elastic search query on this data which
- partitions by 1 day from start_time to end_time
- counts the number of documents in each of the partitions (partitions with 0 documents also because of extended bounds)
- for each of the partitions, finds the no of unique values in the columns event_id
{
"query": {
"bool": {
"filter":
[
{
"term": {"column1": "some_value"}
},
{
"term": {"column2": "some_value"}
},
{
"range": {
"timestamp": {
"gte": "<start_time>",
"lt": "<end_time>"
}
}
}
]
}
},
"aggs": {
"timestamp": {
"date_histogram": {
"field": "timestamp",
"fixed_interval": "1d",
"extended_bounds": {
"min": "<start_time>",
"max": "<end_time>"
}
},
"aggs": {
"unique_values": {
"cardinality": {
"field": "event_id"
}
}
}
}
}
}
I need help to create an sql query which does the same.
Output:
timestamp | doc_count | unique_values |
---|---|---|
2021-03-10 | 0 | 0 |
2021-03-11 | 2 | 1 |
2021-03-12 | 1 | 1 |
2021-03-13 | 3 | 3 |
2021-03-14 | 1 | 1 |
2021-03-15 | 0 | 0 |
Update: I have come up with this query but the values I am getting are close to the ones in es but not exact. Also this does not return dates with 0 documents.
SELECT
date_floor,
count(date_floor) AS cnt_date_floor,
count(DISTINCT(event_id)) AS cnt_dst_event_id
FROM (
SELECT
event_id,
DATE(timestamp) AS date_floor
FROM
<table_name>
WHERE
date BETWEEN date'<start_date>' AND date'<end_date>' AND
timestamp >= timestamp'<start_time>' AND
timestamp < timestamp'<end_time>' AND
column1 IN ('some val') AND
column2 = some_val)
GROUP BY date_floor
where start_date and end_date are floor_dates of start_time and end_time