2

I am executing one influxSQL query which is returning result in more than 1 minutes. Query :

select SUM(call_duration) as total_duration,Count(Distinct(recipient_id)) as total_recipients from xyz where target_id = '1';

Separate queries like

select Count(Distinct(recipient_id)) as total_recipients from xyz where target_id = '1';

select SUM(call_duration) as total_duration from xyz where target_id = '1';

are also takes more than 1 minutes to result the return .

while query

select MAX(call_duration), MIN(call_duration) from xyz where target_id = '1';

Return results very fast, in seconds like 3-4 seconds.

table(measurements) xyz is very large. there are more than 10 million records that matches this where condition. call_duration and recipient_id are Fields while target_id is Tag

While MIN and MAX function returning results very fast, i suspect why SUM is taking more time?

InfluxDB version: 1.7.4 Machine Configuration - AWS EC2 - t2.medium (4 GB RAM)

Config file:

[meta]
dir = "/var/lib/influxdb/meta"
[data]
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
series-id-set-cache-size = 100
[coordinator]
[retention]
[shard-precreation]
[monitor]
[http]
enabled = true
auth-enabled = true
log-enabled = true
access-log-path = "/var/log/influxdb/http_access.log"
write-tracing = false
max-body-size = 0
max-concurrent-write-limit = 0
[logging]
[subscriber]
[[graphite]]
[[collectd]]
[[opentsdb]]
[[udp]]
[continuous_queries]
[tls]

I think that result should returned in 5-6 seconds

EXPLAIN select MIN(call_duration) from xyz where target_id='1';
QUERY PLAN
----------
EXPRESSION: min(call_duration::float)
NUMBER OF SHARDS: 21
NUMBER OF SERIES: 85134
CACHED VALUES: 0
NUMBER OF FILES: 24642
NUMBER OF BLOCKS: 31608
SIZE OF BLOCKS: 65520278

EXPLAIN select MAX(call_duration) from xyz where target_id='1';
QUERY PLAN
----------
EXPRESSION: max(call_duration::float)
NUMBER OF SHARDS: 21
NUMBER OF SERIES: 85134
CACHED VALUES: 0
NUMBER OF FILES: 24642
NUMBER OF BLOCKS: 31608
SIZE OF BLOCKS: 65520278

EXPLAIN select SUM(call_duration) as total_call_duration from xyz where target_id='1';
QUERY PLAN
----------
EXPRESSION: sum(call_duration::float)
NUMBER OF SHARDS: 21
NUMBER OF SERIES: 85134
CACHED VALUES: 0
NUMBER OF FILES: 24642
NUMBER OF BLOCKS: 31608
SIZE OF BLOCKS: 65520278

EXPLAIN select Count(Distinct(recipient_id)) as total_recipients from xyz where target_id='1';
QUERY PLAN
----------
EXPRESSION: recipient_id::integer
NUMBER OF SHARDS: 21
NUMBER OF SERIES: 85134
CACHED VALUES: 0
NUMBER OF FILES: 24642
NUMBER OF BLOCKS: 31608
SIZE OF BLOCKS: 82448255
jkamani
  • 51
  • 3
  • Can you post results of each query with "EXPLAIN" prefix? – Hardik Sondagar Oct 16 '19 at 10:06
  • @HardikSondagar added Explain results – jkamani Oct 16 '19 at 10:28
  • It looks like InfluxDB (and any other TSDB) isn't a good fit for this task, since the data set doesn't contain clearly defined time series - it contains time-labeled events with various attributes. Analytical database like [ClickHouse](https://clickhouse.yandex) would give much better performance (100x and more) for this task. – valyala Oct 16 '19 at 11:21

0 Answers0