4

Our task is to run a batch of ~20000 queries in ClickHouse and store results into a CSV file. Each query is a count() aggregation returning a single number.

We do it like this:

./generate_queries.js | clickhouse-client --multiquery | tr '\n' ',' >> metrics.csv

(Yes, trailing comma, we'll fix that.)

Query example:

SELECT count(*) FROM merged_data WHERE business_type = 22;

The problem is that if a query matches zero records, ClickHouse simply returns nothing, and the number of records in resulting CSV file is different from the number of queries.

This might be a standard behaviour for SQL, but how can we work that around and make ClickHouse count() return 0 in case of zero matches?

weekens
  • 8,064
  • 6
  • 45
  • 62

3 Answers3

3

For now you can do this:

SELECT
    count() - 1
FROM (
    SELECT
        business_type
    FROM
        merged_data
    WHERE
        business_type = 22
    UNION ALL
        toUInt64(1)
)

Just replace toUInt64 with whatever the type of business_type is.

Developers are aware of the problem and are working on it: https://github.com/yandex/ClickHouse/issues/51 https://groups.google.com/forum/#!topic/clickhouse/2JS_yzvYAHM

Current plan is to add an ability to control this behaviour from settings.

Anton Gromov
  • 185
  • 7
1

We managed to work this around with JSON output format.

SELECT count(*) FROM merged_data WHERE business_type = 22 FORMAT JSONCompact;

The DB response in this case looks like:

{
    "meta":
    [
        {
            "name": "count()",
            "type": "UInt64"
        }
    ],

    "data":
    [

    ],

    "rows": 0,

    "statistics":
    {
        "elapsed": 0.044646461,
        "rows_read": 53413865,
        "bytes_read": 53413865
    }
}

There is always a response, even in case of empty match (we see "rows": 0 if this is the case).

The JSON response stream parser looks like:

var readLine = require('readline');

var rl = readLine.createInterface({
  input: process.stdin,
  output: process.stdout,
  terminal: false
});

var buf = '';

rl.on('line', line => {
  buf += line;

  if (line == '}') {
    // End of JSON => process.
    var json = JSON.parse(buf);
    buf = '';

    if (json.rows === 0) {
      console.log('0');
    }
    else {
      console.log(json.data[0][0]);
    }
  }
});
weekens
  • 8,064
  • 6
  • 45
  • 62
0

This is a known issue with ClickHouse. To workaround the issue, follow these steps.

Suppose that your count query is something like this:

SELECT count(*) AS count
FROM mytable

Put your query into this sql template:

SELECT *
FROM 
(
    -- put your count query here 
    UNION ALL 
    SELECT toUInt64(0)
) 
LIMIT 1

So the final query will be this:

SELECT *
FROM 
(
    SELECT count(*) AS count
    FROM mytable
    UNION ALL 
    SELECT toUInt64 (0)
) 
LIMIT 1

NOTE: There is no performance overhead for using this workaround.

Lino
  • 19,604
  • 6
  • 47
  • 65
ramazan polat
  • 7,111
  • 1
  • 48
  • 76