
I am currently working on a project which needs to ingest data from a Kafka topic (JSON format) and write it directly into ClickHouse. I followed the method suggested in the ClickHouse documentation:

Step 1: Created a ClickHouse consumer, i.e. a table using the Kafka engine (say, level1).

Step 2: Performed a select query on 'level1'. It returns a set of results, but is not particularly useful on its own, as the data can be read only once.

Step 3: Created a materialized view that takes data from the Kafka engine table (level1) and puts it into a previously created table (say, level2). While writing into 'level2', the data is aggregated at the day level (done by converting the timestamp in level1 to a datetime and taking the day).

Therefore, the data in 'level2' is: day + all the columns in 'level1'.

I intend to use this table (level2) as the base for any future aggregation (say, at level3). A sketch of the full pipeline is below.
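
For reference, here is a minimal sketch of the setup described above. The column names (timestamp, value), broker address, topic, and consumer group are illustrative placeholders, not the actual schema:

-- Kafka engine table: consumes JSON messages from the topic
CREATE TABLE level1
(
    `timestamp` UInt64,
    `value` Float64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'my_topic',
         kafka_group_name = 'clickhouse_group',
         kafka_format = 'JSONEachRow';

-- plain MergeTree table that actually stores the day-level data
CREATE TABLE level2
(
    `day` Date,
    `timestamp` UInt64,
    `value` Float64
)
ENGINE = MergeTree
ORDER BY day;

-- fires for every batch consumed from level1 and inserts it into level2
CREATE MATERIALIZED VIEW level1_to_level2 TO level2 AS
SELECT
    toDate(toDateTime(timestamp)) AS day,
    timestamp,
    value
FROM level1;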

Problem 1: 'level2' is being created but data is not being populated in it, i.e., when I perform a basic select query (SELECT * FROM level2 LIMIT 10) on it, the output is "0 rows in set".

Is it because of the day-level aggregation, meaning it might populate only at the end of the day? Can I read data from 'level2' in real time?

Problem 2: Is there a way of reading the same data from my Kafka engine table 'level1' multiple times?

Problem 3: Is there a way to convert Avro to JSON while reading from a Kafka topic? Or can ClickHouse write data (in Avro format) directly into 'level1' without any conversion?

EDIT: There is latency in ClickHouse while retrieving data from Kafka. I had to make changes in the users.xml file on my ClickHouse server (increasing max_block_size).
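
For anyone hitting the same latency issue: the change amounts to raising max_block_size in the default settings profile inside users.xml; the value below is only an example:

<profiles>
    <default>
        <!-- larger blocks mean fewer, bigger flushes from the Kafka consumer -->
        <max_block_size>1048576</max_block_size>
    </default>
</profiles>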

Prithu Srinivas

1 Answer


Problem 1: 'level2' is being created but data is not being populated in it, i.e., when I perform a basic select query (SELECT * FROM level2 LIMIT 10) on it, the output is "0 rows in set".

This might be related to the default settings of the Kafka storage engine, which always starts consuming data from the latest offset. You can change the behavior by adding this

<kafka>
    <auto_offset_reset>earliest</auto_offset_reset>
</kafka>

to config.xml. Note that auto_offset_reset only takes effect when the consumer group has no committed offsets yet; if your group has already consumed something, switching to a fresh kafka_group_name forces re-reading from the beginning.

Problem 2: Is there a way of reading the same data from my Kafka engine table 'level1' multiple times?

You'd better avoid reading from the Kafka storage directly. You can set up a dedicated materialized view M1 for 'level1' that persists the raw rows into a plain table, and use that to populate 'level2' too. Then reading from M1 is repeatable.
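
A minimal sketch of that layout, reusing the illustrative column names from above (level1_raw and the view names are hypothetical):

-- plain table holding the raw stream; it can be read any number of times
CREATE TABLE level1_raw
(
    `timestamp` UInt64,
    `value` Float64
)
ENGINE = MergeTree
ORDER BY timestamp;

-- M1 drains the Kafka table into level1_raw
CREATE MATERIALIZED VIEW m1 TO level1_raw AS
SELECT * FROM level1;

-- level2 is now fed from level1_raw instead of directly from the Kafka table
CREATE MATERIALIZED VIEW level2_mv TO level2 AS
SELECT
    toDate(toDateTime(timestamp)) AS day,
    timestamp,
    value
FROM level1_raw;

Any future level3 aggregation can then read from level1_raw (or level2) without consuming Kafka offsets.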

Problem 3: Is there a way to convert Avro to JSON while reading from a Kafka topic? Or can ClickHouse write data (in Avro format) directly into 'level1' without any conversion?

Nope, though you can try using Cap'n Proto, which should provide performance similar to Avro's, and it is supported directly by ClickHouse.
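
If you go that route, the Kafka table declares the format plus a schema file; everything below is a placeholder sketch, and schema.capnp has to sit in the server's format_schema_path directory:

CREATE TABLE level1
(
    `timestamp` UInt64,
    `value` Float64
)
ENGINE = Kafka
SETTINGS kafka_broker_list = 'localhost:9092',
         kafka_topic_list = 'my_topic',
         kafka_group_name = 'clickhouse_group',
         kafka_format = 'CapnProto',
         -- 'Message' is the root struct name defined in schema.capnp
         kafka_schema = 'schema.capnp:Message';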

Amos