I am currently working on a project which needs to ingest data from a Kafka Topic (JSON format), and write it directly into Clickhouse. I followed the method as suggested in the Clickhouse documentation:
Step 1: Created a clickhouse consumer which writes into a table (say, level1).
Step 2: I performed a select query on 'level1' and it gives me a set of results, but is not particularly useful as it can be read only once.
Step 3: I created a materialised view that converts data from the engine(level1) and puts it into a previously created table (say, level2). While writing into 'level2' the aggregation is on a day level (done by converting timestamp in level1 to datetime).
Therefore, data in 'level2' :- day + all columns in 'level1'
I intend to use this view (level2) as the base for any future aggregation (say, at level3)
Problem 1: 'level2' is being created but data is not being populated in it, i.e., when I perform a basic select query (select * from level2 limit 10) on the view, the output is "0 rows in set".
Is it because of day level aggregation, and it might populate at the end of the day? Can I ingest data from 'level2' in real-time?
Problem 2: Is there a way of reading the same data from my engine 'level1', multiple times?
Problem 3: Is there a way to convert Avro to JSON while reading from a kafka topic? Or can Clickhouse write data (in Avro format) directly into 'level1' without any conversion?
EDIT: There is latency in Clickhouse while retrieving data from Kafka. Had to make changes in the user.xml file in my Clickhouse server (change max_block_size).