Here is my method for sending batches to ClickHouse. The batch size is 50, and the total data to send is 1000 objects.
My method:
void executeBatch(final List<V> batch) throws ClickHouseException {
    if (!batch.isEmpty()) {
        try (ClickHouseClient client = ClickHouseClient.newInstance(ClickHouseProtocol.HTTP)) {
            ClickHouseRequest.Mutation insertRequest = client.read(cluster).write().table(tableName)
                    .format(ClickHouseFormat.RowBinary);
            ClickHouseConfig config = insertRequest.getConfig();
            try (final ClickHousePipedOutputStream stream = ClickHouseDataStreamFactory.getInstance()
                    .createPipedOutputStream(config, (Runnable) null)) {
                // start the insert, reading from the piped stream, and log the summary when it completes
                insertRequest.data(stream.getInputStream())
                        .execute()
                        .thenAccept(response ->
                                LOGGER.info("Batch: {}, Written rows: {}", batch.size(),
                                        response.getSummary().getWrittenRows()));
                // write each event into the stream in RowBinary format
                for (final V event : batch) {
                    BinaryStreamUtils.writeString(stream, event.get("id").getAsString());
                    BinaryStreamUtils.writeNonNull(stream);
                    BinaryStreamUtils.writeString(stream, event.get("name").getAsString());
                }
            }
        } catch (final IOException e) {
            throw ClickHouseException.of(e, cluster.getTemplate());
        }
    }
}
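For context, this is roughly how the method is invoked (a simplified sketch; loadEvents() and the exact chunking are illustrative, not my real code): the full list of about 1000 events is split into chunks of 50, and each chunk is passed to executeBatch.

// Simplified calling code (illustrative only): split ~1000 events into batches of 50
final List<V> allEvents = loadEvents();          // ~1000 objects (hypothetical loader)
final int batchSize = 50;
for (int from = 0; from < allEvents.size(); from += batchSize) {
    final int to = Math.min(from + batchSize, allEvents.size());
    executeBatch(allEvents.subList(from, to));
}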
In the logs I see:
Batch: 50, Written rows: 50
Sending to ClickHouse does not throw any exceptions.
But I do not see all 20 expected log lines (1000 objects / 50 per batch = 20 batches); the number differs on every try, and the table contains only the rows that the logs account for, not all of them.
In the system.errors table there are several rows:
- Cannot parse input: expected '\r\n' at end of stream.
- Unexpected end of file while reading chunk header of HTTP chunked data
- Too many parts (300). Merges are processing significantly slower than inserts
- Directory /var/lib/clickhouse/store/1ae/1ae1dd8f-736f-4a7d9f15608a194c56af/tmp_merge_202307_178_275_1/ already exists
No matter how long I wait, the destination table never contains all 1000 rows.
What am I doing wrong?