I am using the VoltDB Kafka importer to import data from multiple Kafka topics, and I am facing a performance issue with the loader. I read the VoltDB documentation but was unable to find out how to fine-tune the importer.

How can I specify a specific partition of a topic?

My current setup:

A 6-node VoltDB cluster, with Kafka importers running on the nodes and a custom procedure for the inserts.

Kafka importer config:

Host: 172.x.x.x:9092
Topic: mytopic_1,mytopic_2,...mytopic_10
Procedure: tinsert

CREATE PROCEDURE tinsert AS INSERT INTO tinsert (sensor_id, column2, column3, received_time) VALUES (?, ?, ?, NOW());

The table is partitioned, and the partition key is sensor_id.

The problem is that the importer is not pulling data as fast as it is generated.

The message publication rate is 10,000 records per second.

Any help would be appreciated.

Arjun

1 Answer

There are a few things you could adjust that would affect the rate that Kafka Importer can ingest data into VoltDB.

  1. The number of partitions for the topic in Kafka. VoltDB will run a consumer thread for each partition. More partitions = more threads.

  2. The commit policy. The VoltDB importer retrieves a batch of records from Kafka starting at its current offset, then calls the procedure for each record. It waits for the procedure callbacks to return so it knows everything was processed, then advances the offset in Kafka and retrieves another batch. This wait may be limiting the rate it can handle. If you set the property commit.policy=10, it would instead advance the offset to whatever it has read every 10 milliseconds. That may allow faster data flow, at the risk of a small gap if there is a failure and restart (e.g. the offset advanced beyond records that were read but not inserted).

For the configuration options, see: https://docs.voltdb.com/UsingVoltDB/exportimportkafka.php

  3. The performance / scale of the procedure being called on the cluster. If the table is partitioned, and the procedure is only inserting, it probably isn't the limiting factor.

Disclosure: I work at VoltDB

BenjaminBallard
  • Thanks for the help. I was able to improve the performance by using the same partitioning on the producer. I created a partitioned procedure whose partition key is the same column as the table's partition key. – Arjun Sep 29 '18 at 05:16
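
For readers who land here with the same problem, the partitioned procedure mentioned in the comment might look roughly like the DDL below. This is only a sketch: the original post does not show the table definition, so the column types are assumptions, and the PARTITION ON clause assumes a VoltDB version that accepts it inline in CREATE PROCEDURE. The first procedure parameter (sensor_id) is taken as the partitioning value by default.

```sql
-- Assumed table definition: column types are illustrative only,
-- the original question does not show them.
CREATE TABLE tinsert (
  sensor_id     INTEGER NOT NULL,   -- partition column must be NOT NULL
  column2       VARCHAR(64),
  column3       VARCHAR(64),
  received_time TIMESTAMP
);
PARTITION TABLE tinsert ON COLUMN sensor_id;

-- Partitioned (single-partition) version of the insert procedure.
-- The first '?' (sensor_id) is used as the partitioning parameter.
CREATE PROCEDURE tinsert
  PARTITION ON TABLE tinsert COLUMN sensor_id
  AS INSERT INTO tinsert (sensor_id, column2, column3, received_time)
     VALUES (?, ?, ?, NOW());
```

Without the PARTITION ON clause the procedure runs as a multi-partition transaction, which has to coordinate across every partition in the cluster, so declaring it single-partition is usually the first thing to check when an insert-only importer cannot keep up.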