
I understand that Kiba's core design is to process rows one by one, and I want that behavior up until the destination step. There, I want to push the transformed data to a Kafka topic, but preferably in bulk rather than row by row. Is this possible?

Assuming we have the following class:

class TransactionProducer
  def initialize(data = [])
    @data = data
  end

  def push_to_kafka
    $kafka.push(@data)
  end
end

I think this is possible using post_process and storing the transformed data in an array.

data = []

job = Kiba.parse do
  source MySource, source_config

  transform do |row|
    row = Transform...
    data << row
    row # return the row so the pipeline keeps flowing
  end

  post_process do
    TransactionProducer.new(data).push_to_kafka
  end
end

But I'm wondering if there's another way?

1 Answer


While it is possible to use a post_process for this, I would instead recommend leveraging the fact that destinations can implement a close method (see https://github.com/thbar/kiba/wiki/Implementing-ETL-destinations), and using that to "buffer out" to your target (a bit like the aggregating transform shown at https://thibautbarrere.com/2020/03/05/new-in-kiba-etl-v3).
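
For example, here is a minimal sketch of such a destination (KafkaBulkDestination is a hypothetical name, and I'm assuming $kafka.push accepts an array of rows, as in your TransactionProducer):

class KafkaBulkDestination
  def initialize
    @buffer = []
  end

  # Kiba calls write once per row
  def write(row)
    @buffer << row
  end

  # Kiba calls close once, after all rows have been processed
  def close
    $kafka.push(@buffer) unless @buffer.empty?
  end
end

You would then declare it in your job with destination KafkaBulkDestination, and drop the post_process and the shared data array entirely.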

If you have a large number of rows, you can also use both write and close together, flushing the buffer as soon as it reaches a given number of rows (but make sure to flush everything that remains in your close call), as sketched below.
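
A sketch of that variant, with a hypothetical max_buffer_size parameter (Kiba passes the arguments given to destination on to the class constructor):

class KafkaChunkedDestination
  def initialize(max_buffer_size: 500)
    @max_buffer_size = max_buffer_size
    @buffer = []
  end

  # buffer the row, and flush a full chunk as soon as the threshold is hit
  def write(row)
    @buffer << row
    flush if @buffer.size >= @max_buffer_size
  end

  # flush whatever is left in the buffer at the end of the run
  def close
    flush
  end

  private

  def flush
    return if @buffer.empty?
    $kafka.push(@buffer)
    @buffer.clear
  end
end

In the job, this would be declared as destination KafkaChunkedDestination, max_buffer_size: 500.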

Hope this helps!

Thibaut Barrère