
I'm a newbie with StreamSets and Kudu, and I'm trying several solutions to reach my goal: I have a folder containing some Avro files that need to be processed and then sent to a Kudu table.

https://i.stack.imgur.com/l5Yf9.jpg

When using an Avro file containing a couple hundred records, everything works fine, but when the number of records increases to 16k this error is shown:

Caused by: org.apache.kudu.client.NonRecoverableException:
MANUAL_FLUSH is enabled but the buffer is too big.

I've searched through all the available configuration options in both StreamSets and Kudu, and the only solution I was able to apply consists in editing the Java source code and deleting the single line that switches from the default flush mode to the manual one. This works, but it's not optimal, because it requires editing and recompiling that file every time I want to use it on a new machine.
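For reference, the change I made boils down to something like the following. This is only a sketch against the plain Kudu Java client (org.apache.kudu.client), not the actual StreamSets destination source; the master address, table name and buffer value are placeholders, not my real pipeline settings.

    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.SessionConfiguration;

    public class FlushModeSketch {
        public static void main(String[] args) throws KuduException {
            // Placeholder master address, not the real cluster.
            KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
            KuduSession session = client.newSession();

            // What the destination effectively did, and what I removed:
            // session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);

            // Letting the client flush in the background removes the hard cap
            // that MANUAL_FLUSH puts on the number of buffered operations.
            session.setFlushMode(SessionConfiguration.FlushMode.AUTO_FLUSH_BACKGROUND);

            // Optionally raise the per-session buffer as well (measured in records).
            session.setMutationBufferSpace(20000);

            client.close();
        }
    }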

Does anyone know how to avoid this?

Thanks in advance!

  • Did you try reducing your batch size? Otherwise, it could be https://issues.streamsets.com/browse/SDC-5877 – metadaddy May 10 '17 at 05:53
  • I wasn't able to find a similar setting in StreamSets, and each record produces a large amount of data after the pivoting step. This may be caused by the bug you pointed out to me: I'll watch how this evolves while keeping my current workaround. Anyway, if anyone has a ready solution, please explain how I can apply it. – Christian D'Amico May 10 '17 at 19:25
  • Fix (expose Kudu buffer size in config) will be in the next minor release, around the end of this month. BTW - you can engage directly with the StreamSets community via Slack or Google Group: https://streamsets.com/community/ – metadaddy May 11 '17 at 19:02
  • @metadaddy I am facing a similar issue in my StreamSets job. Even after changing the 'Mutation Buffer Space (records)' to 100, I am still facing this issue. Is this issue fixed? – Cast_A_Way Jun 12 '18 at 07:41
  • Try making it much bigger than 100! – metadaddy Jun 14 '18 at 22:55
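To make the buffer-sizing advice from the comments concrete, here is a minimal sketch (again against the plain Kudu Java client, with placeholder master address, table name and columns): when the session stays in MANUAL_FLUSH and its mutation buffer is smaller than one pipeline batch, apply() overflows before anything is flushed, which is exactly the error above; setting the buffer larger than the batch size (or switching flush modes) avoids it.

    import org.apache.kudu.client.Insert;
    import org.apache.kudu.client.KuduClient;
    import org.apache.kudu.client.KuduException;
    import org.apache.kudu.client.KuduSession;
    import org.apache.kudu.client.KuduTable;
    import org.apache.kudu.client.SessionConfiguration;

    public class BufferSizingSketch {
        public static void main(String[] args) throws KuduException {
            // Placeholder master address and table; the table is assumed to have
            // columns (id INT32, payload STRING).
            KuduClient client = new KuduClient.KuduClientBuilder("kudu-master:7051").build();
            KuduTable table = client.openTable("events");
            KuduSession session = client.newSession();

            session.setFlushMode(SessionConfiguration.FlushMode.MANUAL_FLUSH);
            // Too small for a 16k-record batch: apply() fails (in the client version
            // from the question) with "MANUAL_FLUSH is enabled but the buffer is too big"
            // once the buffer fills up. Use a value larger than the batch size instead.
            session.setMutationBufferSpace(100);

            for (int i = 0; i < 16000; i++) {
                Insert insert = table.newInsert();
                insert.getRow().addInt("id", i);
                insert.getRow().addString("payload", "row-" + i);
                session.apply(insert); // overflows long before the explicit flush below
            }
            session.flush();
            client.close();
        }
    }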
