
I would like to implement an Apache Flink trigger that fires when the accumulated state reaches 256 MB. The reason: my sink writes Parquet files to HDFS, and I run ETL on them later, so I want files that are neither too small nor too large. My source (an Apache Kafka topic) varies in volume constantly, so a fixed count- or time-based trigger doesn't give me predictable file sizes.

I haven't found a way to do this. I came across the StateObject interface, which has a size() method, but couldn't find a way to use it for this purpose.

Ronmeir

1 Answer


Rather than triggering on state size, I would use a Flink FileSink with the Parquet bulk format, and attach a rolling policy that constrains the part-file size but also rolls based on your maximum allowable latency. That way the sink itself keeps files near your target size, regardless of how the Kafka volume fluctuates.
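A sketch of that approach, under stated assumptions: bulk formats in Flink's FileSink require a rolling policy extending CheckpointRollingPolicy (so files also roll on every checkpoint), and the class name SizeOrLatencyRollingPolicy plus the 256 MB / 15-minute thresholds are my own illustrative choices, not part of the Flink API.

```java
import java.io.IOException;

import org.apache.flink.streaming.api.functions.sink.filesystem.PartFileInfo;
import org.apache.flink.streaming.api.functions.sink.filesystem.rollingpolicies.CheckpointRollingPolicy;

// Hypothetical policy: rolls a part file once it reaches maxPartSize bytes,
// or once it has been open longer than rolloverInterval ms (the latency
// bound). Because it extends CheckpointRollingPolicy, files also roll on
// every checkpoint, which bulk formats like Parquet require.
public class SizeOrLatencyRollingPolicy<IN> extends CheckpointRollingPolicy<IN, String> {

    private final long maxPartSize;       // target size cap, e.g. 256 MB
    private final long rolloverInterval;  // max allowable latency, in ms

    public SizeOrLatencyRollingPolicy(long maxPartSize, long rolloverInterval) {
        this.maxPartSize = maxPartSize;
        this.rolloverInterval = rolloverInterval;
    }

    @Override
    public boolean shouldRollOnEvent(PartFileInfo<String> partFileState, IN element)
            throws IOException {
        // Caveat: for Parquet, getSize() only reflects bytes already flushed;
        // the writer buffers row groups, so files may overshoot the cap a bit.
        return partFileState.getSize() >= maxPartSize;
    }

    @Override
    public boolean shouldRollOnProcessingTime(PartFileInfo<String> partFileState, long currentTime)
            throws IOException {
        return currentTime - partFileState.getCreationTime() >= rolloverInterval;
    }
}
```

Wiring it into the sink might look like this (MyRecord is a placeholder for your Avro-mappable type, and the writer-factory class name varies across Flink versions, e.g. ParquetAvroWriters vs. AvroParquetWriters):

```java
FileSink<MyRecord> sink = FileSink
    .forBulkFormat(new Path("hdfs:///data/out"),
                   ParquetAvroWriters.forReflectRecord(MyRecord.class))
    .withRollingPolicy(new SizeOrLatencyRollingPolicy<>(256L << 20, 15 * 60 * 1000L))
    .build();
```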

kkrugler