
I see Kafka Connect can write to S3 in Avro or JSON formats. But there is no Parquet support. How hard would this be to add?

Aaron_ab
clay

3 Answers


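Starting with Confluent Platform 5.4.0, the Confluent S3 sink connector officially supports Parquet output to S3 through the io.confluent.connect.s3.format.parquet.ParquetFormat format class.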
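As a rough illustration, here is a minimal sketch of a connector definition for the S3 sink with Parquet output, submitted through the Kafka Connect REST API (POST /connectors). It assumes the record values are Avro-serialized with a Schema Registry, since Parquet output needs records that carry a schema. The connector name, topics, bucket, region, and URLs are hypothetical placeholders, and the connector documentation for your version should be checked for other required settings.

```python
import json
import urllib.request


def build_s3_parquet_sink(name, topics, bucket, region, schema_registry_url, flush_size=1000):
    """Return a Kafka Connect definition for the Confluent S3 sink writing Parquet.

    All argument values are hypothetical placeholders supplied by the caller.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": ",".join(topics),
            "s3.bucket.name": bucket,
            "s3.region": region,
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            # Parquet output (Confluent Platform 5.4.0+); records must carry a schema,
            # so this sketch assumes Avro values registered in a Schema Registry.
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "value.converter": "io.confluent.connect.avro.AvroConverter",
            "value.converter.schema.registry.url": schema_registry_url,
            # Number of records written to a file before it is committed to S3.
            "flush.size": str(flush_size),
        },
    }


def submit(connect_url, definition):
    """POST the definition to a Kafka Connect worker's REST API and return its reply."""
    req = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=json.dumps(definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical values; replace with your own topic, bucket, region, and URLs.
    definition = build_s3_parquet_sink(
        "s3-parquet-sink", ["events"], "my-bucket", "us-east-1", "http://localhost:8081"
    )
    print(json.dumps(definition, indent=2))
    # submit("http://localhost:8083", definition)  # requires a running Connect worker
```

Building the definition separately from submitting it makes it easy to inspect the config before it reaches a worker; the same JSON could also be sent with any HTTP client.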

clay

Qubole's StreamX connector supports writing Parquet: https://github.com/qubole/streamx

diomedes01

Try Secor: https://github.com/pinterest/secor

It works with AWS S3, Google Cloud Storage, Azure Blob Storage, and similar services.

Note that whichever solution you choose should provide key features such as exactly-once writing of each message, load distribution, fault tolerance, monitoring, and data partitioning.

Secor has all of these and, as stated above, works easily with other S3-style services.

Aaron_ab