I see that Kafka Connect can write to S3 in Avro or JSON format, but there is no Parquet support. How hard would this be to add?
- Added! See: https://twitter.com/karantasis/status/1181701302608285698?s=19 and: https://github.com/confluentinc/kafka-connect-storage-cloud/pull/241 – Aaron_ab Oct 10 '19 at 08:43
- Parquet support is now available as part of the 5.4 release of the Kafka Connect S3 sink – Robin Moffatt Jan 22 '20 at 11:55
- Yes you can! I wrote an example [here](https://stackoverflow.com/a/73926873/9192415) – simple_developer Oct 02 '22 at 17:27
3 Answers
Starting with Confluent 5.4.0, there is official support for Parquet output to S3.
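
For reference, here is a minimal sketch of enabling Parquet output by registering the sink through the Connect REST API. The connector name, topic, bucket, region, and endpoint URLs are placeholder assumptions; the class names come from the kafka-connect-storage-cloud project, and `ParquetFormat` requires schema-aware records (e.g. Avro with a Schema Registry):

```python
import json
import requests  # assumes the `requests` package is installed

# Placeholder endpoint for the Kafka Connect REST API -- adjust for your cluster.
CONNECT_URL = "http://localhost:8083/connectors"

connector = {
    "name": "s3-parquet-sink",  # hypothetical connector name
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        # Parquet output, available since Confluent 5.4.0
        "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
        "topics": "my-topic",            # placeholder topic
        "s3.bucket.name": "my-bucket",   # placeholder bucket
        "s3.region": "us-east-1",
        "flush.size": "1000",
        # Parquet needs records with a schema, so use a schema-aware converter
        "value.converter": "io.confluent.connect.avro.AvroConverter",
        "value.converter.schema.registry.url": "http://localhost:8081",
    },
}

# Create the connector and print the Connect worker's response.
resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```

`flush.size` sets how many records accumulate before a file is committed to S3, so tune it together with your partitioner to control the size of the resulting Parquet files.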

– clay
The Qubole connector supports writing out Parquet: https://github.com/qubole/streamx

– diomedes01
Try Secor: https://github.com/pinterest/secor. It can write to AWS S3, Google Cloud Storage, Azure Blob Storage, etc.

Note that whichever solution you choose should provide key features such as exactly-once delivery, load distribution, fault tolerance, monitoring, and data partitioning. Secor has all of these and, as stated above, can easily work with other S3-style services.

– Aaron_ab