As far as I know, both platforms support big data ingestion (streaming).
What are the advantages and disadvantages of each platform?
Arrow Flight is an RPC framework that sends Arrow-formatted data over gRPC. It requires two applications: a client and a server. The server must be running for the client to send it messages.
Apache Kafka is a distributed, persistent, temporal log. It requires four components: ZooKeeper, a Kafka broker, a producer application, and a consumer application. The producer and consumer are decoupled and need not be running at the same time. ZooKeeper and the broker must always be available for a healthy system. (Newer Kafka releases can run in KRaft mode, which replaces ZooKeeper with a controller quorum built into Kafka itself.)
With Flight, you have point-to-point client-server interactions between applications.
With Kafka, applications interact only with the broker middleware, not with one another.
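The difference between the two interaction styles can be sketched with a toy model in plain Python. Nothing here is a real Kafka or Flight API: `ToyServer` and `ToyLog` are hypothetical stand-ins that only illustrate the coupling.

```python
from collections import defaultdict


class ToyServer:
    """Stand-in for a point-to-point server: calls fail unless it is running."""

    def __init__(self):
        self.running = False
        self.received = []

    def handle(self, record):
        if not self.running:
            raise ConnectionError("server is not running")
        self.received.append(record)


class ToyLog:
    """Stand-in for a broker: an append-only list of records per topic."""

    def __init__(self):
        self._topics = defaultdict(list)

    def append(self, topic, record):
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1  # offset of the new record

    def read(self, topic, offset=0):
        return self._topics[topic][offset:]


# Point-to-point: the sender fails while the receiver is down.
server = ToyServer()
try:
    server.handle("event-1")
    delivered_directly = True
except ConnectionError:
    delivered_directly = False

# Log-based: the producer writes while no consumer exists yet,
# and a consumer that starts later still sees every record.
log = ToyLog()
log.append("events", "event-1")
log.append("events", "event-2")
consumed_later = log.read("events", offset=0)

print(delivered_directly, consumed_later)
```

In the second half the producer and consumer never reference each other, only the log, which is the role the broker plays in Kafka.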
In theory, one could write an Arrow serializer for Kafka. However, I would think row-oriented formats such as Thrift, Protobuf, or Avro make more sense over the network than the popular analytic, columnar formats like Arrow, ORC, or Parquet.
Neither system is necessarily required for large data sets. In fact, I'm not sure Arrow Flight scales any better than any other gRPC-based architecture.
The driving force behind Kafka is to reduce point-to-point application interaction.