Runtime
- YARN cluster mode
Application
- Spark Structured Streaming
- Reads data from a Kafka topic (minimal read sketch below)
About the Kafka topic
- 1 topic with 4 partitions, for now (the partition count can be changed)
- At most 2,000 records per second are produced to the topic
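For reference, my read path looks roughly like this. It's a minimal PySpark sketch: the broker address, topic name, and console sink are placeholders, and `maxOffsetsPerTrigger` is just an optional rate cap, not something I have tuned.

```python
from pyspark.sql import SparkSession

# Requires the Kafka source package on the classpath, e.g.
# --packages org.apache.spark:spark-sql-kafka-0-10_2.12:<spark-version>
spark = SparkSession.builder.appName("kafka-ingest").getOrCreate()

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder address
    .option("subscribe", "events")                      # placeholder topic name
    .option("maxOffsetsPerTrigger", "10000")            # optional per-batch cap
    .load()
)

# The Kafka source exposes key/value as binary columns; cast for downstream use.
parsed = raw.selectExpr("CAST(value AS STRING) AS value", "partition", "offset")

# Console sink is a placeholder; the real sink depends on the pipeline.
query = parsed.writeStream.format("console").start()
query.awaitTermination()
```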
I've found out that Kafka topic partitions map 1:1 to Spark consumer tasks, so the read-side parallelism is bounded by the number of partitions.
So, from what I understand so far, 4 Spark executors (one core each, one per partition) would be the natural choice in my case.
But I'm worried about throughput: can this setup sustain 2,000 records/sec?
Is there any guidance or recommendation for setting a proper Spark Structured Streaming configuration, especially spark.executor.cores, spark.executor.instances, or other executor-related settings?
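This is the sizing I'm currently considering; the values below are assumptions I'd like sanity-checked, not settings I know to be right:

```python
from pyspark.sql import SparkSession

# One sizing hypothesis for a 4-partition topic: 4 executors x 1 core,
# i.e. one task slot per partition. All values here are assumptions to
# be tuned, not known-good settings.
spark = (
    SparkSession.builder
    .appName("kafka-ingest")
    .config("spark.executor.instances", "4")  # one executor per partition?
    .config("spark.executor.cores", "1")      # one consumer task per executor
    .config("spark.executor.memory", "2g")    # placeholder; size to the workload
    .getOrCreate()
)
```

(The same settings could instead be passed to spark-submit via `--conf`; I've inlined them here only to keep the sketch self-contained.)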