I'm planning on joining two topics as KStreams over a long window (~1week). Assuming there will be hundreds of millions of records accumulated in this window, how long will the joining consumer take to restart? I'm asking this because I was unable to find the information regarding how many of the records from the window are stored in the consumer cache.
Asked
Active
Viewed 155 times
1 Answers
2
By default, data that is buffered in a window is stored in RocksDB, ie, local disk. Hence, on restart (on the same machine) nothing needs to be re-loaded as the data is already available.
If you restart on a different machine, the whole content of the store would need to be re-read from a Kafka topic (that backs up the store to guarantee fault-tolerance). How long this takes depends on many factors and it's hard to estimate. You can register a "restore callback" though to monitor the restore process. This should give you some way to run some experiments to get insight how long it may take.

Matthias J. Sax
- 59,682
- 7
- 117
- 137
-
Is data from all Kafka topic partitions being stored locally in case a different partition is assigned? – Atom Jan 09 '20 at 09:08
-
1Only data from the assigned partitions is store locally. If a partition is "moved away", the local store will be deleted after a configurable time (default is something like 10 minutes) – Matthias J. Sax Jan 10 '20 at 03:54