We have a Kafka Streams app doing a KStream-KTable inner join. Both topics are high volume, with 256 partitions each. The app is currently deployed on 8 nodes with 8 GB of heap each. The state store (RocksDB) persists to disk, and we are running out of disk space on the containers. What are the options for consuming one of the topics as a KTable while limiting the amount of data held on disk (for example, only a day's worth of keys/data, or some other time frame) and having the older state files deleted?
-
You can try setting `cleanup.policy=delete,compact` on the backing topics if you only want recent data – OneCricketeer May 30 '21 at 18:22
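For illustration, here is a minimal sketch of the topic-level overrides that suggestion implies. The one-day `retention.ms` value and the class and method names are assumptions made for the example, and the map would still need to be applied with whatever tooling manages the topics. This only shows the topic settings; it says nothing about what happens to the local state store.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

public class TopicRetentionConfig {

    // Builds topic-level overrides for a compacted topic that also expires old segments.
    // The one-day retention is an illustrative assumption, not a recommendation.
    public static Map<String, String> oneDayCompactDelete() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("cleanup.policy", "compact,delete");
        config.put("retention.ms", String.valueOf(Duration.ofDays(1).toMillis()));
        return config;
    }

    public static void main(String[] args) {
        // Print the overrides in key=value form.
        oneDayCompactDelete().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```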
-
Will that also delete the data that the state store persists to disk for the KTable? – user2221654 May 31 '21 at 19:23
-
Not completely sure. My assumption would be no – OneCricketeer May 31 '21 at 19:30
-
Okay, so that's where I am confused: I'm not sure what happens to data that is no longer relevant once it has been spilled to disk as part of the state store. – user2221654 May 31 '21 at 20:26
-
Maybe I have to consider consuming it as a stream and joining with a window instead of reading one of the topics as a KTable, assuming windowing would work for my use case. – user2221654 May 31 '21 at 21:03
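To make the windowing idea concrete, here is a rough plain-Java simulation of why a time-bounded join keeps state bounded. It is not the Kafka Streams API: the class, method names, and eviction strategy are hypothetical, it keeps only the latest value per key on the buffered side, and it ignores out-of-order records.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class WindowedJoinSketch {

    // Hypothetical holder for a buffered value and its event timestamp.
    private static final class Entry {
        final String value;
        final long timestampMs;

        Entry(String value, long timestampMs) {
            this.value = value;
            this.timestampMs = timestampMs;
        }
    }

    private final long windowMs;
    private final Map<String, Entry> buffer = new HashMap<>();

    public WindowedJoinSketch(long windowMs) {
        this.windowMs = windowMs;
    }

    // Records from the side that was previously read as a table: keep the latest per key.
    public void onRight(String key, String value, long timestampMs) {
        buffer.put(key, new Entry(value, timestampMs));
    }

    // Records from the stream side: drop expired state first, then inner-join on the key.
    public Optional<String> onLeft(String key, String value, long timestampMs) {
        long cutoffMs = timestampMs - windowMs;
        buffer.values().removeIf(e -> e.timestampMs < cutoffMs);
        Entry match = buffer.get(key);
        return match == null ? Optional.empty() : Optional.of(value + "+" + match.value);
    }

    // Number of keys currently held, i.e. the state that would occupy disk.
    public int bufferedKeys() {
        return buffer.size();
    }

    public static void main(String[] args) {
        WindowedJoinSketch join = new WindowedJoinSketch(1000);
        join.onRight("k", "v", 0);
        System.out.println(join.onLeft("k", "x", 500));  // still inside the window
        System.out.println(join.onLeft("k", "y", 2000)); // buffered entry has expired
        System.out.println(join.bufferedKeys());
    }
}
```

The trade-off is that a key whose last update is older than the window no longer joins at all, which is the behavioral difference from a KTable to check against the use case.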