
To debug a production problem, I am running my Samza job locally using ProcessJobFactory. Everything appears to run fine.

The code uses a Samza key/value store backed by RocksDB, with Kafka as the changelog (Kafka is running on a different machine, in case that matters).
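For reference, a logged store like this is usually wired up with configuration along these lines (a sketch only; the store name `my-store`, the system name `kafka`, and the serdes are assumptions, not taken from the question):

```
# Hypothetical store configuration (names are illustrative)
stores.my-store.factory = org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.my-store.changelog = kafka.my-store-changelog
stores.my-store.key.serde = string
stores.my-store.msg.serde = string
```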

To populate the environment with real data to debug against, I replayed live data into the Kafka changelog topic for the RocksDB-backed key/value store while the Samza job was stopped.

Upon starting Samza, it does not resync the RocksDB database with the Kafka changelog. I verified this with Keylord (a key/value database GUI tool) by looking at the contents of the RocksDB database directly.

How can Samza be forced to resync the RocksDB database (key/value store) with the changelog? Is there a config setting or a code level call that can be made?

Related: I assume that when the code calls keyValueStore.all(), it will go to RocksDB and return all entries from there, even if the in-memory cache is empty?
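For context, this is the usage pattern in question, sketched against the Samza key/value API (this snippet will not compile without the Samza jars on the classpath; the store name and String types are assumptions):

```java
// Sketch only: consuming KeyValueStore.all() in a Samza task.
import org.apache.samza.storage.kv.Entry;
import org.apache.samza.storage.kv.KeyValueIterator;
import org.apache.samza.storage.kv.KeyValueStore;

public class StoreScan {
  static void dump(KeyValueStore<String, String> store) {
    // all() iterates over the store's full contents, which for a
    // RocksDB-backed store means reading from RocksDB on disk.
    KeyValueIterator<String, String> it = store.all();
    try {
      while (it.hasNext()) {
        Entry<String, String> entry = it.next();
        System.out.println(entry.getKey() + " = " + entry.getValue());
      }
    } finally {
      // Close the iterator to release the underlying RocksDB resources.
      it.close();
    }
  }
}
```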

Thanks,

drobin

1 Answer


Have you tried deleting the directory where the Samza job hosts its RocksDB stores? It will be under the job.logged.store.base.dir you have configured (https://samza.apache.org/learn/documentation/latest/jobs/configuration-table.html), which defaults to the user.dir system property.
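In practice the fix looks like the following (a sketch; the base path and store name are illustrative, so substitute your own job.logged.store.base.dir value):

```shell
#!/bin/sh
# Sketch: force Samza to rebuild a RocksDB store from its changelog.
# STORE_BASE_DIR is illustrative; use the job.logged.store.base.dir
# from your own configuration.
STORE_BASE_DIR="${STORE_BASE_DIR:-/tmp/samza-store-demo}"

# Simulate an existing local store directory for this sketch.
mkdir -p "$STORE_BASE_DIR/my-store/Partition_0"

# 1. Stop the Samza job (not shown here).
# 2. Delete the local store so no stale RocksDB data remains.
rm -rf "$STORE_BASE_DIR"

# 3. Restart the job; with no local store present, Samza recreates
#    RocksDB by replaying the Kafka changelog from the beginning.
[ ! -d "$STORE_BASE_DIR" ] && echo "store directory removed"
```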

  • Yes, that was it. Thank you. I deleted the RocksDB directory that contained the partitions and all the data files related to the store. After restarting the Samza job (org.apache.samza.job.local.ProcessJobFactory in this debug environment), the RocksDB store was recreated with the correct size and data, which could then be seen in the Samza application. – drobin Feb 25 '20 at 21:43