
Is it possible to keep a database in Spark where executors write to and read from it for a batch of data, and then clear the database so it starts fresh for the next batch?

Is this fast if we are talking about 100k entries per batch? Which database should I use as a beginner?

1 Answer


Yes, it is possible to store the data of every batch. See:

  • Streaming Sink — Adding Batches of Data to Storage
  • Spark Streaming - obtain batch-level performance stats

You can store data in HDFS, or in NoSQL databases such as Cassandra, HBase, or MongoDB. Which storage target (a database, in your case) to choose depends on your business problem.

Each database comes with trade-offs in retrieval rate, processing rate, storage space, and so on.
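The write/read/clear cycle the question describes can be sketched in a standalone way with Python's built-in sqlite3 (used here only as a stand-in for whichever database you pick; inside Spark Structured Streaming you would run equivalent logic per micro-batch, for example from a foreachBatch callback). The table and function names are illustrative, not from any Spark API:

```python
import sqlite3

# Minimal sketch of the per-batch cycle: write a batch, read it back,
# then clear the table so the next batch starts empty.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE batch_data (id INTEGER, value TEXT)")

def process_batch(rows):
    # Write the batch into the database...
    conn.executemany("INSERT INTO batch_data VALUES (?, ?)", rows)
    # ...read it back for whatever per-batch work is needed...
    count = conn.execute("SELECT COUNT(*) FROM batch_data").fetchone()[0]
    # ...then clear the table before the next batch arrives.
    conn.execute("DELETE FROM batch_data")
    return count

print(process_batch([(i, f"row-{i}") for i in range(3)]))  # 3
print(process_batch([(1, "a")]))  # 1 (previous batch was cleared)
```

At 100k rows per batch, a single-file database like SQLite would be a bottleneck under concurrent executors; that is exactly where the distributed stores mentioned above (Cassandra, HBase, MongoDB) come in.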

devesh