1

What is the status of the IndexedRDD work in Spark? Has anyone looked at SnappyData? They make some claims about being able to do fast random reads and writes on DataFrames.

plamb

1 Answer

1

Here is the AMPLab work on IndexedRDD. There have been no commits to this project since September 2015, and it seems the approach required re-scanning the entire RDD to construct a new one on each update. See here for how state management will be addressed in a future version of Spark (likely Spark 2.0); this relies on checkpointing RDD state at configured intervals. However, for random reads and writes it is more advisable to consider a third-party data store such as Cassandra, GemFire, or Redis. SnappyData, an in-memory SQL data store, is also in this camp, but it additionally lets the data store run embedded within the Spark executors, which avoids serialization/deserialization overhead.
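To make the cost difference concrete, here is a toy sketch in plain Python. It is not Spark, IndexedRDD, or SnappyData code; the function names `update_immutable` and `update_store` are made up for illustration. It only models the trade-off described above, assuming an immutable dataset must be rebuilt by a full scan on each update, while a keyed store can be updated in place.

```python
from typing import Dict, Tuple

Record = Tuple[int, str]


def update_immutable(
    dataset: Tuple[Record, ...], key: int, value: str
) -> Tuple[Tuple[Record, ...], int]:
    """Build a NEW dataset with `key` set to `value`.

    Hypothetical stand-in for updating an immutable collection:
    every record is visited, and the original is left untouched.
    Returns the new dataset and the number of records scanned.
    """
    scanned = 0
    rebuilt = []
    for k, v in dataset:
        scanned += 1
        rebuilt.append((k, value if k == key else v))
    return tuple(rebuilt), scanned


def update_store(store: Dict[int, str], key: int, value: str) -> int:
    """Update a mutable keyed store in place (hypothetical stand-in
    for an external key-value store). Returns records touched."""
    store[key] = value
    return 1


# 1000 records keyed 0..999
dataset = tuple((i, f"v{i}") for i in range(1000))

# Immutable path: one point update scans every record.
new_dataset, scanned = update_immutable(dataset, 42, "updated")

# Keyed-store path: one point update touches one record.
store = dict(dataset)
touched = update_store(store, 42, "updated")
```

In this model a single point update scans all 1000 records on the immutable path and touches one record on the keyed-store path, which is the reason for suggesting a dedicated store when the workload is dominated by random writes.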

plamb
jagsr