I came across two explanations of what "Resilient" means in RDD. Which one is correct?

Understanding 1: When an RDD is created in memory, the algorithm (the steps) used to create it is also recorded in secondary storage. So even if the RDD is lost, it can be reconstructed later from that algorithm. That is why it is called resilient/reliable. Only the algorithm is stored/backed up in secondary storage, not the actual RDD data.

Understanding 2: When an RDD is created in memory, its data is also backed up on another node. So even if the RDD on one node is lost, it can be reconstructed later from the data stored on the other node.

Which one is correct?

scott miles
  • I think the general answer here is "fault tolerance". Your two understandings go together – OneCricketeer May 27 '17 at 14:56
  • @cricket_007 If the RDD is already backed up on another node (point 2 in my post), what is the need to store the algorithm as well? Doing both looks redundant, doesn't it? I can understand backing up the algorithm on another node, but what is the need to back up the data? – scott miles Jun 03 '17 at 11:14
  • The data isn't backed up. The DAG to regenerate the data is. That's why RDD is a lazy data structure – OneCricketeer Jun 03 '17 at 17:31
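
To make the last comment concrete, here is a minimal toy sketch of lineage-based recovery in plain Python. It is not Spark's API: `ToyRDD`, `lose`, and `computations` are hypothetical names invented for illustration, and the model assumes the root input is durable and can be re-read. Each dataset remembers only its parent and the function that derives it, so a lost result is rebuilt by replaying those steps rather than by reading a copy from another node.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Hypothetical stand-in for an RDD: keeps the recipe, not a backup of the data."""

    def __init__(
        self,
        source: Optional[List[int]] = None,
        parent: Optional["ToyRDD"] = None,
        fn: Optional[Callable[[int], int]] = None,
    ) -> None:
        self.source = source      # durable input; only the root dataset has one
        self.parent = parent      # lineage: the dataset this one derives from
        self.fn = fn              # lineage: the transformation applied to the parent
        self.cache: Optional[List[int]] = None  # in-memory result, may be lost
        self.computations = 0     # how many times this dataset was materialized

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Lazy: records the step only; nothing is computed here.
        return ToyRDD(parent=self, fn=fn)

    def collect(self) -> List[int]:
        # Materialize on demand, walking the lineage back if the cache is empty.
        if self.cache is None:
            if self.parent is None:
                data = list(self.source or [])
            else:
                data = [self.fn(x) for x in self.parent.collect()]
            self.cache = data
            self.computations += 1
        return list(self.cache)

    def lose(self) -> None:
        # Simulate a node failure: the in-memory data disappears, the recipe stays.
        self.cache = None


base = ToyRDD(source=[1, 2, 3])
doubled = base.map(lambda x: x * 2)
plus_one = doubled.map(lambda x: x + 1)

first = plus_one.collect()

# Drop the computed data for both derived datasets, then ask again.
plus_one.lose()
doubled.lose()
recovered = plus_one.collect()
```

In this sketch `recovered` equals `first` even though no copy of the derived data was kept anywhere; only the chain of `map` steps survived. That mirrors Understanding 1. Spark can additionally replicate persisted partitions when a replicated storage level is requested, but that is opt-in rather than the default recovery mechanism.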
