What is resilient in RDD?

Asked May 27 '17 at 14:51

Active May 27 '17 at 15:04

Viewed 135 times

I came across two explanations of what "resilient" means in RDD:

Understanding 1: When an RDD is created in memory, Spark also stores the algorithm used to create it in secondary storage. So even if the RDD is lost, it can be reconstructed later from that algorithm. That is why it is called resilient/reliable. Only the algorithm is stored/backed up in secondary storage, not the actual RDD data.

Understanding 2: When an RDD is created in memory, its data is also backed up on another node. So even if the RDD on one node is lost, it can be reconstructed later from the data stored on the other node.

Which one is correct?

edited May 27 '17 at 15:04

scott miles

I think the general answer here is "fault tolerance". Your two understandings go together – OneCricketeer May 27 '17 at 14:56
@cricket_007 When the RDD is already backed up on another node (point 2 in my post), what is the need for storing the algorithm? Doing both looks redundant, doesn't it? I can understand backing up the algorithm on another node, but what is the need to back up the data? – scott miles Jun 03 '17 at 11:14
The data isn't backed up. The DAG to regenerate the data is. That's why an RDD is a lazy data structure – OneCricketeer Jun 03 '17 at 17:31
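
To make the last comment concrete, here is a toy sketch in plain Python of recovery by recomputation. It is not Spark's API: `ToyRDD`, `lose_data`, and `computations` are made-up names for illustration, and the `source` list only stands in for durable input such as a file. Each dataset remembers its parent and the transformation that produced it (its lineage), so a lost in-memory result can be rebuilt without any copy of the data having been stored elsewhere.

```python
from typing import Callable, List, Optional


class ToyRDD:
    """Toy stand-in for an RDD: it records how to build its data (lineage)."""

    def __init__(self,
                 source: Optional[List[int]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[List[int]], List[int]]] = None):
        self.source = source   # durable input (hypothetical stand-in for a file)
        self.parent = parent   # lineage: the dataset this one was derived from
        self.fn = fn           # lineage: the transformation applied to the parent
        self.cache: Optional[List[int]] = None  # in-memory result, can be lost
        self.computations = 0  # how many times this dataset was materialized

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Lazy: only records the step, computes nothing yet.
        return ToyRDD(parent=self, fn=lambda rows: [f(r) for r in rows])

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        # Lazy: only records the step, computes nothing yet.
        return ToyRDD(parent=self, fn=lambda rows: [r for r in rows if pred(r)])

    def collect(self) -> List[int]:
        # Materialize on demand; if the result is missing, replay the lineage.
        if self.cache is None:
            self.computations += 1
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.fn(self.parent.collect())
        return list(self.cache)

    def lose_data(self) -> None:
        # Simulate a failure: the in-memory result is gone, the lineage is not.
        self.cache = None


base = ToyRDD(source=[1, 2, 3, 4, 5])
even_squares = base.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = even_squares.collect()    # computed from the lineage for the first time
even_squares.lose_data()          # the result is lost
second = even_squares.collect()   # rebuilt by replaying the recorded steps
print(first, second, even_squares.computations)
```

Both calls to `collect()` give the same result, and the second one succeeds only because the steps were recorded, which matches Understanding 1 in spirit. In Spark itself, copying the data to another node (Understanding 2) is opt-in, for example by persisting with a replicated storage level such as `MEMORY_ONLY_2`, or by checkpointing to reliable storage.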
