
I am using Spark to perform some operations on my data. I need to use an auxiliary dictionary to help with my data operations.

streamData = sc.textFile("path/to/stream")
dict = sc.textFile("path/to/static/file")
# some logic like:
# if streamData["field"] exists in dict:
#     do something
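
For example, something like this (a rough sketch, assuming the static file holds one key per line and is small enough to collect to the driver):

# Collect the static file into a Python set for fast membership checks.
lookup_keys = set(sc.textFile("path/to/static/file").collect())

# Keep only the records whose value appears in the static dictionary.
matches = streamData.filter(lambda record: record in lookup_keys)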

My question is: does the dict stay in memory the whole time, or does it need to be loaded and unloaded each time Spark works on a batch?

Thanks

derek

1 Answer


The dict will remain in memory unless it needs to be evicted to make room for other objects that need memory at runtime. If you need to reuse it later, call dict.cache() after initializing it. If the RDD is too large to cache in memory, you can instead persist it to disk with .persist(StorageLevel.DISK_ONLY). This post has a useful summary of RDD mechanics.
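
A minimal sketch of that pattern in PySpark (the app name and variable names here are just illustrative):

from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="dict-lookup")
lookup = sc.textFile("path/to/static/file")

# Keep the RDD in memory across batches; Spark may still evict
# partitions under memory pressure and recompute them when needed.
lookup.cache()

# Alternative for an RDD too large to hold in memory:
# lookup.persist(StorageLevel.DISK_ONLY)

lookup.count()  # any action materializes the cached RDD

Note that cache() only marks the RDD for caching; it is actually stored the first time an action computes it.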

Paul Back