Let's say we create an RDD from a file in Alluxio memory:
rdd1 = sc.textFile("alluxio://.../file1.txt")
rdd2 = rdd1.map(...)
Does rdd2 reside on Alluxio or on Spark's heap?
Also, would an operation like the following (with both pair RDDs on Alluxio)
pairRDD1.join(pairRDD2)
create a new RDD on Alluxio or on the Spark heap?
The reason for the second question is that I need to join two large RDDs, both on Alluxio. Would the join use Alluxio's memory, or would the RDDs get pulled into Spark memory for the join? And where would the resulting RDD reside?