In Spark, the following code
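// filter1, filter2 and toPair stand in for my own Function / PairFunction implementations
JavaRDD<String> rdd = sc.textFile("file path");
JavaPairRDD<String, String> rdd1 = rdd.filter(filter1).mapToPair(toPair);
JavaPairRDD<String, String> rdd2 = rdd.filter(filter2).mapToPair(toPair);
JavaPairRDD<String, Tuple2<String, String>> rdd3 = rdd1.join(rdd2);
rdd3.saveAsTextFile("output path");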
generates three stages. In the Spark Web UI, I can see that stage 1 and stage 2 run in parallel, and the join stage (stage 3) is triggered only after the first two finish. My question is: do stage 1 and stage 2 both read the same file at the same time? In other words, does Spark read the same file twice?
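To make the question concrete, here is a toy model in plain Java (no Spark dependency) of the lineage above: one source with two filtered branches that are both evaluated, as the join requires. `LazyData`, `textFile` and `runPipeline` are hypothetical names for this sketch, not Spark's API. The model assumes the behaviour described in the Spark RDD Programming Guide: a transformed RDD may be recomputed each time it is needed unless it is persisted with `cache()` or `persist()`. It runs the branches one after the other and does not model parallel stage scheduling.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Toy model of lazy lineage. This is NOT Spark's API: each LazyData re-runs its
// parent whenever compute() is called, unless it has been marked as cached.
public class LineageDemo {

    // Counts how many times the simulated "file" was read.
    static int fileReads = 0;

    static final class LazyData {
        private final Supplier<List<String>> source;
        private boolean cached = false;
        private List<String> memo = null;

        LazyData(Supplier<List<String>> source) {
            this.source = source;
        }

        // Returns a child that remembers this parent; nothing is evaluated yet.
        LazyData filter(Predicate<String> p) {
            return new LazyData(() -> {
                List<String> out = new ArrayList<>();
                for (String s : compute()) { // compute() runs on the parent
                    if (p.test(s)) {
                        out.add(s);
                    }
                }
                return out;
            });
        }

        // Marks this dataset so its first result is kept and reused.
        LazyData cache() {
            cached = true;
            return this;
        }

        List<String> compute() {
            if (cached && memo != null) {
                return memo;
            }
            List<String> result = source.get();
            if (cached) {
                memo = result;
            }
            return result;
        }
    }

    // Hypothetical stand-in for sc.textFile: every evaluation counts as one read.
    static LazyData textFile(List<String> lines) {
        return new LazyData(() -> {
            fileReads++;
            return new ArrayList<>(lines);
        });
    }

    // Builds rdd -> (rdd1, rdd2) and evaluates both branches; returns the read count.
    static int runPipeline(boolean cacheSource) {
        fileReads = 0;
        LazyData rdd = textFile(List.of("a1", "b1", "a2", "b2"));
        if (cacheSource) {
            rdd.cache();
        }
        LazyData rdd1 = rdd.filter(s -> s.startsWith("a"));
        LazyData rdd2 = rdd.filter(s -> s.startsWith("b"));
        rdd1.compute();
        rdd2.compute();
        return fileReads;
    }

    public static void main(String[] args) {
        System.out.println("reads without cache: " + runPipeline(false)); // 2
        System.out.println("reads with cache:    " + runPipeline(true));  // 1
    }
}
```

In this model each branch walks back to the source on its own, so the source is read once per branch unless it is cached. Whether real Spark shares a cached block between two stages that are running concurrently is a separate scheduling question that this sketch does not try to answer.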