I have a SparkContext and a bunch of files that I want to access using the textFile method. The logic to find which files I need to access is complex and requires careful testing.
In my testing environment (using pytest), all the Python files and the files the SparkContext needs to access are on my local machine. The problem I encounter is that creating a SparkContext object in this testing environment and loading data using textFile often fails with a timeout exceeding 1 minute (cf. Mock a Spark RDD in the unit tests).
In my testing environment, how can I test that the file patterns I send to textFile are right, and therefore ensure that textFile returns an RDD built from the right files?
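To make the question concrete, here is a minimal sketch of the kind of test I have in mind. It never starts Spark: a Mock stands in for the SparkContext, and the test checks only which pattern reaches textFile. The names build_pattern and load_day, the directory layout, and the part-*.txt naming are all hypothetical stand-ins for my real logic. The last assertion expands the pattern with Python's glob module, which is only an approximation, since Hadoop's glob syntax is not identical (for example, Python's glob does not support {a,b} alternation).

```python
import glob
import os
import tempfile
from unittest.mock import Mock


def build_pattern(base_dir, day):
    # Hypothetical stand-in for the complex file-selection logic.
    return os.path.join(base_dir, day, "part-*.txt")


def load_day(sc, base_dir, day):
    # Hypothetical loader: it only hands the pattern to sc.textFile.
    return sc.textFile(build_pattern(base_dir, day))


def test_load_day_sends_expected_pattern():
    with tempfile.TemporaryDirectory() as base_dir:
        # Create one file that should be selected and one that should not.
        os.makedirs(os.path.join(base_dir, "2015-01-01"))
        os.makedirs(os.path.join(base_dir, "2015-01-02"))
        wanted = os.path.join(base_dir, "2015-01-01", "part-0.txt")
        unwanted = os.path.join(base_dir, "2015-01-02", "part-0.txt")
        for path in (wanted, unwanted):
            with open(path, "w") as f:
                f.write("line\n")

        sc = Mock()  # stands in for SparkContext; no JVM is started
        rdd = load_day(sc, base_dir, "2015-01-01")

        # textFile was called exactly once, with the pattern we expect.
        expected = build_pattern(base_dir, "2015-01-01")
        sc.textFile.assert_called_once_with(expected)
        assert rdd is sc.textFile.return_value

        # Approximate check: expand the pattern locally with glob.
        assert glob.glob(expected) == [wanted]
```

This verifies the pattern and the local files it matches, but not that Spark itself would resolve the pattern the same way.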