Testing file matching in pyspark

Asked Oct 20 '21 at 13:57

I have a SparkContext and a bunch of files that I want to access using the textFile method. The logic to find which files I need to access is complex and requires careful testing.

In my testing environment (using pytest), all the Python files and the data files the SparkContext needs to access are on my local machine. The problem is that creating a SparkContext object in this environment and loading data with textFile often fails with a timeout of more than 1 minute (cf. Mock a Spark RDD in the unit tests).

In my testing environment, how can I test that the file patterns I pass to textFile are correct, and therefore ensure that textFile returns an RDD built from the right files?
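One direction that might work is to keep the pattern-building logic in a plain function and check it against a temporary directory tree, with no SparkContext involved. The sketch below is only an illustration under assumptions: build_patterns, resolve, make_tree and the year/month directory layout are hypothetical names standing in for the real logic, and Python's glob module is used as an approximation of the matching Spark delegates to Hadoop. The two do not agree on every construct (Hadoop globs accept {a,b} alternation, Python's glob does not), so this only covers patterns built from *, ? and [...].

```python
import glob
import os
import tempfile


def build_patterns(base_dir, year, months):
    """Hypothetical stand-in for the complex file-selection logic."""
    return [
        os.path.join(base_dir, str(year), f"{month:02d}", "*.txt")
        for month in months
    ]


def resolve(patterns):
    """Expand the patterns on the local file system, without Spark."""
    matched = set()
    for pattern in patterns:
        matched.update(glob.glob(pattern))
    return sorted(matched)


def make_tree(root, relative_paths):
    """Create small placeholder files under root."""
    for rel in relative_paths:
        path = os.path.join(root, rel)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as handle:
            handle.write("x\n")


def matched_relative_paths():
    """Build a fake tree, resolve the patterns, return paths relative to the root."""
    with tempfile.TemporaryDirectory() as root:
        make_tree(
            root,
            ["2021/01/a.txt", "2021/02/b.txt", "2021/03/c.txt", "2021/01/skip.csv"],
        )
        patterns = build_patterns(root, 2021, [1, 2])
        return [
            os.path.relpath(path, root).replace(os.sep, "/")
            for path in resolve(patterns)
        ]
```

In a pytest test, the tmp_path fixture could replace tempfile, and the assertion would compare the resolved list with the files expected to match. Since textFile accepts a comma-separated list of paths, the same patterns could then be handed to it as ",".join(patterns) in the few slower tests that do start a SparkContext.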

asked Oct 20 '21 at 13:57

Brainless


0 Answers