
I'm new to PySpark. I have a function and I've written a unit test for it, and I've defined a UDF for PySpark using this function, something like:

udf_my_function = udf(lambda s: my_function(s), StringType())

My question is: if I already have a unit test for my_function(), do I need a unit test for udf_my_function? If so, how can I write it? Any relevant articles or links would also be appreciated. Many thanks.
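For context, a minimal sketch of the setup described above (my_function here is a simplified placeholder, not my real code):

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

def my_function(s):
    # plain Python logic, e.g. normalize a string
    return s.strip().lower() if s is not None else None

def test_my_function():
    # existing plain unit test, no Spark involved
    assert my_function("  Hello ") == "hello"
    assert my_function(None) is None

# the UDF wrapper used in the Spark job
udf_my_function = udf(lambda s: my_function(s), StringType())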

wawawa

1 Answer


In my personal opinion, it's not strictly necessary. But sometimes it's still desirable to have such a test as part of the test suite that exercises your data transformations. Usually it will have the form:

sourceDf = ....  # read data from somewhere, or define it in the test
resultDf = sourceDf.withColumn("result", udf_my_function(col("some_column")))
assertEqual(resultDf, expectedDf)  # DataFrame comparison, e.g. from a testing library
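As a concrete sketch (assuming, just for illustration, that my_function trims and lowercases strings, and that udf_my_function lives in a hypothetical my_module), such a test could look like this with a small local SparkSession managed by a pytest fixture:

import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# udf_my_function is the UDF from the question; assumed importable from your module
from my_module import udf_my_function

@pytest.fixture(scope="session")
def spark():
    # small local session, enough for unit tests
    return (SparkSession.builder
            .master("local[1]")
            .appName("udf-tests")
            .getOrCreate())

def test_udf_my_function(spark):
    sourceDf = spark.createDataFrame([(" Hello ",), (None,)], ["some_column"])
    resultDf = sourceDf.withColumn("result", udf_my_function(col("some_column")))
    # collect and compare the UDF output with the expected values
    assert [row.result for row in resultDf.collect()] == ["hello", None]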

There are several libraries available for writing unit tests for PySpark; for example, you can use pytest-spark to simplify the maintenance of the Spark parameters, the inclusion of 3rd-party packages, etc.
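For illustration, a rough sketch with pytest-spark, which (if I remember correctly) injects a ready-made spark_session fixture so you don't manage the session yourself; my_function and udf_my_function are assumed to be importable from your code:

from pyspark.sql.functions import col

# requires pytest-spark to be installed; spark_session is the fixture it provides
def test_udf_my_function_with_pytest_spark(spark_session):
    sourceDf = spark_session.createDataFrame([("abc",)], ["some_column"])
    resultDf = sourceDf.withColumn("result", udf_my_function(col("some_column")))
    # check the produced column against whatever my_function is expected to return
    assert resultDf.collect()[0].result == my_function("abc")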

Alex Ott
  • Hi, thanks for the answer. I don't want to actually set up a Spark session in the unit test; can any of the test libraries mock it? – wawawa Apr 25 '21 at 13:49
  • Not with these libraries, but you can do something like this: https://stackoverflow.com/questions/58666424/how-to-mock-inner-call-to-pyspark-sql-function – Alex Ott Apr 25 '21 at 14:04
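Following up on that comment thread: since udf_my_function just wraps a plain Python function, one session-free option (a sketch, not taken from the linked answer) is to exercise the wrapped callable directly. Recent PySpark versions expose the original function as the .func attribute on the object returned by udf(), though it's worth verifying this for your PySpark version:

def test_udf_logic_without_spark():
    # exercise the callable wrapped by udf(); no SparkSession needed
    assert udf_my_function.func("abc") == my_function("abc")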