I have a utility function written in Python that writes parquet and json files to an S3 bucket. This is the function:
import logging

def write_to_s3(data1, data2, s3_path):
    try:
        data1.write.mode("overwrite").parquet(s3_path)
        data2.write.mode("overwrite").json(s3_path, compression="gzip")
    except Exception as err:
        logging.error(err)
        raise
I'm still learning unit testing, and I'm wondering if there's a way to mock the Spark session so I don't have to set up a real one in the unit tests. Could someone help me write unit test cases for this? I found a similar question, but it's for Scala and it still needs a Spark session to be set up; I thought there might be a way to mock it, the same way we can mock S3. Hope this makes sense, thanks.
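To make the question more concrete, this is roughly the kind of test I'm hoping is possible, mocking the two DataFrames with unittest.mock instead of creating a real SparkSession (my_module, the test name, and the bucket path are just placeholders):

from unittest.mock import MagicMock

from my_module import write_to_s3

def test_write_to_s3_uses_overwrite_mode():
    # MagicMock stands in for the Spark DataFrames, so no SparkSession is needed
    data1 = MagicMock()
    data2 = MagicMock()

    write_to_s3(data1, data2, "s3://my-bucket/some/path")

    # The write chain on each mocked DataFrame records how it was called
    data1.write.mode.assert_called_once_with("overwrite")
    data2.write.mode.assert_called_once_with("overwrite")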
Update: I have followed the page that @Mauro Baraldi recommended below, and that approach works, but it only checks that the write operation was called once. How can I test the parquet and json parts to make sure the data is written to S3 in the expected format? Thanks.
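To show what I mean by checking the format, this is roughly the extra set of assertions I'm hoping to add on top of that mock-based test (again, my_module, the test name, and the path are placeholders):

from unittest.mock import MagicMock

from my_module import write_to_s3

def test_write_to_s3_writes_parquet_and_json():
    data1 = MagicMock()
    data2 = MagicMock()
    path = "s3://my-bucket/some/path"

    write_to_s3(data1, data2, path)

    # mode("overwrite") returns a mocked writer, so the parquet/json calls
    # are recorded on the return_value of the mocked mode() call
    data1.write.mode.return_value.parquet.assert_called_once_with(path)
    data2.write.mode.return_value.json.assert_called_once_with(path, compression="gzip")

Is this the right way to assert the format-specific calls, or is there a better pattern for this?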