How do you unit test python files from a Databricks .ipynb without importing everything from the .ipynb file?
For example, I'm attempting to use unittest from VS Code on my desktop, where I've cloned the .ipynb file from my Azure Databricks instance.
With the file cloned locally, I have a simple unit test that reads a CSV.
The issue is that when I attempt to import a single function from the file I am testing (csv_to_parquet), the import executes the entire file, which references objects that are not available locally. Specifically: NameError: name 'dbutils' is not defined.
I have no use for dbutils in this unit test. The error only comes up because the test tries to load the whole csv_to_parquet.py file. How do I tell unittest to ignore that completely?
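The top of csv_to_parquet.py does contain Databricks setup along these lines (the values below are made up, not my real config):

# csv_to_parquet.py -- hypothetical sketch of the top-level setup
# On Databricks, dbutils and spark are injected as notebook globals, so these
# lines run fine on the cluster; a plain local import executes them as well.
dbutils.fs.mount(source='wasbs://container@account.blob.core.windows.net',
                 mount_point='/mnt/raw')
mountPoint = '/mnt/raw'
loadPath = '/landing'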
The only function being imported from the file I want to test is:
def readAndShowCSV(sFilePath='users.csv/users'):
    # spark, mountPoint and loadPath are defined elsewhere in the notebook
    csvFile = spark.read.csv(mountPoint + loadPath + '/' + sFilePath, header=True, inferSchema=True)
    csvFile.show(5)
    return csvFile
So why is dbutils being called at all? Here is my test file:
import unittest
import pandas as pd
from csv_to_parquet import readAndShowCSV

# Inherits from unittest.TestCase
# Gives us access to testing capabilities
class TestParquet(unittest.TestCase):

    def test_readAndShowCSV(self):
        # Function lives in csv_to_parquet.py
        csvFile = readAndShowCSV()

# Will run all of our tests
if __name__ == '__main__':
    unittest.main()
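One workaround I'm considering (not sure it is the right approach) is to stub out the Databricks-provided globals before the import, so the module-level references resolve locally. A minimal sketch, assuming csv_to_parquet.py only touches dbutils and spark at the top level:

import builtins
from unittest.mock import MagicMock

# Databricks injects dbutils and spark into the notebook's globals; locally they
# do not exist. Module-level name lookup falls back to builtins, so placing
# stand-ins there lets `import csv_to_parquet` succeed without editing the file.
builtins.dbutils = MagicMock()  # stand-in, not the real dbutils
builtins.spark = MagicMock()

from csv_to_parquet import readAndShowCSV  # no NameError on import now

Or is the cleaner fix to restructure csv_to_parquet.py so the dbutils setup lives inside a function (or under if __name__ == '__main__':), keeping the import side-effect free?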