1

How do you unit test Python code from a Databricks .ipynb without importing everything from the .ipynb file?

For example, I'm attempting to use unittest from VS Code on my desktop, where I've cloned the .ipynb file from my Azure Databricks instance.

Once I have this running locally, I have a simple unit test to read a CSV.

The issue is that when I attempt to load a single function from the file I am testing (csv_to_parquet), the test attempts to load the entire file, which contains items not available locally. Specifically, it fails with NameError: name 'dbutils' is not defined.

I have no use for dbutils in this unit test. The reference is only hit when the test loads the csv_to_parquet.py file. How do I tell unittest to ignore it completely?

The only function being imported from the file I want to test is:

def readAndShowCSV(sFilePath = 'users.csv/users' ):
  csvFile = spark.read.csv(mountPoint+loadPath+'/' + sFilePath, header=True, inferSchema=True)
  csvFile.show(5)

  return csvFile

So why is dbutils being called at all?

import unittest
import pandas as pd
from csv_to_parquet import readAndShowCSV

# Inherits from unittest.TestCase
# Gives us access to testing capabilities
class TestParquet(unittest.TestCase):

    def test_readAndShowCSV(self): 

        # Function lives in csv_to_parquet.py
        csvFile = readAndShowCSV() 


# Will run all of our tests
if __name__ == '__main__':
    unittest.main()
Dave Voyles

3 Answers

0

I believed that "from [module] import [function]" would ONLY import the function, but that is not the case. Python executes the entire module first and then binds the requested name.

Something in that module referenced dbutils while it was being loaded, which raised the NameError.

User simon_dmorias recommended:

I would consider using databricks-connect to do this (it will be slow for unit tests). Or remove all dbutils references from that module (or sub modules that it imports).
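To see why the import fails even though the imported function never touches dbutils, here is a minimal sketch. The module name demo_csv_to_parquet and its contents are hypothetical stand-ins for csv_to_parquet.py, written to a temporary directory so the example is self-contained. It assumes the real file has at least one dbutils reference at module level, which is what the NameError suggests.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical stand-in for csv_to_parquet.py: one module-level dbutils call
# plus the function the test actually wants.
MODULE_SOURCE = textwrap.dedent('''
    dbutils.fs.ls("/mnt")  # module-level statement, runs at import time

    def read_and_show_csv():
        return "csv"
''')


def import_demo_function(name="demo_csv_to_parquet"):
    """Write the demo module to a temp dir and import one function from it."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, f"{name}.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    try:
        sys.modules.pop(name, None)
        importlib.invalidate_caches()
        # Importing executes every module-level statement before any
        # attribute of the module can be looked up.
        module = importlib.import_module(name)
        return module.read_and_show_csv
    finally:
        sys.path.remove(tmp_dir)


try:
    import_demo_function()
    import_error = None
except NameError as exc:
    import_error = exc

print(import_error)
```

The function body is never reached: the failure comes from the module-level line, which is why moving such calls into functions (or removing them) lets the import succeed locally.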

Dave Voyles
0

For unit testing in a non-Databricks environment, you could mock dbutils. Here is an example of this: https://github.com/jugi92/dbutilsMock

from typing import Dict
from unittest.mock import MagicMock

class DbutilsMock():
    """Simple Mock for dbutils functions that can be used whenever dbutils is not available, e.g. for unittesting databricks notebooks locally
    
    Use in the following way:
    Before your test initiate the dbutils Mock:
    ```
    from dbutilsmock import DbutilsMock
    dbutils = DbutilsMock(
        widgets_dict={
            "input_path": "/test/asd",
            "out_path": "/out/test"
        },
        secrets_dict={
            "my_scope": {
                "my_key": "the_real_secret"
            }
        }
    )
    ```
    Then in your test code the following code should work:
    ```
    >>>dbutils.widgets.text(name="widget_name", defaultValue="defaultWidgetValue", label="WidgetLabel")
    >>>dbutils.widgets.get("input_path")
    '/test/asd'
    >>>dbutils.secrets.get("my_scope", "my_key")
    'the_real_secret'
    ```
    """
    widgets = MagicMock()
    secrets = MagicMock()

    def __init__(self, widgets_dict: Dict=None, secrets_dict: Dict=None):
        self.widgets.text = MagicMock(return_value=None)
        
        if widgets_dict:
            self.widgets._widgets_dict = widgets_dict
            self.widgets.get = self._dbutils_widgets_get
            
        if secrets_dict:
            self.secrets._secrets_dict = secrets_dict
            self.secrets.get = self._dbutils_secrets_get
    
    def _dbutils_widgets_get(self, text):
        if self.widgets._widgets_dict:
            return self.widgets._widgets_dict[text]
        else: 
            return text

    def _dbutils_secrets_get(self, scope, key):
        if self.secrets._secrets_dict:
            return self.secrets._secrets_dict[scope][key]
        else:
            return f"{scope}_{key}"
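A mock only helps if the module under test can find it under the name dbutils when it is imported. One way to arrange that, sketched below, is to place the fake on the builtins module before importing, since a bare name that is not a module global is looked up there. This is an assumption about how you might wire it up, not something the dbutilsMock repository prescribes. The module demo_notebook_module is hypothetical, and a plain MagicMock stands in for DbutilsMock to keep the example self-contained.

```python
import builtins
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path
from unittest.mock import MagicMock

# Hypothetical notebook-style module that calls dbutils at import time.
MODULE_SOURCE = textwrap.dedent('''
    input_path = dbutils.widgets.get("input_path")  # runs at import time

    def describe_input():
        return f"reading from {input_path}"
''')


def import_with_fake_dbutils(fake_dbutils, name="demo_notebook_module"):
    """Import the demo module while a fake dbutils is visible as a builtin."""
    tmp_dir = tempfile.mkdtemp()
    Path(tmp_dir, f"{name}.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp_dir)
    builtins.dbutils = fake_dbutils  # resolved when the module looks up dbutils
    try:
        sys.modules.pop(name, None)
        importlib.invalidate_caches()
        return importlib.import_module(name)
    finally:
        sys.path.remove(tmp_dir)
        del builtins.dbutils  # avoid leaking the fake into other tests


fake = MagicMock()
fake.widgets.get.return_value = "/test/asd"

module = import_with_fake_dbutils(fake)
result = module.describe_input()
print(result)
```

The same approach should work with an instance of the DbutilsMock class above in place of the MagicMock. Functions that call dbutils at run time (rather than at import time) would need the fake to stay in place for the duration of the test.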
jugi
-1

dbutils is the Databricks utilities package ("Databricks Utilities"). You have two options: 1. Remove it from your source file. 2. Install the PyPI package DBUtils from this link: https://pypi.org/project/DBUtils/

code.gsoni
  • I understand that it is part of Databricks utility, but for this particular test, it shouldn't be called/used at all. That's what is throwing me through a loop. The function which is being called doesn't use DButils at all, so I'm not sure why it is being called. – Dave Voyles Apr 15 '20 at 18:09
  • Is this line working fine: from csv_to_parquet import readAndShowCSV? If the function is imported successfully, then it should not throw an error. – code.gsoni Apr 15 '20 at 18:22
  • What else is in file csv_to_parquet? I suspect you have something outside the function (or any function) that calls dbutils, or an import in it that calls dbutils. I would consider using databricks-connect to do this (it will be slow for unit tests). Or remove all dbutils references from that module (or sub modules that it imports). – simon_dmorias Apr 15 '20 at 19:31
  • @simon_dmorias you raise a good point, and I later learned what the issue was. I believed that "from [module] import [function]" would ONLY import a function, but that is not the case. It loads the entire module. Like you said, something in the module was using dbutils. I'll look into databricks-connect now, thank you! – Dave Voyles Apr 16 '20 at 14:52
  • 1
    That pypi link is not the databricks dbutils – Shaun Ryan Dec 22 '21 at 19:26