
I have a function which throws an exception when the max of column A is equal to a number (say 5). I want to unittest this function to check if it throws the Exception.

main.py

import pandas as pd

class DuplicateRunError(Exception):
    def __init__(self, value):
        self.value = value

def pd_max(df1):
    max_a = df1['A'].max()
    if max_a == 5:
        raise DuplicateRunError("Max Value for A reached")
    else:
        return "All Good"

if __name__ == '__main__':
    print(pd_max(pd.read_csv("file1.csv")))

I created a unittest for this function like below.

main_test.py

import unittest
from unittest import mock
import pandas as pd

class TestRaiseException(unittest.TestCase):
    @mock.patch('df1["A"].max()')
    def test_pd_max(self, mock_max_a):
        mock_max_a.return_value = 5
        with self.assertRaises(DuplicateRunError):
            pd_max(pd.read_csv("file1.csv"))

if __name__ == '__main__':
    unittest.main()

But I get the error `ModuleNotFoundError: No module named 'df1["A"]'`. I want to mock the value of df1["A"].max().

What is missing here? What is the best way to set the value for df1["A"].max()? I think I could get it working if I mock the DataFrame object by building it from a dict and then passing it to the function. But I want to know if there is a way to directly set the value to 5 for df1["A"].max().

Ashok KS

1 Answer

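There are other ways to do this. Since the DataFrame is an input to your function, you can pass in a mock and configure only the call chain the function uses. Note that pd_max calls df1['A'].max(), so the return value has to be set on the object returned by indexing, not on mock_df.max:

import unittest
from unittest import mock
from main import DuplicateRunError, pd_max

class TestRaiseException(unittest.TestCase):

    @mock.patch('pandas.DataFrame')
    def test_pd_max(self, mock_df):
        # df1['A'] goes through __getitem__, so configure .max() on its result
        mock_df.__getitem__.return_value.max.return_value = 5

        with self.assertRaises(DuplicateRunError):
            pd_max(mock_df)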
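To see how a MagicMock responds to df1['A'].max(), here is a minimal sketch that needs no pandas at all. It uses a stand-in copy of pd_max and DuplicateRunError from the question (assumed to match main.py) and contrasts two ways of configuring the mock; the variable names shallow and chained are only illustrative.

```python
from unittest.mock import MagicMock


class DuplicateRunError(Exception):
    """Stand-in for the exception defined in main.py."""


def pd_max(df1):
    # Same logic as the function in the question.
    max_a = df1["A"].max()
    if max_a == 5:
        raise DuplicateRunError("Max Value for A reached")
    return "All Good"


# Case 1: .max is configured on the mock itself.
# pd_max never calls df1.max(); it calls df1["A"].max(), which is a
# different, unconfigured child mock, so no exception is raised.
shallow = MagicMock()
shallow.max.return_value = 5
shallow_result = pd_max(shallow)

# Case 2: .max is configured on whatever indexing returns.
# df1["A"] is routed through __getitem__, so its return_value is the
# object whose .max() the function actually calls.
chained = MagicMock()
chained.__getitem__.return_value.max.return_value = 5
try:
    pd_max(chained)
    chained_raised = False
except DuplicateRunError:
    chained_raised = True
```

In the first case max_a is itself a MagicMock rather than 5, which is one plausible reason a test written that way would report that DuplicateRunError was not raised.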

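Another similar idea is to patch only the specific max method and keep the behavior of everything else. Because pd_max calls max() on the column (a pandas Series), the method to patch is pd.Series.max; assigning to actual_df.max would only affect DataFrame.max, which pd_max never calls. Using @mock.patch on the whole DataFrame mocks far more than .max.

class TestRaiseException(unittest.TestCase):

    def test_pd_max(self):
        actual_df = pd.read_csv(...)

        # Replace Series.max only for the duration of the with block
        with mock.patch.object(pd.Series, 'max', return_value=5):
            with self.assertRaises(DuplicateRunError):
                pd_max(actual_df)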

You can also encapsulate the specific call/logic in your own function and mock that. This is safer against side effects.

def pd_max(df1):
    max_a = get_max_val(df1)
    if max_a == 5:
        raise DuplicateRunError("Max Value for A reached")
    else:
        return "All Good"

def get_max_val(df1):
    return df1['A'].max()

class TestRaiseException(unittest.TestCase):

    # Patch the name where it is looked up; 'main' assumes both functions live in main.py
    @mock.patch('main.get_max_val')
    def test_pd_max(self, mocked_method):
        mocked_method.return_value = 5

        with self.assertRaises(DuplicateRunError):
            pd_max(some_other_df) # you can pass an actual df in here

The first and second methods are OK if you don't rely on other .max() calls. However, it is hard to predict what other engineers might do with the code. E.g., if someone later adds functionality that relies on .max() elsewhere in the function, your test might be affected in unexpected ways.

The last method encapsulates and focuses the mock only on your own method, keeping the pd.DataFrame real.
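As a rough single-file sketch of that last approach, the example below puts the logic on a class so that mock.patch.object can target the helper without needing an import path. The class name MaxChecker and the sample data are hypothetical and not part of the original code; only DuplicateRunError, pd_max and get_max_val come from the thread.

```python
import unittest
from unittest import mock

import pandas as pd


class DuplicateRunError(Exception):
    """Stand-in for the exception defined in main.py."""


class MaxChecker:
    """Hypothetical wrapper class, used here only to give patch.object a target."""

    def get_max_val(self, df1):
        # The only place that touches pandas' max().
        return df1["A"].max()

    def pd_max(self, df1):
        if self.get_max_val(df1) == 5:
            raise DuplicateRunError("Max Value for A reached")
        return "All Good"


class TestRaiseException(unittest.TestCase):

    def setUp(self):
        # A real DataFrame built from a dict; the real max of column A is 3.
        self.df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

    def test_raises_when_helper_returns_5(self):
        # Only the helper is replaced; the DataFrame itself is untouched.
        with mock.patch.object(MaxChecker, "get_max_val", return_value=5):
            with self.assertRaises(DuplicateRunError):
                MaxChecker().pd_max(self.df)

    def test_dataframe_stays_real(self):
        # Without the patch the real max is used, and other .max() calls
        # on the DataFrame behave normally.
        self.assertEqual(MaxChecker().pd_max(self.df), "All Good")
        self.assertEqual(self.df["B"].max(), 30)
```

Saved as a test module, this can be run with python -m unittest. With module-level functions instead of a class, the equivalent would be patching the helper by its import path, for example 'main.get_max_val' if the code lives in main.py.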

rafaelc
  • I tried both the first and second method but getting error as `AssertionError: DuplicateRunError not raised` – Ashok KS May 16 '23 at 06:03
  • @AshokKS Works on my end. You're probably missing something – rafaelc May 16 '23 at 13:40
  • I even tried printing the value of max_a in main.py and it prints as `` Ideally it should have printed the value 5 right? I didn't change anything other than what you have suggested. – Ashok KS May 16 '23 at 23:53