32

I am developing a set of Python scripts that pre-process a dataset and then produce a series of machine learning models using scikit-learn. I would like to write unit tests for the data pre-processing functions, using a small test pandas dataframe whose expected results I can work out by hand and use in assert statements.

I cannot seem to load the dataframe and pass it to the unit tests using self. My code looks something like this:

def setUp(self):
    TEST_INPUT_DIR = 'data/'
    test_file_name =  'testdata.csv'
    try:
        data = pd.read_csv(INPUT_DIR + test_file_name,
            sep = ',',
            header = 0)
    except IOError:
        print 'cannot open file'
    self.fixture = data

def tearDown(self):
    del self.fixture

def test1(self):    
    self.assertEqual(somefunction(self.fixture), somevalue)

if __name__ == '__main__':
    unittest.main()

Thanks for the help.

tjb305
  • What do you mean "you cannot get it to"? Is there an error? If so, what is the error? What do you want to happen, and what happens instead? – BrenBarn Jan 14 '15 at 19:30
  • I do not get an error, the test runs successfully whatever I put in the test. What I want to be able to do is produce tests which test functions that manipulate a pandas dataframe and confirm their behaviour using a small test dataframe. – tjb305 Jan 14 '15 at 19:42
  • You'll need to show an actual example with actual data that isn't working. – BrenBarn Jan 14 '15 at 19:50
  • When you are using `self`, you have to put these functions inside a class. – joris Jan 15 '15 at 08:52
  • Thanks for the help, I've tried to get an example to work with an embedded data frame but with no luck. I will tomorrow try building the class to see if that fixes the problem. – tjb305 Jan 15 '15 at 17:49

3 Answers

37

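Pandas has some utilities for testing. Note that, as pointed out in the comments, methods that use self must live inside a class (here, a unittest.TestCase subclass); otherwise unittest.main() does not collect them as tests.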

import unittest
import pandas as pd
from pandas.testing import assert_frame_equal  # pandas.util.testing in older pandas versions

class DFTests(unittest.TestCase):

    """ class for running unittests """

    def setUp(self):
        """ Load the test CSV into self.fixture before each test """
        TEST_INPUT_DIR = 'data/'
        test_file_name = 'testdata.csv'
        try:
            data = pd.read_csv(TEST_INPUT_DIR + test_file_name,
                               sep=',',
                               header=0)
        except IOError:
            # Fail here; otherwise `data` would be undefined on the next line
            self.fail('cannot open file')
        self.fixture = data

    def test_dataFrame_constructedAsExpected(self):
        """ Test that the dataframe read in equals what you expect """
        foo = pd.DataFrame()  # placeholder: build the dataframe you expect here
        assert_frame_equal(self.fixture, foo)
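
If reading from a CSV file is not essential, the fixture can also be built in memory, which keeps the expected answer easy to work out by hand. The following is only a minimal sketch: `drop_missing_rows` is a hypothetical stand-in for one of your pre-processing functions, and the column names and values are made up for illustration.

```python
import unittest

import pandas as pd
from pandas.testing import assert_frame_equal


def drop_missing_rows(df):
    """Hypothetical pre-processing step: drop rows that contain NaN."""
    return df.dropna().reset_index(drop=True)


class PreprocessingTests(unittest.TestCase):

    def setUp(self):
        # Small hand-written fixture (made-up data, not from the question).
        self.fixture = pd.DataFrame({
            'a': [1.0, None, 3.0],
            'b': ['x', 'y', 'z'],
        })

    def test_drop_missing_rows(self):
        # The middle row has a missing value in 'a', so it should be removed.
        expected = pd.DataFrame({'a': [1.0, 3.0], 'b': ['x', 'z']})
        assert_frame_equal(drop_missing_rows(self.fixture), expected)
```

Saved in a test module, this can be run with `python -m unittest`, or by adding the usual `if __name__ == '__main__': unittest.main()` guard at the bottom of the file.
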
Adam Slack
  • Import from `pandas.testing` in the latest pandas - see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.testing.assert_frame_equal.html – snark Mar 04 '20 at 14:56
33

If you are using a recent version of pandas, I think the following is a bit cleaner:

import pandas as pd

pd.testing.assert_frame_equal(my_df, expected_df)
pd.testing.assert_series_equal(my_series, expected_series)
pd.testing.assert_index_equal(my_index, expected_index)

Each of these functions raises an AssertionError if the two objects it is given are not "equal".

For more information and options: https://pandas.pydata.org/pandas-docs/stable/reference/general_utility_functions.html#testing-functions
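
As a rough illustration of those options, the sketch below compares two made-up dataframes that hold the same values but differ in dtype and column order. The `check_dtype` and `check_like` keyword arguments are part of `assert_frame_equal`; the data and variable names are assumptions for the example only.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2], 'b': [0.5, 1.5]})
right = pd.DataFrame({'a': [1.0, 2.0], 'b': [0.5, 1.5]})  # 'a' is float here

# Strict comparison: the integer vs float dtype of 'a' raises AssertionError.
try:
    pd.testing.assert_frame_equal(left, right)
    strict_passed = True
except AssertionError:
    strict_passed = False

# Relaxed comparison: ignore dtypes and compare the values only.
pd.testing.assert_frame_equal(left, right, check_dtype=False)

# check_like=True additionally ignores the order of index and columns.
reordered = right[['b', 'a']]
pd.testing.assert_frame_equal(left, reordered,
                              check_dtype=False, check_like=True)
```

Inside a unittest.TestCase method these calls can be used directly, since an AssertionError is what unittest reports as a test failure.
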

Steven
-1

You could do something like this as well with snapshottest.

https://stackoverflow.com/a/64070787/3384609

Clintm