I'm using Hypothesis to test dataframes, and when they're "empty-ish" I'm getting some unexpected behavior.
In the example below, I have a dataframe of all NaNs, and it's being treated as a NoneType
object rather than a dataframe (and thus it has no attribute notnull()):
Falsifying example: test_merge_csvs_properties(input_df_dict= {'googletrend.csv': file week trend
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
5 NaN NaN NaN}
<snip>
Traceback (most recent call last):
  File "/home/chachi/Capstone-SalesForecasting/tests/test_make_dataset_with_composite.py", line 285, in test_merge_csvs_properties
    input_dataframe, df_dict = make_dataset.merge_csvs(input_df_dict)
  File "/home/chachi/Capstone-SalesForecasting/tests/../src/data/make_dataset.py", line 238, in merge_csvs
    if dfs_dict['googletrend.csv'].notnull().any().any():
AttributeError: 'NoneType' object has no attribute 'notnull'
Compare this to an IPython session, where a dataframe of all NaNs is still a dataframe:
>>> import pandas as pd
>>> import numpy as np
>>> tester = pd.DataFrame({'test': [np.nan]})
>>> tester
test
0 NaN
>>> tester.notnull().any().any()
False
I'm explicitly testing for notnull() to allow for all sorts of pathological examples. Any suggestions?
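
For reference, here is a minimal sketch of a guard that tells the two cases apart. The AttributeError means the value stored under 'googletrend.csv' is literally None when line 238 runs, not an all-NaN dataframe, so a check like this might help narrow down where the None comes from. The helper name has_any_data is hypothetical and not part of make_dataset.py:

```python
import numpy as np
import pandas as pd


def has_any_data(df):
    """Return True only if df is a DataFrame with at least one non-null cell.

    Hypothetical helper for illustration; not part of make_dataset.py.
    """
    # A None entry is reported as "no data" instead of raising AttributeError.
    if df is None:
        return False
    # Anything else that is not a DataFrame is treated as a caller error.
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a DataFrame or None, got {type(df).__name__}")
    return bool(df.notnull().any().any())


# An all-NaN frame is still a DataFrame, so notnull() works on it.
all_nan = pd.DataFrame({"file": [np.nan], "week": [np.nan], "trend": [np.nan]})
print(has_any_data(all_nan))
# None is handled explicitly rather than failing on attribute access.
print(has_any_data(None))
```

Both calls report no data, but only the None case would have raised the error in the traceback above, which suggests the dataframe is being replaced by None somewhere before the check.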