I have two CSV files that I need to compare using pandas. The values in the two files are the same, so I expect the resulting df to be empty, but it shows that they are different. Do you think I am missing something when I read the CSV files, or is there something else I should test or fix?

import pandas as pd

df1 = pd.read_csv('apc2019.csv', sep='|', lineterminator=True)
df2 = pd.read_csv('apc2020.csv', sep='|', lineterminator=True)
df = pd.concat([df1,df2]).drop_duplicates(keep=False)
print(df)

1 Answer

I'd recommend finding out what the difference is first, but that is hard with df1.equals(df2), since it only returns True or False. Can you try this?

from pandas.testing import assert_frame_equal

assert_frame_equal(df1, df2)

This will tell you exactly where the difference is, and it has different levels of 'tolerance' (for example, if you don't care about the column names, or the types, etc.).
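As a rough illustration of those tolerance levels, here is a minimal sketch. The frames `left` and `right` and the helper `frames_differ` are made up for this example; only `assert_frame_equal` and its `check_dtype` flag come from pandas.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames: same values, but column "b" is float64 on one side
# and int64 on the other.
left = pd.DataFrame({"a": [1, 2], "b": [3.0, 4.0]})
right = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def frames_differ(x, y, **kwargs):
    """Return the assertion message if the frames differ, else None."""
    try:
        assert_frame_equal(x, y, **kwargs)
    except AssertionError as err:
        return str(err)
    return None


# Strict comparison: the dtype mismatch is reported.
strict_report = frames_differ(left, right)
print(strict_report)

# Relaxed comparison: dtypes are ignored, so only the values are checked.
relaxed_report = frames_differ(left, right, check_dtype=False)
print(relaxed_report)
```

The strict call reports the dtype mismatch on column "b", while the relaxed call finds nothing to report.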

Details are in the pandas documentation for assert_frame_equal.

If you want to compare with a tolerance in values:

In [20]: import pandas as pd
    ...: from pandas.testing import assert_frame_equal
    ...: df1 = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [1, 9]})
    ...: df2 = pd.DataFrame({'a': [1, 2], 'b': [3, 5], 'c': [1.5, 8.5]})

In [21]: assert_frame_equal(df1, df2, check_less_precise=-1, check_dtype=False)

By default check_dtype is True, so it will raise an exception if you compare floats against ints.

The other parameter to change is check_less_precise: using negative values makes the allowed error bigger.
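Note that check_less_precise was deprecated in pandas 1.1 in favour of the rtol and atol parameters and is not available in pandas 2.x. A minimal sketch of the same comparison on a newer pandas might look like the following. The tolerance values 1.0 and 0.1 are arbitrary choices for illustration, and check_exact=False is passed explicitly so that the integer columns are also compared with a tolerance.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

df1 = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [1, 9]})
df2 = pd.DataFrame({"a": [1, 2], "b": [3, 5], "c": [1.5, 8.5]})

# Accept absolute differences up to 1.0; rtol=0.0 switches off the
# relative part of the tolerance. This call does not raise.
assert_frame_equal(
    df1, df2, check_dtype=False, check_exact=False, rtol=0.0, atol=1.0
)

# With a tighter tolerance the same frames are reported as different.
tight_failed = False
try:
    assert_frame_equal(
        df1, df2, check_dtype=False, check_exact=False, rtol=0.0, atol=0.1
    )
except AssertionError as err:
    tight_failed = True
    print(err)
```

A larger atol plays the role that a more negative check_less_precise played in the older API.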

  • First of all, thank you so much! It helps a lot to see what's wrong. Now I am facing another problem: I round the values before comparing, and for values like -14209.496 & -14209.50 it rounds to -14210.0 & -14209.0, which is why it couldn't match them :( Could you give me any commands to fix it? Thank you – FATMA BERRAK GUNGOR May 18 '20 at 20:41
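For the rounding problem in the comment above, one possible approach is to skip the rounding and compare with an absolute tolerance instead. This is only a sketch: the series names `old` and `new`, the extra value 10.0, and the tolerance of 0.01 are assumptions for illustration, and the tolerance should be chosen to suit the data.

```python
import numpy as np
import pandas as pd

# The pair of values from the comment, plus one identical value.
old = pd.Series([-14209.496, 10.0])
new = pd.Series([-14209.50, 10.0])

# Rounding first puts the two values on different sides of the .5 boundary,
# so the rounded values no longer match.
rounded_match = old.round(0) == new.round(0)

# Comparing with an absolute tolerance treats them as equal
# (rtol=0.0 disables the relative part).
close_match = np.isclose(old.to_numpy(), new.to_numpy(), rtol=0.0, atol=0.01)

print(rounded_match.tolist())
print(close_match.tolist())
```

The same atol value could also be passed to assert_frame_equal on pandas 1.1 or later, which avoids rounding the frames at all.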