As the previous answers explain well, this is a floating-point arithmetic issue common to most programming languages. As a rule, never test float types for exact equality.
When you need such a comparison, use a function that compares within a given tolerance (threshold): if the numbers are close enough, treat them as equal. Something like:
def isequal_float(x1, x2, tol=10**(-8)):
    """Returns the results of floating point equality, according to a tolerance."""
    return abs(x1 - x2) < tol
will do the trick. If I'm not mistaken, a sensible tolerance depends on whether the float type is single- or double-precision, which in turn depends on the language you're using.
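As a rough illustration of the tolerance question, here is a minimal sketch using only the standard library. The helper name isequal_abs and the sample values are my own, chosen for illustration. It shows that a fixed absolute tolerance works near 1.0 but becomes too strict for large numbers, where a relative tolerance such as the one in math.isclose() is usually a better fit:

```python
import math
import sys

# Hypothetical helper: same idea as isequal_float above, absolute tolerance only.
def isequal_abs(x1, x2, tol=1e-8):
    return abs(x1 - x2) < tol

a = 0.1 + 0.2
b = 0.3
exact_equal = (a == b)            # exact comparison fails due to rounding
abs_equal = isequal_abs(a, b)     # absolute tolerance accepts the pair

# For large magnitudes, a fixed absolute tolerance is too strict.
big_a = 1e10
big_b = 1e10 + 0.01
big_abs_equal = isequal_abs(big_a, big_b)             # difference ~0.01 exceeds 1e-8
big_rel_equal = math.isclose(big_a, big_b, rel_tol=1e-9)  # tolerance scales with magnitude

# Machine epsilon of Python's float (double precision on IEEE 754 platforms).
eps = sys.float_info.epsilon

print(exact_equal, abs_equal, big_abs_equal, big_rel_equal, eps)
```

Which tolerance is appropriate depends on the scale of your data and on how much rounding error the calculation accumulates, so treat the values above as placeholders rather than recommendations.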
Using such a function allows you to easily compare the results of calculations, for instance in numpy. Take the following example, where the correlation matrix of a dataset with continuous variables is calculated in two ways: the pandas method pd.DataFrame.corr() and the numpy function np.corrcoef():
import numpy as np
import seaborn as sns
iris = sns.load_dataset('iris')
iris.drop('species', axis = 1, inplace=True)
# calculate correlation coefficient matrices using two different methods
cor1 = iris.corr().to_numpy()
cor2 = np.corrcoef(iris.transpose())
print(cor1)
print(cor2)
The results seem similar:
[[ 1. -0.11756978 0.87175378 0.81794113]
[-0.11756978 1. -0.4284401 -0.36612593]
[ 0.87175378 -0.4284401 1. 0.96286543]
[ 0.81794113 -0.36612593 0.96286543 1. ]]
[[ 1. -0.11756978 0.87175378 0.81794113]
[-0.11756978 1. -0.4284401 -0.36612593]
[ 0.87175378 -0.4284401 1. 0.96286543]
[ 0.81794113 -0.36612593 0.96286543 1. ]]
but an exact equality check disagrees. These comparisons:
print(cor1 == cor2)
print(np.equal(cor1, cor2))
will yield mostly False results element-wise:
[[ True False False False]
[False False False False]
[False False False False]
[False False False True]]
Likewise, np.array_equal(cor1, cor2) will also yield False. However, the custom-made function gives the comparison you want:
out = [isequal_float(i, j) for i, j in zip(cor1.ravel(), cor2.ravel())]
print(out)
[True, True, True, True, True, True, True, True, True, True, True, True, True, True, True, True]
Note: numpy includes the np.allclose() function, which compares two arrays element-wise within a tolerance and returns True only if every pair of elements is close.
print(np.allclose(cor1, cor2))
>>>True
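If you want the element-wise result rather than a single True/False, np.isclose() returns a boolean array. The sketch below uses small made-up arrays (not the iris data) and the documented rtol and atol parameters; both functions check |a - b| <= atol + rtol * |b| for each pair:

```python
import numpy as np

# Illustrative values only: two pairs differ by rounding noise, one by 0.1.
a = np.array([0.1 + 0.2, 1.0, 100.0])
b = np.array([0.3, 1.0 + 1e-12, 100.1])

# Defaults are rtol=1e-05 and atol=1e-08.
elementwise = np.isclose(a, b)
print(elementwise)

# allclose is True only if every element passes the same test.
all_close = np.allclose(a, b)
print(all_close)

# Loosening the absolute tolerance accepts the third pair as well.
loose = np.isclose(a, b, rtol=0.0, atol=0.2)
print(loose)
```

The first two pairs pass with the default tolerances, while the third (a difference of about 0.1) only passes once atol is raised, so np.allclose(a, b) is False here.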