I'm working on this dataset:
https://www.kaggle.com/ronitf/heart-disease-uci?select=heart.csv
I'm viewing the results of pandas profiling, and it suggests that the age column has HIGH CORRELATION with the thalach column.
I checked the 3 types of correlation between those fields:
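import pandas as pd
# Assumes heart.csv from the link above is in the working directory
df = pd.read_csv('heart.csv')
print("pearson =", df['age'].corr(df['thalach'], method='pearson'))
print("spearman =", df['age'].corr(df['thalach'], method='spearman'))
print("kendall =", df['age'].corr(df['thalach'], method='kendall'))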
And I'm getting:
pearson = -0.39852193812106734
spearman = -0.3980524371044455
kendall = -0.28000884141748783
All 3 coefficients show a much weaker (and negative) correlation, roughly -0.28 to -0.40, which I would not call high.
What am I missing? Is it possible that pandas profiling is wrong?
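In case it helps anyone reproduce the check without downloading the file, here is a minimal sketch of the same three calls in a loop. The numbers in the DataFrame are made up for illustration (they are not rows from heart.csv), and the `results` dict is just a name I picked; as far as I know, the spearman and kendall methods need SciPy installed.

```python
import pandas as pd

# Hypothetical stand-in data, NOT taken from heart.csv:
# thalach tends to fall as age rises.
df = pd.DataFrame({
    "age": [29, 35, 41, 48, 54, 60, 67, 71],
    "thalach": [202, 180, 172, 168, 150, 155, 130, 125],
})

# Compute each coefficient with Series.corr, as in the question.
# Note: 'spearman' and 'kendall' rely on SciPy being available.
results = {
    method: df["age"].corr(df["thalach"], method=method)
    for method in ("pearson", "spearman", "kendall")
}

for method, value in results.items():
    print(f"{method} = {value:.4f}")
```

Swapping the made-up DataFrame for `pd.read_csv('heart.csv')` should give the values I posted above.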