Different methods to test normality result in different outputs for the same data

Question

I have a data set from the internet and wanted to try different normality tests on several of its columns. I find it odd that the tests give me different results: not just a couple of decimals apart, but COMPLETELY different outputs.

Here is my code.

from pandas import read_csv
from scipy import stats
url = "https://raw.githubusercontent.com/rashida048/Datasets/master/cars.csv"
data = read_csv(url)
y_1 = 'HWY (Le/100 km)' 
y_2 = 'HWY (kWh/100 km)' 
y_3 = 'CITY (kWh/100 km)' 
y_4 = '(km)'
m = data[y_1]
m_2 = data[y_2]
m_3 = data[y_3]
m_4 = data[y_4]
l = [m,m_2, m_3, m_4]
#Kolmogorov-Smirnov test for Normality
for i in l: 
    statistic, pvalue = stats.kstest(i, 'norm')
    print('statistic = %.2f, p = %.1f' %(statistic, pvalue))
    if pvalue > 0.05:
        print ('Gaussian')
    else:
        print('Not Gaussian')

Output:

statistic = 0.98, p = 0.0
Not Gaussian
statistic = 1.00, p = 0.0
Not Gaussian
statistic = 1.00, p = 0.0
Not Gaussian
statistic = 1.00, p = 0.0
Not Gaussian
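
A statistic this close to 1 is what you would expect when unstandardized data is compared with the standard normal: `stats.kstest(x, 'norm')` tests against a normal distribution with mean 0 and standard deviation 1 unless `args` is given. A minimal sketch on synthetic data (the sample below is made up for illustration and is not the cars data) that compares the raw call with a standardized one:

```python
import numpy as np
from scipy import stats

# Synthetic sample (an assumption for illustration): normal, mean 50, sd 10.
rng = np.random.default_rng(0)
x = rng.normal(loc=50, scale=10, size=500)

# 'norm' with no args means N(0, 1), so data located far from 0 is rejected.
stat_raw, p_raw = stats.kstest(x, 'norm')

# Standardize first (or pass args=(x.mean(), x.std(ddof=1))) to test the shape.
z = (x - x.mean()) / x.std(ddof=1)
stat_std, p_std = stats.kstest(z, 'norm')

print('raw:          statistic = %.2f, p = %.3f' % (stat_raw, p_raw))
print('standardized: statistic = %.2f, p = %.3f' % (stat_std, p_std))
```

Because the mean and standard deviation are estimated from the same sample, the p-value of the standardized call is only approximate and tends to be too large.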
# Normal test (D'Agostino's)

for i in l:
    statistic, pvalue = stats.normaltest(i)
    print('statistic = %.2f, p = %.5f' %(statistic, pvalue))
    if pvalue > 0.05:
        print ('Gaussian')
    else:
        print('Not Gaussian')
output:
statistic = 3.12, p = 0.21050
Gaussian
statistic = 3.28, p = 0.19423
Gaussian
statistic = 70.15, p = 0.00000
Not Gaussian
statistic = 188.31, p = 0.00000
Not Gaussian
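
For context, `stats.normaltest` looks only at skewness and kurtosis: its statistic is the sum of the squared z-scores from `stats.skewtest` and `stats.kurtosistest`, compared with a chi-square distribution with 2 degrees of freedom. It therefore answers a different question than a test based on the whole distribution function. A small sketch on a synthetic sample (made up for illustration) that shows the decomposition:

```python
import numpy as np
from scipy import stats

# Synthetic standard-normal sample (an assumption for illustration).
rng = np.random.default_rng(1)
x = rng.normal(size=300)

z_skew, _ = stats.skewtest(x)       # z-score for skewness
z_kurt, _ = stats.kurtosistest(x)   # z-score for kurtosis
statistic, pvalue = stats.normaltest(x)

# The omnibus statistic combines both z-scores.
combined = z_skew**2 + z_kurt**2
print('normaltest statistic = %.4f' % statistic)
print('z_skew^2 + z_kurt^2  = %.4f' % combined)
print('p = %.4f' % pvalue)
```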

#chi-Square
for i in l:
    statistic, pvalue = stats.chisquare(i)
    print('statistic = %.2f, p = %.5f' %(statistic, pvalue))
    if pvalue > 0.05:
        print ('Gaussian')
    else:
        print('Not Gaussian')

output: 
statistic = 0.44, p = 1.00000
Gaussian
statistic = 3.73, p = 1.00000
Gaussian
statistic = 23.84, p = 0.99972
Gaussian
statistic = 4348.68, p = 0.00000
Not Gaussian
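
Note that `stats.chisquare(i)` called with a single argument treats the values as observed category counts and tests whether they are all equal; it does not compare the data with a normal distribution. One way to turn it into a normality check is to bin the data and compare the bin counts with those expected under a fitted normal. The sketch below is only one possible setup: the helper name `chi2_normality`, the choice of 10 equal-probability bins, and the synthetic sample are all assumptions for illustration.

```python
import numpy as np
from scipy import stats

def chi2_normality(x, k=10):
    """Hypothetical helper: chi-square goodness of fit against a fitted normal."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    # Interior bin edges chosen so each of the k bins has probability 1/k
    # under the fitted normal distribution.
    inner_edges = stats.norm.ppf(np.arange(1, k) / k, loc=mu, scale=sigma)
    observed = np.bincount(np.searchsorted(inner_edges, x), minlength=k)
    expected = np.full(k, len(x) / k)
    # ddof=2 because two parameters (mean and sd) were estimated from the data.
    return stats.chisquare(observed, expected, ddof=2)

# Synthetic, clearly non-normal sample (an assumption for illustration).
rng = np.random.default_rng(2)
skewed = rng.exponential(scale=5.0, size=500)

statistic, pvalue = chi2_normality(skewed)
print('statistic = %.2f, p = %.5f' % (statistic, pvalue))
```

The result depends on the number of bins, which is one reason this approach is usually less attractive for continuous data than tests designed for normality.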

I am still learning data science and everything behind it, but I am confused about how to draw a conclusion from such different values. Is it just about picking one method and sticking with it? That can't be it, can it?

It's normal. Why would different methods give the same results? - Maciej M, Jan 18 '21 at 13:17
Okay, I understand. Different methods, different procedures, different outcomes. But my question was how do I know which one to use. Should I just perform them all and say "I like this outcome, I will use that"? That seems highly unreliable. - Noob Programmer, Jan 18 '21 at 13:21
This is what data scientists do: pick the best-performing method for a given problem. If the data were different, another method might perform better. - Maciej M, Jan 18 '21 at 13:58
