
I'm struggling to figure this out. I'm new to Python, coming from an SPSS background. Essentially, once you've done a Kruskal-Wallis test and it returns a low p-value, the correct procedure is to do a post hoc Dunn test. I've been struggling to figure out the math, but I found this article (https://journals.sagepub.com/doi/pdf/10.1177/1536867X1501500117), which I think lays it all out.

Python doesn't seem to have a built-in Dunn test beyond computing the p-value, but I want output similar to the pairwise comparison test you get in SPSS. This includes the z-stat/test statistic, the standard deviation, the standard error, the p-value, and the Bonferroni-adjusted p-value.

Right now I'm just working on getting the test statistic right so I can do the rest. My data consists of multiple groups, which I've split into multiple data frames. As an example, it looks like this:

df1 (row_total = 31):

| Factor 1 | Factor 2 |
| -------- | -------- |
| 3.45     | 8.95     |
| 5.69     | 2.35     |

df2 (row_total = 75):

| Factor 1 | Factor 2 |
| -------- | -------- |
| 5.45     | 7.95     |
| 4.69     | 5.35     |

etc., etc.
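For concreteness, the example frames above can be built like this (the values are just the illustrative ones from the tables; my real frames have 31 and 75 rows):

import pandas as pd

df1 = pd.DataFrame({"Factor 1": [3.45, 5.69], "Factor 2": [8.95, 2.35]})
df2 = pd.DataFrame({"Factor 1": [5.45, 4.69], "Factor 2": [7.95, 5.35]})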

So essentially I'm trying to compare df1["Factor 1"] and df2["Factor 1"]. What I currently have is:

import pandas as pd
from scipy import stats
from collections import Counter

def dunn_test(df1, df2, colname):
    # Equation is z = yi / oi,
    # where yi is the difference in mean rank between the two groups
    # and oi is the standard deviation of yi

    # Data needed
    x = df1[colname]
    z = df2[colname]

    grouped = pd.concat([x, z])
    N = len(grouped)

    # Calculating the mean rank of the two groups
    rank1 = stats.rankdata(x)
    rank2 = stats.rankdata(z)
    Wa = rank1.sum() / len(x)
    Wb = rank2.sum() / len(z)

    # yi
    y = Wa - Wb

    # Standard deviation of yi
    # Tied ranks
    ranks = stats.rankdata(grouped)

    tied = pd.DataFrame([Counter(ranks)]).T
    tied = tied.reset_index()
    tied = tied.rename(columns={"index": "ranks", 0: "ties"})
    count_ties = tied[tied.ties >= 2].count()

    # Standard deviation formula
    t = tied["ties"]
    for tied in t:
        e = t ** 3 - t
        e = [i for i in e if i != 0]

    oi = ((N * (N + 1) / 2) - sum(e) / 12 * (N - 1)) * (1 / len(x) + 1 / len(z))

    zstat = y / oi

    return zstat
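I call it like this (using the example frames above):

dunn_test(df1, df2, "Factor 1")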

It outputs 0.0630. The issue is that when I run the same test through SPSS, the number is -51.422. I'm not sure whether I'm doing it right, whether I have the right equation, or what I'm meant to do.

Any help would be appreciated.

1 Answer


I had to do something similar. The code below should work for you as is. It performs the Kruskal-Wallis test along with Dunn's test; the p-values for Dunn's test use a Bonferroni correction. The data needs to be structured in a single column, with a stratifying indicator included. post_hoc_result_dict returns the variable name, z-score, p-value, and corrected p-value, in that order. Let me know how it goes.

FUNCTION CALL:

# Stack the two groups into a single column with a group label
f1 = df1['Factor 1'].to_frame(name='value')
f1['factor'] = 'df1'
f2 = df2['Factor 1'].to_frame(name='value')
f2['factor'] = 'df2'
correct_format = pd.concat([f1, f2])
k, p, post_hoc_result_dict = kw_test(correct_format, 'factor', 'value')
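If you want something closer to the SPSS pairwise-comparison table, you can flatten the result dict into a DataFrame (the column names here are just my suggestion):

pairwise = pd.DataFrame.from_dict(
    post_hoc_result_dict, orient='index',
    columns=['variable', 'z', 'p', 'bonferroni_p'])
print(pairwise)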

FUNCTIONS:

import sys
from math import sqrt

import pandas as pd
from scipy import stats

def p_rounder(p_value):
    # Format a p-value for display; very small values print as '<.0001'
    if p_value < .0001:
        p_value = '<.0001'
    else:
        p_value = str(round(p_value, 4))
    return p_value

def bon_correct(p_value, k):
    # Bonferroni correction: multiply p by the number of pairwise
    # comparisons, which is k * (k - 1) / 2 for k groups
    corrected_p = p_value * ((k * (k - 1)) / 2)
    return p_value, corrected_p
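For example, with 4 groups there are 4 * 3 / 2 = 6 pairwise comparisons, so a raw p-value of 0.01 gets corrected to 0.06:

bon_correct(0.01, 4)  # -> (0.01, 0.06)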

def kw_dunn_post_hoc(df, strat, comp_list, var):
    # Dunn's z for each pair: difference in mean ranks divided by its
    # standard error. Note this omits the tie correction (see the
    # tie-corrected sketch after these functions).
    post_hoc_result_dict = {}
    N = df['rank'].count()
    n_groups = df[strat].nunique()
    for comp in comp_list:
        m1 = df.loc[df[strat] == comp[0]]['rank'].mean()
        n1 = df.loc[df[strat] == comp[0]]['rank'].count()
        m2 = df.loc[df[strat] == comp[1]]['rank'].mean()
        n2 = df.loc[df[strat] == comp[1]]['rank'].count()
        Z = (m1 - m2) / sqrt(((N * (N + 1)) / 12) * ((1 / n1) + (1 / n2)))
        Z = round(Z, 4)
        p = 2 * stats.norm.sf(abs(Z))  # two-sided p-value
        p, corrected_p = bon_correct(p, n_groups)
        p = p_rounder(p)
        corrected_p = p_rounder(corrected_p)
        comparison = f'{comp[0]} vs. {comp[1]}'
        post_hoc_result_dict[comparison] = [var, Z, p, corrected_p]
    return post_hoc_result_dict

def kw_test(df, stratifier, var):
    result_list = []
    strat_list = []
    comparison_list = []
    temp_df = df[[stratifier, var]].copy()
    temp_df['rank'] = temp_df[var].rank(method='average')
    for strat in df[stratifier].unique():
        result = df.loc[df[stratifier] == strat][var].values
        result_list.append(result)
        strat_list.append(strat)
    # Build every unique pair of strata to compare
    for st in strat_list:
        for st2 in strat_list:
            if st != st2 and [st2, st] not in comparison_list:
                comparison_list.append([st, st2])
    post_hoc_result_dict = kw_dunn_post_hoc(temp_df, stratifier, comparison_list, var)
    if len(result_list) == 2:
        k, p = stats.kruskal(result_list[0], result_list[1])
    elif len(result_list) == 3:
        k, p = stats.kruskal(result_list[0], result_list[1], result_list[2])
    elif len(result_list) == 4:
        k, p = stats.kruskal(result_list[0], result_list[1], result_list[2], result_list[3])
    elif len(result_list) == 5:
        k, p = stats.kruskal(result_list[0], result_list[1], result_list[2], result_list[3], result_list[4])
    else:
        print('Stratifying levels greater than 5. Please modify code to accommodate.')
        sys.exit()
    k = round(k, 4)
    p = p_rounder(p)
    return k, p, post_hoc_result_dict
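One thing the Z formula in kw_dunn_post_hoc leaves out is the tie correction from the article linked in the question. If your data has a lot of tied values, you could adjust the standard error along these lines (a quick sketch of that article's formula; I haven't validated this part against SPSS):

from collections import Counter
from math import sqrt

def dunn_se_tie_corrected(ranks, n1, n2):
    # SE of the mean-rank difference with tie correction:
    # sqrt((N(N+1)/12 - sum(t^3 - t) / (12(N-1))) * (1/n1 + 1/n2)),
    # where t is the number of observations sharing each tied rank
    # and N is the total number of observations
    N = len(ranks)
    tie_sizes = [t for t in Counter(ranks).values() if t > 1]
    tie_term = sum(t ** 3 - t for t in tie_sizes) / (12 * (N - 1))
    return sqrt((N * (N + 1) / 12 - tie_term) * (1 / n1 + 1 / n2))

With no ties, tie_term is zero and this reduces to the denominator already used in kw_dunn_post_hoc.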
Clovis
  • Hi. Thank you for the response. I will add that you can simplify your if-else statements to one line: k, p = stats.kruskal(*result_list). Then you don't need the error message. Using this code I'm still facing the same issue (SPSS outputs the Z statistic as -51.422 but Python gives me -3.4559). I think my issue is that I'm not sure how the formula in SPSS works or why I'm getting such different results on the same dataset. I get the same k-statistic and significance, but the pairwise comparison is just completely different no matter what I do. – tankgirl2810 Jul 20 '21 at 23:15
  • You know I should sleep more before I answer these questions. I repeated my analysis in SPSS (ver 27) and got the same result as what my Python code gave me. The "Std. Test Statistic" (I'm assuming std. = standardized?) matched to three decimal places. If you divide the SPSS Z-statistic by the standard error do you get the same value as in Python? – Clovis Jul 21 '21 at 04:58
  • I think the Std. Test Statistic is the standard deviation of the Z statistic, but your method of Test Statistic/Std. Error did give me the Std. Test Statistic. It's still off, though by less (-3.4559 versus 3.1025). – tankgirl2810 Jul 21 '21 at 11:52
  • The standard deviation for a Z statistic is always 1, with a mean of 0. Are the same number of observations used in both software packages? That seems close enough that there might be some small unaccounted-for difference. Would you be able to share all of the observations for those two variables? I can try running it too to see if I get the same difference. – Clovis Jul 21 '21 at 15:52