How to create a function to test normality of each variable

Question

I am trying to build a function that returns, for a single variable at a time: i) the Jarque-Bera test statistic, ii) the Jarque-Bera p-value, iii) the slope, intercept and correlation coefficient r of the probplot fit, and iv) the probplot itself.

def normality(c):
    JB_test_stat = ss.jarque_bera(c)[0]
    JB_pval = ss.jarque_bera(c)[1]
    probplot_slope = ss.probplot(c, plot = plt)[1][0]
    probplot_interc = ss.probplot(c, plot = plt)[1][1]
    probplot_r = ss.probplot(c, plot = plt)[1][2]
    return(print("Skewness:",c.skew(),"\nExcess kurtosis:",c.kurt(),"\nJarque-Bera stat:",JB_test_stat," pvalue:", JB_pval,"\nSlope:",probplot_slope,"Intercept:",probplot_interc, "r:",probplot_r,"\n"))

Unfortunately, when I call the function on each column of my dataframe, where numeric_cols is a list of column names,

for c in numeric_cols:    
    normality(df[c])

I get all the numeric results in the return statement correctly, but at the bottom a single probplot with all variables plotted in a messy way, whereas what I expect is to get the numerical results for each variable along with its corresponding probplot.

Skewness: 0.1004187952160102 Excess kurtosis: -0.543819517693596 Jarque-Bera stat: 7.593972235734294 pvalue: 0.022438296430201454 Slope: 4.3135147782152465 Intercept: 25.5 r: 0.9947611456706487

Skewness: -0.1560130144763728 Excess kurtosis: -1.2824901951466612 Jarque-Bera stat: 38.56183464454786 pvalue: 4.23061985443951e-09 Slope: 11.492550446207257 Intercept: 19.535714285714285 r: 0.9668502992894236

Skewness: 0.2347601433103727 Excess kurtosis: -1.242639192300385 Jarque-Bera stat: 39.0662449724179 pvalue: 3.287552452491127e-09 Slope: 11.545683807955731 Intercept: 15.714285714285714 r: 0.9647448407831439

Skewness: 0.24353437856100904 Excess kurtosis: -1.1969521906230485 Jarque-Bera stat: 36.98912338336009 pvalue: 9.287822622106034e-09 Slope: 1013.985374629207 Intercept: 1411.4436090225563 r: 0.9682492605786011

Skewness: 2.837876986150242 Excess kurtosis: 9.516628330654008 Jarque-Bera stat: 2675.4455000782764 pvalue: 0.0 Slope: 2.6057664781688454 Intercept: 1.8533834586466167 r: 0.7776054895177505

Skewness: 2.406153102778617 Excess kurtosis: 7.002529753885085 Jarque-Bera stat: 1573.6596724989513 pvalue: 0.0 Slope: 1.714847443415902 Intercept: 1.287593984962406 r: 0.8152919114915671

Skewness: 0.9337529310147361 Excess kurtosis: 0.45862734243889847 Jarque-Bera stat: 81.22389376608798 pvalue: 0.0 Slope: 605.3354149443196 Intercept: 717.75 r: 0.9550404156079808

Skewness: -3.030640857636996 Excess kurtosis: 15.686541621050898 Jarque-Bera stat: 6154.761075129672 pvalue: 0.0 Slope: 11.37955609488042 Intercept: 77.82387218045113 r: 0.8711740556551902

Skewness: 6.398317104228115 Excess kurtosis: 49.10097819497357 Jarque-Bera stat: 56029.69126113364 pvalue: 0.0 Slope: 0.41431397013222515 Intercept: 0.1917293233082707 r: 0.48503363895959983

Skewness: 6.204252341215679 Excess kurtosis: 47.28662289867727 Jarque-Bera stat: 52010.755388690835 pvalue: 0.0 Slope: 0.4947086253584861 Intercept: 0.23496240601503762 r: 0.5050004904368586

Skewness: 2.06633193738682 Excess kurtosis: 5.770784034742405 Jarque-Bera stat: 1098.0175308306793 pvalue: 0.0 Slope: 0.12821997057404685 Intercept: 0.11328947368421052 r: 0.8619773533976459

Skewness: 2.9189857433086495 Excess kurtosis: 16.837230233306762 Jarque-Bera stat: 6909.724155123523 pvalue: 0.0 Slope: 0.07805612907589729 Intercept: 0.07265037593984962 r: 0.8632361803763113

Skewness: 1.2633082232077495 Excess kurtosis: 1.5265390704578943 Jarque-Bera stat: 190.6495836394772 pvalue: 0.0 Slope: 2.09821120102269 Intercept: 2.1146616541353382 r: 0.9211028014650718

Skewness: 3.091346622737553 Excess kurtosis: 8.530683362863476 Jarque-Bera stat: 2421.371001114453 pvalue: 0.0 Slope: 0.16657862407594715 Intercept: 0.09022556390977444 r: 0.5658043763386988

How could I fix it? Thank you all in advance.

Why would you return a `print` statement?? Also, please add your expected output and an example of the current output. — ShlomiF, Jul 07 '21 at 10:20
What about now man? I still don't know pretty much how to use the posting interface — Mario Aguilar, Jul 07 '21 at 10:30
You mean you want a separate figure for each? Or something else? The question is still not very clear. — ShlomiF, Jul 07 '21 at 13:42

score 0 · Accepted Answer · answered Jul 07 '21 at 15:34

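Just add a plt.figure() call at the start of your function, so that every call to the function opens a new figure. Without it, each ss.probplot(c, plot = plt) call draws onto the same current figure, which is why all variables end up in a single plot.
On a completely different note, using return(print('stuff')) is superfluous: print returns None, so the function returns None as well. If you really want to print the results, just use print with no return.
It would be more Pythonic, and generally better practice, to return the values you're currently printing and then print them outside the function: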

import matplotlib.pyplot as plt
import scipy.stats as ss

def normality(c):
    plt.figure()  # open a new figure so each variable gets its own probplot
    JB_test_stat, JB_pval = ss.jarque_bera(c)
    # call probplot once; with plot given it draws and returns ((osm, osr), (slope, intercept, r))
    _, (probplot_slope, probplot_interc, probplot_r) = ss.probplot(c, plot = plt)
    return c.skew(), c.kurt(), JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r


for c in numeric_cols:
    # unpack into plain names; c is a column name, so c.skew() cannot be an assignment target
    skew, kurt, JB_test_stat, JB_pval, probplot_slope, probplot_interc, probplot_r = normality(df[c])
    print("Skewness:", skew,
          "\nExcess kurtosis:", kurt,
          "\nJarque-Bera stat:", JB_test_stat,
          " pvalue:", JB_pval,
          "\nSlope:", probplot_slope,
          "Intercept:", probplot_interc,
          "r:", probplot_r, "\n")

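For reference, here is a minimal self-contained sketch of the same idea that passes an explicit Axes object to probplot instead of relying on the current pyplot figure. The sample dataframe, the column names "a" and "b", the dictionary keys, and the Agg backend are assumptions made only so the example runs on its own; they are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.stats as ss


def normality(c):
    """Return normality statistics for one column and the figure holding its probplot."""
    fig, ax = plt.subplots()  # a fresh figure and axes for this variable only
    jb_stat, jb_pval = ss.jarque_bera(c)
    # with fit=True (the default) probplot returns ((osm, osr), (slope, intercept, r))
    _, (slope, intercept, r) = ss.probplot(c, plot=ax)
    ax.set_title(f"Probability plot: {c.name}")
    stats = {
        "skewness": c.skew(),
        "excess_kurtosis": c.kurt(),
        "jb_stat": jb_stat,
        "jb_pvalue": jb_pval,
        "slope": slope,
        "intercept": intercept,
        "r": r,
    }
    return stats, fig


# Hypothetical data standing in for the asker's dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"a": rng.normal(size=200), "b": rng.exponential(size=200)})
numeric_cols = ["a", "b"]

results = {}
for col in numeric_cols:
    stats, fig = normality(df[col])
    results[col] = stats
    print(col, stats)

# In an interactive session, call plt.show() here to display one figure per variable.
```

Returning the figure along with the numbers lets the caller decide whether to show it, save it with fig.savefig, or close it with plt.close(fig) when looping over many columns.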