How to implement a normality-check function in Python?

Question

I am building a Monte Carlo simulation to study the behaviour of a set of 1000 iterations. Every simulation produces an output graph from a Pandas dataframe, saved as a png with matplotlib.pyplot. I am not sure that every output is a normal distribution, even though I read an article claiming that it always is, so I'd like to understand how to check it.

I've found something in this link, but I didn't understand which method is best or how to implement it.

Here's the code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set_style('whitegrid')

avg = 1
std_dev = .1
num_reps = 500
num_simulations = 1000

#generate a list of percentages that will replicate our historical normal distribution
#rounded to two decimal places to make the boundaries easy to see
pct_to_target = np.random.normal(avg, std_dev, num_reps).round(2)


#input of historical data
sales_target_values = [75_000, 100_000, 200_000, 300_000, 400_000, 500_000]
sales_target_prob = [.3, .3, .2, .1, .05, .05]
sales_target = np.random.choice(sales_target_values, num_reps, p=sales_target_prob)


#build up a pandas dataframe
df = pd.DataFrame(index=range(num_reps), data={'Pct_To_Target': pct_to_target,
                                               'Sales_Target': sales_target})

df['Sales'] = df['Pct_To_Target'] * df['Sales_Target']

#Here is what our new dataframe looks like
print("what our dataframe looks like")
print(df)

#Return the commission rate based on the Excel table
def calc_commission_rate(x):
    if x <= .90:
        return .02
    if x <= .99:
        return .03
    else:
        return .04

#create our commission rate and multiply it times sales
df['Commission_Rate'] = df['Pct_To_Target'].apply(calc_commission_rate)
df['Commission_Amount'] = df['Commission_Rate'] * df['Sales']
print(df)

# Define a list to keep all the results from each simulation that we want to analyze
all_stats = []

# Loop through many simulations
for i in range(num_simulations):

    # Choose random inputs for the sales targets and percent to target
    sales_target = np.random.choice(sales_target_values, num_reps, p=sales_target_prob)
    pct_to_target = np.random.normal(avg, std_dev, num_reps).round(2)

    # Build the dataframe based on the inputs and number of reps
    df = pd.DataFrame(index=range(num_reps), data={'Pct_To_Target': pct_to_target,
                                                   'Sales_Target': sales_target})

    # Back into the sales number using the percent to target rate
    df['Sales'] = df['Pct_To_Target'] * df['Sales_Target']

    # Determine the commissions rate and calculate it
    df['Commission_Rate'] = df['Pct_To_Target'].apply(calc_commission_rate)
    df['Commission_Amount'] = df['Commission_Rate'] * df['Sales']
    #print(df)

    # We want to track sales,commission amounts and sales targets over all the simulations
    all_stats.append([df['Sales'].sum().round(0),
                      df['Commission_Amount'].sum().round(0),
                      df['Sales_Target'].sum().round(0)])

results_df = pd.DataFrame.from_records(all_stats, columns=['Sales',
                                                           'Commission_Amount',
                                                           'Sales_Target'])
results_df.describe().style.format('{:,}')
print(results_df)
results_df['Commission_Amount'].plot(kind='hist', title="Total Commission Amount")

plt.savefig('graph.png')
# results_df['Sales'].plot(kind='hist')
# plt.savefig('graph2.png')
print(results_df)

I'd like to add a function that checks whether the output distribution is Gaussian (normal), because I am not sure that it actually is on every run.

For a specific answer, it would be helpful to know which data you want to test for normality (is it `sales_target` or `pct_to_target` or a column in `df`?), but the simplest place to start is scipy.stats.normaltest: [docs](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.normaltest.html) and a related [SO question](https://stackoverflow.com/questions/12838993/scipy-normaltest-how-is-it-used) — chris, Sep 06 '19 at 21:26
@chris Thank you for your answer. I just want to know if the total commission amount graph is Gaussian, meaning the result I get from the results_df['Commission_Amount'] column. — Harrison W., Sep 12 '19 at 10:42
Well, if you import scipy, just run the normal test on your column. `k2, p = scipy.stats.normaltest(results_df['Commission_Amount'])` returns `k2`, a statistic combining skewness and kurtosis, and `p`, the p-value. The null hypothesis is that the data come from a normal distribution, so if `p < alpha` you reject normality; if `p >= alpha`, the data are consistent with a normal distribution. `alpha` is a small number, e.g., 10^-3. — chris, Sep 12 '19 at 17:30
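
A minimal sketch of the kind of check function the question asks for, built on `scipy.stats.normaltest` and assuming SciPy is installed. The helper name `looks_normal` and the default `alpha=0.05` are illustrative choices, not something from the thread, and the two samples are synthetic stand-ins for `results_df['Commission_Amount']`.

```python
import numpy as np
from scipy import stats


def looks_normal(values, alpha=0.05):
    """Return (consistent_with_normal, statistic, p_value).

    Hypothetical helper: the name and the default alpha are illustrative.
    """
    values = np.asarray(values, dtype=float)
    # D'Agostino-Pearson test; it combines skewness and kurtosis.
    statistic, p_value = stats.normaltest(values)
    # Null hypothesis: the sample comes from a normal distribution.
    # A small p-value (p < alpha) is evidence AGAINST normality.
    return bool(p_value >= alpha), float(statistic), float(p_value)


# Synthetic stand-ins for a results column (assumed values, for illustration).
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=2_800_000, scale=100_000, size=1000)
skewed_sample = rng.exponential(scale=1.0, size=1000)

print(looks_normal(normal_sample))
print(looks_normal(skewed_sample))
```

In the script from the question, the call would be `looks_normal(results_df['Commission_Amount'])` after the simulation loop. A result of `True` only means the test found no evidence against normality at the chosen `alpha`; it does not prove the distribution is normal.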
