I am building a Monte Carlo simulation to study the behaviour of a set of 1000 iterations. Each run produces an output graph: a Pandas dataframe plotted and saved as a PNG with matplotlib.pyplot. I read an article claiming that every output is normally distributed, but I am not sure that is true, so I'd like to understand how to check it.
I found some suggestions in this link, but I didn't understand which one is best or how to implement it.
Here's the code:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
sns.set_style('whitegrid')
avg = 1
std_dev = .1
num_reps = 500
num_simulations = 1000
# Generate a list of percentages that will replicate our historical normal distribution,
# rounded to two decimal places to make the boundaries easy to see
pct_to_target = np.random.normal(avg, std_dev, num_reps).round(2)
# Historical input data
sales_target_values = [75_000, 100_000, 200_000, 300_000, 400_000, 500_000]
sales_target_prob = [.3, .3, .2, .1, .05, .05]
sales_target = np.random.choice(sales_target_values, num_reps, p=sales_target_prob)
# Build up a pandas dataframe
df = pd.DataFrame(index=range(num_reps), data={'Pct_To_Target': pct_to_target,
                                               'Sales_Target': sales_target})
df['Sales'] = df['Pct_To_Target'] * df['Sales_Target']
# Here is what our new dataframe looks like
print("What our dataframe looks like:")
print(df)
# Return the commission rate based on the Excel table
def calc_commission_rate(x):
    if x <= .90:
        return .02
    if x <= .99:
        return .03
    else:
        return .04
# Create our commission rate and multiply it by sales
df['Commission_Rate'] = df['Pct_To_Target'].apply(calc_commission_rate)
df['Commission_Amount'] = df['Commission_Rate'] * df['Sales']
print(df)
# Define a list to keep all the results from each simulation that we want to analyze
all_stats = []
# Loop through many simulations
for i in range(num_simulations):
    # Choose random inputs for the sales targets and percent to target
    sales_target = np.random.choice(sales_target_values, num_reps, p=sales_target_prob)
    pct_to_target = np.random.normal(avg, std_dev, num_reps).round(2)
    # Build the dataframe based on the inputs and number of reps
    df = pd.DataFrame(index=range(num_reps), data={'Pct_To_Target': pct_to_target,
                                                   'Sales_Target': sales_target})
    # Back into the sales number using the percent to target rate
    df['Sales'] = df['Pct_To_Target'] * df['Sales_Target']
    # Determine the commission rate and calculate the commission amount
    df['Commission_Rate'] = df['Pct_To_Target'].apply(calc_commission_rate)
    df['Commission_Amount'] = df['Commission_Rate'] * df['Sales']
    #print(df)
    # We want to track sales, commission amounts and sales targets over all the simulations
    all_stats.append([df['Sales'].sum().round(0),
                      df['Commission_Amount'].sum().round(0),
                      df['Sales_Target'].sum().round(0)])
results_df = pd.DataFrame.from_records(all_stats, columns=['Sales',
                                                           'Commission_Amount',
                                                           'Sales_Target'])
results_df.describe().style.format('{:,}')
print(results_df)
results_df['Commission_Amount'].plot(kind='hist', title="Total Commission Amount")
plt.savefig('graph.png')
# results_df['Sales'].plot(kind='hist')
# plt.savefig('graph2.png')
print(results_df)
I'd like to add a function that checks whether the output distribution is Gaussian (normal), because I am not sure that it actually is on every run.
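To make the request concrete, here is a minimal sketch of the kind of function I have in mind. It assumes SciPy is installed and uses scipy.stats.shapiro and scipy.stats.normaltest; the name check_normality, the alpha threshold of 0.05 and the stand-in samples are my own placeholders, not something taken from the article or from the script above.

```python
import numpy as np
from scipy import stats


def check_normality(values, alpha=0.05):
    """Run two common normality tests on a 1-D sample.

    Hypothetical helper: returns each test's p-value and whether
    normality is NOT rejected at significance level `alpha`.
    """
    values = np.asarray(values, dtype=float)
    # Shapiro-Wilk test of the null hypothesis that the data are normal
    _, shapiro_p = stats.shapiro(values)
    # D'Agostino-Pearson test, based on sample skewness and kurtosis
    _, dagostino_p = stats.normaltest(values)
    return {
        'shapiro_p': float(shapiro_p),
        'dagostino_p': float(dagostino_p),
        'looks_normal': bool(shapiro_p > alpha and dagostino_p > alpha),
    }


# Stand-in data (assumption): in the script this would be a results_df column
rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=2_900_000, scale=100_000, size=1000)
skewed_sample = rng.exponential(scale=100_000, size=1000)

print(check_normality(normal_sample))
print(check_normality(skewed_sample))
```

In the script above this would presumably be called as check_normality(results_df['Commission_Amount']) after the loop. As far as I understand, a p-value above alpha only means the test failed to reject normality, not that the data are proven normal, so a Q-Q plot (for example scipy.stats.probplot(values, dist='norm', plot=plt)) might be a useful visual complement.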