Generating CDF Graphs using Seaborn

Question

I am trying to plot a CDF graph for my code using Seaborn but can't get it to work.

Specifically, I want to generate CDF graphs for sum_MDA, sum_CLA, sum_BIA and grand_total after running the whole simulation 1000 times. My code is as follows (apologies in advance for the length).

def sim():

    df['RAND'] = np.random.uniform(0,1, size=df.index.size)
    dfRAND = list(df['RAND'])

    def L():
        result = []
        conditions = [df.RAND >= (1 - 0.8062), (df.RAND < (1 - 0.8062)) & (df.RAND >= 0.1),
                              (df.RAND < 0.1) & (df.RAND >= 0.05), (df.RAND < 0.05) & 
                              (df.RAND >= 0.025), (df.RAND < 0.025) & (df.RAND >= 0.0125), 
                              (df.RAND < 0.0125)]
        choices = ['L0', 'L1', 'L2', 'L3', 'L4', 'L5']
        df['L'] = np.select(conditions, choices)
        result = df['L'].values
        return result
    L()
    #print(L())
    #print(df.pivot_table(index='L', aggfunc=len, fill_value=0))

    def MD():
        result = []
        conditions = [L() == 'L0', L() == 'L1', L() == 'L2', L() == 'L3', 
                  L() == 'L4', L() == 'L5']
        choices = [(df['P_MD'].apply(lambda x: x * 0.02)), (df['P_MD'].apply(lambda x: x * 0.15)),
               (df['P_MD'].apply(lambda x: x * 0.20)), (df['P_MD'].apply(lambda x: x * 0.50)),
               (df['P_MD'].apply(lambda x: x * 1.0)), (df['P_MD'].apply(lambda x: x * 1.0))]
        df['MDL'] = np.select(conditions, choices)
        #result = print(df['MDL'].values)
        return result
    MD()

    def CL():
        result = []
        conditions = [L() == 'L0', L() == 'L1', L() == 'L2', L() == 'L3', L() == 'L4', 
                  L() == 'L5']
        choices = [1600, 3200, 9600, 48000, 48000, 48000]
        df['CL'] = np.select(conditions, choices)
        #result = print(df['CL'].values)
        return result
    CL()

    def BI():
        result = []
        conditions = [L() == 'L0', L() == 'L1', L() == 'L2', L() == 'L3', 
                  L() == 'L4', L() == 'L5']
        choices = [(df['P_BI'].apply(lambda x: (x / 548) * 1)),
               (df['P_BI'].apply(lambda x: (x / 548) * 2)),
               (df['P_BI'].apply(lambda x: (x / 548) * 14)),
               (df['P_BI'].apply(lambda x: (x / 548) * 60)),
               (df['P_BI'].apply(lambda x: (x / 548) * 180)),
               (df['P_BI'].apply(lambda x: (x / 548) * 365))]
        df['BIL'] = np.select(conditions, choices)
        #result = print(df['BIL'].values)
        return result
    BI()

    sum_MDA = int(np.sum(df['MDL']))
    sum_CLA = int(np.sum(df['CL']))
    sum_BIA = int(np.sum(df['BIL']))
    grand_total = int(sum_MDA + sum_CLA + sum_BIA)

    result = sum_MDA, sum_CLA, sum_BIA, grand_total
    return result

sim()

for i in range(1000):
    print(sim())

#sns.distplot(sim(), bins=100,
     #kde_kws=dict(cumulative=True), axlabel='(£)',  color='purple', 
     #).set_title('Simulation (N=1000)')

Any help is appreciated. Thanks a lot.

All of the functions defined inside the function `sim` return DataFrames, but you aren't saving any of those returned DataFrames; you call each function right after each definition, and they perform as expected, but the result is thrown away. — Peter Leimbigler, Oct 22 '18 at 16:56
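
Along the same lines, the loop in the question prints each sim() result rather than storing it, so there is nothing left to plot afterwards. Below is a minimal sketch of collecting the runs into a DataFrame. Note that sim_stub is a hypothetical stand-in that returns made-up totals in the same four-element shape as sim(); it is not the asker's model, and the column names are simply borrowed from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def sim_stub():
    """Hypothetical stand-in for sim(): returns four made-up totals for one run."""
    mda, cla, bia = rng.integers(1_000, 10_000, size=3)
    return int(mda), int(cla), int(bia), int(mda + cla + bia)

# Keep every run instead of printing it and throwing it away
runs = [sim_stub() for _ in range(1000)]

# One row per run, one column per output of interest
results = pd.DataFrame(
    runs, columns=["sum_MDA", "sum_CLA", "sum_BIA", "grand_total"]
)
print(results.describe())
```

Each column of results is then a one-dimensional series of 1000 values, which is the kind of input a distribution plot expects.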

score 2 · Answer 1 · answered Nov 25 '20 at 17:47

Seaborn added an empirical CDF plot as sns.ecdfplot in version 0.11.0: https://seaborn.pydata.org/generated/seaborn.ecdfplot.html

Alonso Martinez

score 0 · Answer 2 · answered Dec 05 '18 at 15:00

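As stated above, you are not passing Seaborn a single series of values: sim() returns a tuple of four totals from one run. Pass one series at a time, for example a single column such as df['MDL'].

The example below, taken from a question of mine, plots a cumulative KDE for one column: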

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df1 = pd.DataFrame({'A':np.random.randint(0, 100, 1000)})    

f, ax = plt.subplots(figsize=(8, 8))
ax = sns.kdeplot(df1['A'], cumulative=True)

plt.show()
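
Note that kdeplot with cumulative=True draws a smoothed estimate of the CDF rather than the empirical step function. If the exact empirical CDF is wanted on a seaborn version older than 0.11.0, one option is to compute it directly with NumPy. This is a sketch only: ecdf is a helper name introduced here for illustration, not a seaborn or NumPy function, and the sample data is random.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to use plt.show() instead
import matplotlib.pyplot as plt
import numpy as np

def ecdf(values):
    """Return sorted values and the fraction of observations <= each value."""
    x = np.sort(np.asarray(values))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

rng = np.random.default_rng(0)
sample = rng.integers(0, 100, size=1000)  # random stand-in data

x, y = ecdf(sample)

fig, ax = plt.subplots(figsize=(8, 8))
ax.step(x, y, where="post")  # step plot, since the ECDF jumps at each observation
ax.set_xlabel("A")
ax.set_ylabel("Proportion")
fig.savefig("ecdf_manual.png")  # example file name
```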