  • Is there a way to get a size-frequency histogram for a population under different scenarios, for specific days, in Python?
    • Specifically, I want means with error bars.
  • My data are in a format similar to this table:
SCENARIO     RUN     MEAN     DAY
A             1       25       10
A             1       15       30
A             2       20       10
A             2       27       30
B             1       45       10
B             1       50       30
B             2       43       10
B             2       35       30
  • results_data.groupby(['Scenario', 'Run']).mean() does not keep the days I want to visualize the data by
    • it returns the mean over all days within each run.
Trenton McKinney
phango

1 Answer

Use seaborn.FacetGrid

  • FacetGrid is a multi-plot grid for plotting conditional relationships.
  • Map seaborn.distplot onto the FacetGrid and use hue='DAY'. Note that distplot is deprecated as of seaborn 0.11, so newer versions emit a warning.

Setup Data and DataFrame

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import random  # just for test data
import numpy as np  # just for test data


# data
random.seed(365)
np.random.seed(365)
data = {'MEAN': [np.random.randint(20, 51) for _ in range(500)],
        'SCENARIO': [random.choice(['A', 'B']) for _ in range(500)],
        'DAY': [random.choice([10, 30]) for _ in range(500)],
        'RUN': [random.choice([1, 2]) for _ in range(500)]}

# create dataframe
df = pd.DataFrame(data)

Plot with kde=False

g = sns.FacetGrid(df, col='RUN', row='SCENARIO', hue='DAY', height=5)
g = g.map(sns.distplot, 'MEAN', bins=range(20, 51, 5), kde=False, hist_kws=dict(edgecolor="k", linewidth=1)).add_legend()
plt.show()

Plot with kde=True

g = sns.FacetGrid(df, col='RUN', row='SCENARIO', hue='DAY', height=5, palette='GnBu')
g = g.map(sns.distplot, 'MEAN', bins=range(20, 51, 5), kde=True, hist_kws=dict(edgecolor="k", linewidth=1)).add_legend()
plt.show()

Plots with error bars

from itertools import product

# create unique combinations for filtering df
scenarios = df.SCENARIO.unique()
runs = df.RUN.unique()
days = df.DAY.unique()
combo_list = [scenarios, runs, days]
results = list(product(*combo_list))  

# plot
for i, result in enumerate(results, 1):  # iterate through each set of combinations
    s, r, d = result
    data = df[(df.SCENARIO == s) & (df.RUN == r) & (df.DAY == d)]  # filter dataframe
    
    # subplot grid: rows * columns must equal the number of combinations in results
    plt.subplot(2, 4, i)
    
    # plot hist and unpack values
    n, bins, _ = plt.hist(x='MEAN', bins=range(20, 51, 5), data=data, color='g')
    
    # calculate bin centers
    bin_centers = 0.5 * (bins[:-1] + bins[1:])
    
    # draw error bars using sqrt(n) as the error; any other estimate can be substituted
    # (proper Poisson 1-sigma intervals would make more sense for counts)
    plt.errorbar(bin_centers, n, yerr=np.sqrt(n), fmt='k.')


    plt.title(f'Scenario: {s} | Run: {r} | Day: {d}')
plt.tight_layout()
plt.show()
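
The question also mentions that grouping by scenario and run loses the days. One way around that is to keep DAY among the grouping keys and aggregate over runs instead. This is a minimal sketch using the sample table from the question; the standard error of the mean as the error bar is an assumption, since the question does not say which error measure is wanted.

```python
import pandas as pd
import matplotlib.pyplot as plt

# the sample table from the question
df = pd.DataFrame({
    'SCENARIO': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'RUN':      [1, 1, 2, 2, 1, 1, 2, 2],
    'MEAN':     [25, 15, 20, 27, 45, 50, 43, 35],
    'DAY':      [10, 30, 10, 30, 10, 30, 10, 30],
})

# keep DAY in the grouping keys and average over the runs instead
summary = (df.groupby(['SCENARIO', 'DAY'])['MEAN']
             .agg(['mean', 'sem'])   # sem = standard error of the mean (assumed choice)
             .reset_index())

# one line per scenario: mean over runs for each day, with SEM error bars
fig, ax = plt.subplots()
for scenario, grp in summary.groupby('SCENARIO'):
    ax.errorbar(grp['DAY'], grp['mean'], yerr=grp['sem'],
                marker='o', capsize=4, label=f'Scenario {scenario}')
ax.set_xlabel('DAY')
ax.set_ylabel('MEAN (averaged over runs)')
ax.legend()
```

Swapping 'sem' for 'std' in the agg call would show the spread across runs instead; call plt.show() to display the figure.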

Trenton McKinney