I'm trying to generate a histogram with 1% bins from a simple normal distribution, but I'm getting incredibly small bin counts. Where am I going wrong in my use of np.histogram?
Here is the basic implementation:
import streamlit as st
import math
import pandas as pd
import numpy as np
from numpy.random import normal
import random
import matplotlib.pyplot as plt
import plotly.graph_objects as go
mean = 600000
uncertainty = 5.02  # percent of the mean
st_dev = mean * uncertainty/100
year1_dist = normal(mean, st_dev, 10000)  # 10,000 samples
bin_size = mean * 0.01  # 1% of the mean
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
The values stored in hist are very small (they sum to roughly 0.00017). I have also tried plotting the histogram with plotly using the following code, which gives the same result (very low frequencies):
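For reference, NumPy documents that with density=True the result is a probability density, so it is the integral (each value times its bin width), not the plain sum, that equals 1. A minimal sketch checking this with the same parameters; the fixed seed via np.random.default_rng is an assumption added only so the run is reproducible:

```python
import math

import numpy as np

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
mean = 600000
st_dev = mean * 5.02 / 100
year1_dist = rng.normal(mean, st_dev, 10000)

bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)

# Same call as in the question: values are a density, not counts
density, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
# Raw counts on exactly the same edges
counts, _ = np.histogram(year1_dist, bins=bin_edges)

widths = np.diff(bin_edges)
print(density.sum())             # roughly 1 / bin width, hence the tiny number
print((density * widths).sum())  # integral over the range: 1.0 up to float error
print(counts.sum())              # every sample is counted once
```

With bins of about 6000 units, a density that integrates to 1 must sum to about 1/6000, which is in line with the 0.00017 observed.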
fig = go.Figure(data=[go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram')])
Ultimately, I would like to overlay a CDF on the histogram. I know some normalization of the histogram frequencies will be involved, and I need to adjust my inputs a bit so that the mean is at zero.
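On the normalization point, the usual choices are raw counts, the fraction of samples per bin, or a density. A minimal sketch, assuming a zero-mean, unit-variance normal and hypothetical fixed edges from -4 to 4, shows how they relate: only the per-bin fraction sums to 1, so its cumulative sum is a CDF that ends at 1, while the density reaches 1 only after multiplying by the bin width.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed: an assumption, only for reproducibility
samples = rng.normal(0.0, 1.0, 10000)  # mean at zero

bin_edges = np.linspace(-4.0, 4.0, 81)  # hypothetical fixed edges, width 0.1
counts, _ = np.histogram(samples, bins=bin_edges)
widths = np.diff(bin_edges)
n_inside = counts.sum()  # samples outside [-4, 4] are not counted

probability = counts / n_inside  # fraction of samples per bin, sums to 1
density = probability / widths   # what density=True returns
percent = 100 * probability      # same shape, scaled to percent

# CDF evaluated at each bin's right edge (bin_edges[1:])
cdf = np.cumsum(probability)
```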
I have the CDF generated and plotted as expected, and I am building the tool in Streamlit. Here is the plotting section of my code, which shows the CDF and the histogram (albeit with the histogram values being very, very low):
bin_size = mean * 0.01
nbins = math.ceil((year1_dist.max() - year1_dist.min()) / bin_size)
hist, bin_edges = np.histogram(year1_dist, bins=nbins, density=True)
cdf = np.cumsum(hist * np.diff(bin_edges))
fig = go.Figure(data=[
    go.Histogram(x=year1_dist, nbinsx=nbins, name='Histogram'),
    # cdf has one value per bin, so pair it with the right-hand bin edges
    go.Scatter(x=bin_edges[1:], y=cdf, name='CDF')
])
st.plotly_chart(fig, use_container_width=True)