I am new to matplotlib and statistics. I am trying to learn through the example below and need help understanding the problem and finding a solution.

I have added a bar chart image below. I have sample data for four years: 1992, 1993, 1994, and 1995. I have plotted four bars showing each year's mean and margin of error. The user can also draw a rectangle to select a range on the y-axis; it is shown as the grey horizontal rectangle in the image, with ymax=46132 and ymin=37527. The task is to compare each bar with this y-axis range, evaluate the probability of each distribution's value falling within the selected range, and colour the bar accordingly using the colour map at the bottom.

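[Image: bar chart of the four yearly means with error bars, the selected y-axis range as a grey horizontal rectangle, and a colour map at the bottom]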

I have used the following code to find the probability, but it does not give the correct results. df2 has 4 rows containing the mean, standard deviation, and margin of error (MoE) for each bar. ymax=46132 and ymin=37527.

import pandas as pd
import matplotlib.cm as cm
import scipy.stats as st

cmap = cm.get_cmap('Reds')
df2 = pd.DataFrame(data=[[33312.107476, 200630.901553, 6508.897970],
                         [41861.859541, 98398.356203, 3192.254314],
                         [39493.304941, 140369.925240, 4553.902287],
                         [47743.550969, 69781.185469, 2263.851744]],
                   columns=['mean', 'std', 'MoE'],
                   index=['1992', '1993', '1994', '1995'])
ymax = 46132
ymin = 37527

for i in range(len(df2)):
    cdf_value = (st.norm(df2.iloc[i]['mean'], df2.iloc[i]['std']).cdf(ymax) - 
                 st.norm(df2.iloc[i]['mean'], df2.iloc[i]['std']).cdf(ymin))
    print(cdf_value)
    clr_shade = cmap(cdf_value)

Below are the output values. All are close to 0, so cmap returns a light colour for every bar. As I understand it, with the current y-axis range in the image, the bar for 1993 should be drawn in a dark colour (higher probability), 1992 and 1995 in a light colour (lower probability), and 1994 somewhere in between.

0.017093796658858795
0.03487664518128952
0.024448867322311274
0.048988004652986805

Please help me understand what I am doing wrong and how to solve this.

Mr. T
Vikrant
  • Matplotlib is a plotting library. What do you use for your statistical calculations? Please add the appropriate tag. – Mr. T Jan 27 '21 at 11:32
  • Please add full code, so other people will be able to reproduce it and try to find a solution – maria Jan 27 '21 at 14:59
  • Added more code and tag. – Vikrant Jan 28 '21 at 07:28
  • Are you sure of those values? A normal distribution with standard deviation greater than the mean? Ok, but many values would be less than zero. Also... what do you mean with "Margin of Error"? Let me guess that what you call MoE actually IS the standard deviation. – Max Pierini Mar 27 '21 at 14:48
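
A possible explanation, following the hint in the last comment: the `std` column looks like the spread of the raw samples, whereas the height of each bar is a sample mean, whose uncertainty is described by the standard error. In the data above, `std / MoE` is the same for every row (about 30.8), which is consistent with `MoE = z * std / sqrt(n)`. The sketch below assumes a 95% margin of error (z of about 1.96), which the question does not state, and recovers the standard error as `MoE / z`. The names `Z_95`, `se`, `prob`, and `colours` are introduced here for illustration only.

```python
import pandas as pd
import scipy.stats as st
import matplotlib.pyplot as plt

df2 = pd.DataFrame(data=[[33312.107476, 200630.901553, 6508.897970],
                         [41861.859541, 98398.356203, 3192.254314],
                         [39493.304941, 140369.925240, 4553.902287],
                         [47743.550969, 69781.185469, 2263.851744]],
                   columns=['mean', 'std', 'MoE'],
                   index=['1992', '1993', '1994', '1995'])
ymax = 46132
ymin = 37527

# ASSUMPTION (not stated in the question): MoE is the half-width of a
# 95% confidence interval, so standard error = MoE / z with z ~ 1.96.
Z_95 = st.norm.ppf(0.975)
mean = df2['mean'].to_numpy()
se = df2['MoE'].to_numpy() / Z_95

# Probability that the mean lies in [ymin, ymax] under Normal(mean, se),
# instead of Normal(mean, std) as in the original loop.
prob = pd.Series(
    st.norm.cdf(ymax, loc=mean, scale=se) - st.norm.cdf(ymin, loc=mean, scale=se),
    index=df2.index,
)
print(prob)

# Map each probability in [0, 1] to an RGBA colour from the 'Reds' colormap.
cmap = plt.get_cmap('Reds')
colours = [cmap(p) for p in prob]
```

Under that assumption the 1993 bar gets a probability close to 1, 1994 lands in between, and 1992 and 1995 stay low, which matches the ordering expected in the question. If the margin of error was computed with a different confidence level, `Z_95` would need to change accordingly.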

0 Answers