I have created this CDF and added the values above the bars. The absolute values seem to be correct. However, when I set `density=True`, the percentages are off:

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np

# x is a DataFrame; x['tx'] holds the count of outgoing transactions per address
fig, ax = plt.subplots()

# Cumulative, normalized histogram with one bin per whole number 1..11
values, x_pos, _ = plt.hist(x['tx'], cumulative=True, density=True, bins=np.arange(12) + 0.7,
                            color='dimgray', edgecolor='white', width=0.6)
plt.xticks(range(12))
plt.yticks(np.arange(0, 1.1, step=0.1))

# Write the cumulative percentage above each bar
for x_, val in zip(x_pos, values):
    plt.text(x_ - 0.1, val + 0.01, "{0:1.2f} %".format(val * 100))

#plt.ylabel('Proportion of addresses with \n >0 outgoing transactions', fontsize=12, **font)
#plt.xlabel('Count of outgoing transactions', fontsize=12, **font)
#plt.rcParams["figure.figsize"]=(10, 4)

ax.set_axisbelow(True)
ax.yaxis.set_major_formatter(ticker.PercentFormatter(xmax=1))

# Relabel ticks 1..11 as cumulative categories; tick 0 keeps its default label
labels = [item.get_text() for item in ax.get_xticklabels()]
labels[1:] = ['1'] + ['≤ {}'.format(i) for i in range(2, 11)] + ['≤ max']
ax.set_xticklabels(labels)

ax.grid(linestyle='-', linewidth=0.4, color='silver')
fig.subplots_adjust(top=1.1)

plt.savefig('filename.svg', format='svg', bbox_inches='tight')
```

The total number of cases is 91,213,668, and for x = 1 there are 43,210,403 cases. Thus, the percentage on the first bar should be 47.37%, not 50.39%.
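A quick check of the arithmetic, using only the two counts quoted above: the expected share is 43,210,403 / 91,213,668, and dividing the same count by the plotted 50.39% gives the total the plot must have normalized by. Because 50.39% is a rounded value, the implied total is only approximate.

```python
# Counts quoted in the question
total_cases = 91_213_668
cases_x1 = 43_210_403

# Share of x == 1 among all cases, in percent
expected_pct = cases_x1 / total_cases * 100
print(f"expected first bar: {expected_pct:.2f} %")

# Denominator implied by the plotted (rounded) value of 50.39 %
plotted_share = 0.5039
implied_total = cases_x1 / plotted_share
print(f"implied denominator: {implied_total:,.0f}")
```

The implied denominator comes out at roughly 85.75 million, which suggests that the plot counted about 5.5 million fewer values than the full 91,213,668.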

If I set `density=False`, the numbers seem to be correct:

[Image: the same histogram plotted with density=False]

Thus, the plot seems to use a total other than 91,213,668 when calculating the proportions. I have used the same code to plot another CDF, but I cannot find where the mistake is.
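JohanC's comments point to the cause: `np.histogram`, which `plt.hist` uses internally, ignores values that fall outside the outermost bin edges, and `density=True` then normalizes by the number of values that remain. The toy array below is hypothetical; it mimics `x['tx']` with one value (15) beyond the last edge of `np.arange(12) + 0.7`.

```python
import numpy as np

# Hypothetical stand-in for x['tx']: five whole numbers, one of them (15)
# larger than the last bin edge used in the question.
tx = np.array([1, 1, 2, 3, 15])
edges = np.arange(12) + 0.7  # same bins as in the question: 0.7, 1.7, ..., 11.7

# density=True divides by the count of values inside the bins (4),
# not by the full length of the array (5); the 15 is ignored.
density, _ = np.histogram(tx, bins=edges, density=True)

# With cumulative=True, plt.hist multiplies by the bin widths and accumulates,
# which turns the density into cumulative proportions.
cumulative = np.cumsum(density * np.diff(edges))

print(cumulative[0])     # share of 1s as the plot would show it
print(np.mean(tx <= 1))  # true share of 1s in the whole array
```

Up to floating-point rounding, the first cumulative value is 0.5 (2 of the 4 in-range values), while the true share is 0.4 (2 of 5). The last bar also reaches 100% even though one value was never counted, so the plot gives no visual hint that data were dropped.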

truongvu3
  • Did you zoom in on the number written above the last column? It seems to end with `...300`, so it's not `91,213,668`. Maybe `x['tx']` contains numbers outside the range `1-11`? Or maybe even invalid values? – JohanC Mar 16 '21 at 11:41
  • @JohanC Thanks for your suggestion; it does indeed have something to do with the range. I adjusted it and the percentages changed. But how do I show only these 11 bars and still get the correct proportions? The dataframe only contains valid values (whole numbers). – truongvu3 Mar 16 '21 at 11:53
  • Well, your assumption that the dataframe only contains these 11 numbers is clearly wrong. Maybe set `x['tx'] = np.where(x['tx'] > 11, 11, x['tx'])` or something similar? – JohanC Mar 16 '21 at 12:36
  • You can use `x['tx'].describe()` to find out things like minimum, maximum, count and type. – JohanC Mar 16 '21 at 12:46
  • @JohanC I struggled a bit at first, but I now understand that you replace all values greater than 11 with 11, so that the data fit the range I set for the plot. It works for me now, thank you very much! – truongvu3 Mar 16 '21 at 13:38
  • Yes, when providing the bins, they should cover the whole range of the data. But then the last bar would look too different from the rest. Therefore, the trick of changing all the bigger values to 11 can be useful. – JohanC Mar 16 '21 at 13:54
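Putting the comments together: first check how many values lie beyond the last bin (JohanC suggests `x['tx'].describe()`; the NumPy check below is a quick equivalent), then fold them into the last bin with `np.where`, so that the "≤ max" bar includes them and the proportions are taken over the full total. The array and the helper name `cumulative_shares` are hypothetical, and the helper assumes every value is at least 1. With pandas, `x['tx'].clip(upper=11)` should have the same effect as the `np.where` call.

```python
import numpy as np

def cumulative_shares(tx, max_value=11):
    """Hypothetical helper: cumulative proportions for the values 1..max_value,
    with every value above max_value folded into the last bin.
    Assumes all values are whole numbers >= 1."""
    tx = np.asarray(tx)
    # JohanC's suggestion: replace every value above the last bin by max_value
    clipped = np.where(tx > max_value, max_value, tx)
    edges = np.arange(max_value + 1) + 0.7  # 0.7, 1.7, ..., max_value + 0.7
    counts, _ = np.histogram(clipped, bins=edges)
    # Every value now lies inside the bins, so the denominator is len(tx)
    return np.cumsum(counts) / len(tx)

tx = np.array([1, 1, 2, 3, 15])  # hypothetical data with one value above 11

# Diagnose: how many values would the original bins drop?
print("min:", tx.min(), "max:", tx.max(), "above 11:", int(np.sum(tx > 11)))

shares = cumulative_shares(tx)
print(shares[0])   # share of 1s, now taken over all 5 values
print(shares[-1])  # the '≤ max' bar
```

With real data, the clipped column can be passed to the `plt.hist` call from the question unchanged; as long as no values lie below the first bin edge, `density=True` then divides by the full number of cases.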
