
I have a stacked histogram made with matplotlib. It has multiple bins (one per sector), and each bin/bar is further segmented into subsectors (hence a stacked histogram).

I'm wondering how I could get the data points, do some math on them (say, divide each bin by its total value), and then set the new data points.

How I expect it to work:

import matplotlib.pyplot as plt
ax = plt.subplot(111)
h = ax.hist((subsector1, subsector2, subsector3), bins=20, stacked=True)

y_data = h.get_yData

The shape of y_data would be something like 20 x 3 (bins x subsectors)

new_y_data = y_data normalized by total on each bin

The shape of new_y_data would also be like 20 x 3, but the sum on each bin would be 1 (or 100%)

new_h = h.set_yData(new_y_data)

new_h would look more like a bar plot, with equal-height bars but a different subsector distribution in each bar.
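The normalization step described above (each bin divided by its total) is plain NumPy arithmetic once the counts are in a bins x subsectors array. A minimal sketch, using a small made-up 4 x 3 array in place of real histogram output; leaving an empty bin at zero instead of NaN is a choice of this sketch, not something the question specifies:

```python
import numpy as np

# Made-up counts: 4 bins (rows) x 3 subsectors (columns); real data would be 20 x 3
y_data = np.array([
    [2, 6, 2],
    [1, 1, 2],
    [0, 0, 0],  # an empty bin
    [5, 0, 5],
], dtype=float)

totals = y_data.sum(axis=1, keepdims=True)  # shape (4, 1): total per bin

# Divide each row by its own total; rows whose total is 0 stay all zeros
new_y_data = np.divide(y_data, totals, out=np.zeros_like(y_data), where=totals > 0)

print(new_y_data)
print(new_y_data.sum(axis=1))  # 1 for every non-empty bin
```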

Is this even possible in Python with matplotlib?

redguy
  • Does this answer your question? [Get data points from a histogram in Python](https://stackoverflow.com/questions/20128898/get-data-points-from-a-histogram-in-python) – Roim Jul 24 '20 at 12:01
  • Partially, as it covers the 'get' part of the question. I also need to add these points (after some modification) back into the hist. Meanwhile I'm trying to see if I can build a bar plot with the result. – redguy Jul 24 '20 at 13:20
  • Once you have all the data and styling, you can recreate the plot yourself. – Roim Jul 24 '20 at 13:21

1 Answer


When you only want the values, it's easier to use np.histogram, which does the same calculations without drawing anything.
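A minimal sketch of that idea, with made-up random data standing in for subsector1..3: computing one set of bin edges over the combined data (here with np.histogram_bin_edges) keeps the bins aligned across subsectors, and stacking the counts column-wise gives the 20 x 3 (bins x subsectors) array the question describes, with nothing drawn.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up stand-ins for the three subsectors in the question
subsectors = [rng.normal(70, 20, 400), rng.normal(50, 20, 1000), rng.normal(25, 20, 500)]

# One set of 21 edges (20 bins) shared by all subsectors
edges = np.histogram_bin_edges(np.concatenate(subsectors), bins=20)

# np.histogram returns (counts, edges); keep only the counts for each subsector
y_data = np.column_stack([np.histogram(s, bins=edges)[0] for s in subsectors])

print(y_data.shape)  # (20, 3): bins x subsectors
```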

Once you have the values, plt.bar draws them directly, without needing plt.hist.

Pandas' plot.bar might be an alternative. Have a look at "Creating percentage stacked bar chart using groupby" for an example similar to yours.
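A minimal pandas sketch along those lines, assuming the counts are already in a bins x subsectors table (the small table, index labels and column names below are made up for illustration): DataFrame.div with axis=0 divides each row by its own total, and DataFrame.plot.bar(stacked=True) draws the 100% stacked bars.

```python
import pandas as pd
from matplotlib.ticker import PercentFormatter

# Hypothetical per-bin counts: rows are bins, columns are subsectors
counts = pd.DataFrame(
    {"subsector1": [2, 1, 5], "subsector2": [6, 1, 0], "subsector3": [2, 2, 5]},
    index=["bin 0", "bin 1", "bin 2"],
)

# Divide every row by its own total so each bin sums to 1
fractions = counts.div(counts.sum(axis=1), axis=0)

# Stacked bars with a percentage y axis; call matplotlib.pyplot.show() to display
ax = fractions.plot.bar(stacked=True, width=0.8)
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
```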

Here is some example code using np.histogram and plt.bar:

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter

# Example data: three subsectors with values clipped to the range 0..100
subsector1 = np.clip(np.random.normal(70, 20, 400), 0, 100)
subsector2 = np.clip(np.random.normal(50, 20, 1000), 0, 100)
subsector3 = np.clip(np.random.normal(25, 20, 500), 0, 100)
num_bins = 20

# Shared bin edges over the combined data, so all subsectors use the same bins
all_data = np.concatenate([subsector1, subsector2, subsector3])
x_min, x_max = all_data.min(), all_data.max()
bounds = np.linspace(x_min, x_max, num_bins + 1)

# Counts per bin (rows) and subsector (columns): shape (num_bins, 3)
values = np.zeros((num_bins, 3))
for i, subsect in enumerate((subsector1, subsector2, subsector3)):
    values[:, i], _ = np.histogram(subsect, bins=bounds)

# Normalize each bin to a total of 1; an empty bin gives 0/0 = NaN, so silence that warning
with np.errstate(divide='ignore', invalid='ignore'):
    values /= values.sum(axis=1, keepdims=True)

# Draw the stacked bars: each layer starts on top of the previous ones
fig, ax = plt.subplots()
centers = (bounds[:-1] + bounds[1:]) / 2
bottom = np.zeros(num_bins)
for i in range(3):
    ax.bar(centers, values[:, i], bottom=bottom, width=np.diff(bounds) * 0.8)
    bottom += values[:, i]
ax.set_xlim(x_min, x_max)
ax.yaxis.set_major_formatter(PercentFormatter(1.0))
plt.show()

(Figure: example plot)

JohanC
  • This answers my question (the short answer is 'no') and provides a good alternative solution. Thank you! One comment though: Matplotlib's plt.hist does indeed have a stacked= option. – redguy Jul 27 '20 at 07:29
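On the stacked= point: ax.hist returns (n, bins, patches), so with stacked=True the bin values are available without a separate np.histogram call, which covers the 'get' half of the question. A minimal sketch with made-up data; as far as I know, for stacked histograms the returned values are the cumulative top of each layer, so np.diff is used to recover per-subsector counts, and the check against np.histogram at the end confirms that assumption on your matplotlib version.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Made-up stand-ins for the three subsectors
subsectors = [rng.normal(70, 20, 400), rng.normal(50, 20, 1000), rng.normal(25, 20, 500)]

fig, ax = plt.subplots()
n, bins, patches = ax.hist(subsectors, bins=20, stacked=True)

tops = np.asarray(n)  # shape (3, 20): cumulative top of each stacked layer
# Undo the stacking: first layer as-is, then differences between consecutive layers
per_layer = np.diff(tops, axis=0, prepend=0)

y_data = per_layer.T  # shape (20, 3): bins x subsectors, as in the question

# Sanity check: each column should match a plain np.histogram with the same edges
for i, s in enumerate(subsectors):
    assert np.allclose(y_data[:, i], np.histogram(s, bins=bins)[0])
```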