
I am trying to display a stacked bar graph. I have 3 lists, as shown below:

totalpointperxaxis [6, 9, 13, 5, 14, 382, 26, 2, 45, 2]

clusternamesList [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]

ppclusterList [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

Here, "totalpointperxaxis" defines the height of each bar, and "ppclusterList" (points per cluster) gives the segments that make up each of these bars, colour-coded according to the cluster names in "clusternamesList". The number of clusters is not known beforehand, and the lists may change when I add more data points.

As you can see, each list has 10 entries, one per bar. The idea is to display a stacked bar graph like the one in this example.

1 Answer


One approach is to first collect all the values in a long-form dataframe, and then transform it into a pivot table that can be plotted directly.

import pandas as pd
import matplotlib.pyplot as plt

clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]

ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

# Long form: one row per (bar id, cluster) pair
df = pd.DataFrame([{'id': point_id, 'cluster': cluster, 'point': point}
                   for point_id, (clusternames, ppcluster) in enumerate(zip(clusternamesList, ppclusterList))
                   for cluster, point in zip(clusternames, ppcluster)])

# Wide form: one row per bar, one column per cluster; clusters absent from a bar become 0.
# aggfunc='sum' adds up points if the same cluster ever appears twice in one bar.
df_table = df.pivot_table(values='point', index='id', columns='cluster', aggfunc='sum', fill_value=0)

# Each column becomes one coloured segment of the stacked bars
df_table.plot.bar(stacked=True, rot=0)
plt.show()

[Image: pandas bar plot from pivot table]
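By default, pandas colours the columns from matplotlib's default colour cycle, which has 10 colours, so with more than 10 clusters some colours would repeat. One option, sketched below under the assumption that the 20-colour `tab20` colormap is enough, is to build an explicit cluster-name-to-colour mapping and pass one colour per column through the `color` argument. `cluster_colors` is just an illustrative name; the same mapping could be reused so a cluster keeps its colour across several plots, as long as the set of cluster names does not change.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]
ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

# Same long form and pivot table as above
df = pd.DataFrame([{'id': point_id, 'cluster': cluster, 'point': point}
                   for point_id, (clusternames, ppcluster) in enumerate(zip(clusternamesList, ppclusterList))
                   for cluster, point in zip(clusternames, ppcluster)])
df_table = df.pivot_table(values='point', index='id', columns='cluster',
                          aggfunc='sum', fill_value=0)

# Assumption: 20 colours are enough; beyond that, i % cmap.N makes colours repeat
cmap = plt.get_cmap('tab20')
cluster_names = sorted(df['cluster'].unique())
cluster_colors = {name: to_hex(cmap(i % cmap.N)) for i, name in enumerate(cluster_names)}

# One colour per pivot-table column, looked up by cluster name
ax = df_table.plot.bar(stacked=True, rot=0,
                       color=[cluster_colors[name] for name in df_table.columns])
ax.set_xlabel('id')
ax.set_ylabel('points')
plt.show()
```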

The dataframe looks like:

    id   cluster  point
0    0  Cluster1      1
1    0  Cluster2      5
2    1  Cluster1      4
3    1  Cluster3      5
4    2  Cluster2     12
5    2  Cluster4      1
6    3  Cluster1      1
7    3  Cluster3      4
8    4  Cluster2     13
9    4  Cluster5      1
10   5  Cluster3      6
11   5  Cluster6    173
12   5  Cluster7    203
13   6  Cluster2     21
14   6  Cluster4      2
15   6  Cluster6      1
16   6  Cluster7      2
17   7  Cluster1      1
18   7  Cluster3      1
19   8  Cluster1      2
20   8  Cluster2     34
21   8  Cluster4      2
22   8  Cluster5      6
23   8  Cluster6      1
24   9  Cluster1      1
25   9  Cluster3      1

It could be handy to store the data directly in this form instead of as nested lists.
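Storing the data in long form from the start could look like the minimal sketch below. It assumes new data arrives one bar at a time as a list of cluster names plus a matching list of point counts; `add_bar` is a hypothetical helper, not a pandas function. Because the pivot table is rebuilt from the records, cluster names that show up only in later bars (here Cluster3 and Cluster4) simply become new columns, so the number of clusters does not need to be known in advance.

```python
import pandas as pd


def add_bar(records, bar_id, clusters, points):
    """Append one bar's (cluster, point) segments as long-form records.

    Hypothetical helper for illustration; it is not part of pandas.
    """
    for cluster, point in zip(clusters, points):
        records.append({'id': bar_id, 'cluster': cluster, 'point': point})


records = []
# First three bars from the question; Cluster3 and Cluster4 only appear later on
add_bar(records, 0, ['Cluster1', 'Cluster2'], [1, 5])
add_bar(records, 1, ['Cluster1', 'Cluster3'], [4, 5])
add_bar(records, 2, ['Cluster2', 'Cluster4'], [12, 1])

df = pd.DataFrame(records)
df_table = df.pivot_table(values='point', index='id', columns='cluster',
                          aggfunc='sum', fill_value=0)
print(df_table.columns.tolist())  # ['Cluster1', 'Cluster2', 'Cluster3', 'Cluster4']
```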

The pivot table then looks like:

cluster  Cluster1  Cluster2  Cluster3  Cluster4  Cluster5  Cluster6  Cluster7
id                                                                           
0               1         5         0         0         0         0         0
1               4         0         5         0         0         0         0
2               0        12         0         1         0         0         0
3               1         0         4         0         0         0         0
4               0        13         0         0         1         0         0
5               0         0         6         0         0       173       203
6               0        21         0         2         0         1         2
7               1         0         1         0         0         0         0
8               2        34         0         2         6         1         0
9               1         0         1         0         0         0         0
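"totalpointperxaxis" is not needed for the plot itself: the height of a stacked bar is the sum of its segments, i.e. the row sum of the pivot table. The sketch below, using the data from the question (and `aggfunc='sum'`, which only makes a difference if a cluster could appear twice in the same bar), checks that these row sums reproduce "totalpointperxaxis".

```python
import pandas as pd

# Data from the question
totalpointperxaxis = [6, 9, 13, 5, 14, 382, 26, 2, 45, 2]
clusternamesList = [['Cluster1', 'Cluster2'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster4'], ['Cluster1', 'Cluster3'], ['Cluster2', 'Cluster5'], ['Cluster3', 'Cluster6', 'Cluster7'], ['Cluster2', 'Cluster4', 'Cluster6', 'Cluster7'], ['Cluster1', 'Cluster3'], ['Cluster1', 'Cluster2', 'Cluster4', 'Cluster5', 'Cluster6'], ['Cluster1', 'Cluster3']]
ppclusterList = [[1, 5], [4, 5], [12, 1], [1, 4], [13, 1], [6, 173, 203], [21, 2, 1, 2], [1, 1], [2, 34, 2, 6, 1], [1, 1]]

df = pd.DataFrame([{'id': point_id, 'cluster': cluster, 'point': point}
                   for point_id, (clusternames, ppcluster) in enumerate(zip(clusternamesList, ppclusterList))
                   for cluster, point in zip(clusternames, ppcluster)])
df_table = df.pivot_table(values='point', index='id', columns='cluster',
                          aggfunc='sum', fill_value=0)

# A stacked bar's height is the sum of its segments, i.e. the row sum
bar_heights = df_table.sum(axis=1).tolist()
print(bar_heights == totalpointperxaxis)  # True
```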
JohanC