
There's surprisingly little information out there about Python and the pyalluvial package. I'm hoping to combine stacked bars and a corresponding alluvial plot in the same figure.

Using the data below, I have three unique groups, identified in the Group column. I want to display the proportion of each Group at each unique Point. I have the data formatted this way because I need three separate stacked bars, one for each Point.

The overall column (Ove) gives the overall proportion taken from all three Points: Group 1 makes up 70%, Group 2 makes up 20% and Group 3 makes up 10%. But the proportion of each group changes at the different Points. I'm hoping to show this like a standard stacked bar chart, with the alluvial flows drawn over the top.

import pandas as pd
import pyalluvial.alluvial as alluvial 

df = pd.DataFrame({
    'Group': [1, 2, 3],
    'Ove': [0.7, 0.2, 0.1],
    'Point 1': [0.8, 0.1, 0.1],
    'Point 2': [0.6, 0.2, 0.2],
    'Point 3': [0.7, 0.3, 0.0],
    })

ax = alluvial.plot(
    df=df,
    xaxis_names=['Group', 'Point 1', 'Point 2', 'Point 3'],
    y_name='Ove',
    alluvium='Group',
)

The output shows the overall group proportion (1st bar) correctly, but the stacked bars that follow don't show the correct proportions.

If I transform the df and put the Points in a single column, then I don't get 3 separate bars.
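
For reference, the single-column reshape mentioned above might look like the sketch below. It uses only pandas; the names `long_df`, `Point` and `Proportion` are chosen here for illustration and do not come from pyalluvial.

```python
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 2, 3],
    'Ove': [0.7, 0.2, 0.1],
    'Point 1': [0.8, 0.1, 0.1],
    'Point 2': [0.6, 0.2, 0.2],
    'Point 3': [0.7, 0.3, 0.0],
})

# Wide -> long: one row per (Group, Point) pair.
# 'Point' and 'Proportion' are illustrative column names.
long_df = df.melt(
    id_vars=['Group', 'Ove'],
    value_vars=['Point 1', 'Point 2', 'Point 3'],
    var_name='Point',
    value_name='Proportion',
)
print(long_df)
```

After this reshape there is only one `Point` column, which is why passing it as a single x-axis name yields one bar rather than three.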


Chopin
  • Is this in the correct data format? For example, if you change the data for 'Point 1' to [0.7, 0.1, 0.2], it will be classified correctly. I'm basing this on the example data in the [GitHub repository](https://github.com/nekoumei/pyalluvial). – r-beginners Jul 30 '21 at 08:34
  • Yes, it's correct. This doesn't work because the stacked bar charts aren't correct. – Chopin Aug 02 '21 at 05:03
  • The data format is wrong: alluvial.plot expects x to be categorical and y to be continuous. Since 'Point 1' contains only two categories, 0.8 and 0.1, the plot is correct: it takes the values from the 'Ove' column, i.e. 0.7 for the 0.8 class, and 0.2 and 0.1 for the 0.1 class. – darth baba Aug 02 '21 at 05:45
  • I get that. But if I format it that way I only have a single entry for the `Point` data. Therefore, I don't get 3 separate stacked bars. – Chopin Aug 02 '21 at 09:32

1 Answer


As @darthbaba correctly pointed out, pyalluvial expects a dataframe with one row per combination of category values, plus a frequency column giving the count for that combination. As an example of valid input, each Point in each Group below is labelled as present (1) or absent (0):

import pandas as pd
import pyalluvial.alluvial as alluvial

# One row per (Group, Point 1, Point 2, Point 3) combination; 'freq' is its count
df = pd.DataFrame({
    'Group': [1] * 6 + [2] * 6 + [3] * 6,
    'Point 1': [1, 1, 1, 1, 0, 0] * 3,
    'Point 2': [0, 1, 0, 1, 1, 0] * 3,
    'Point 3': [0, 0, 1, 1, 1, 1] * 3,
    'freq': [23, 11, 5, 7, 10, 12, 17, 3, 6, 17, 19, 20, 28, 4, 13, 8, 14, 9]
    })

fig = alluvial.plot(
    df=df, xaxis_names=['Point 1', 'Point 2', 'Point 3'],
    y_name='freq', alluvium='Group', ignore_continuity=False,
)

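
To see why the wide table from the question is misread, here is a small pandas-only check. It only illustrates the behaviour described in the comments (each `Point 1` value treated as a category label, with `Ove` summed inside it); it is not pyalluvial's actual code, and the names `wide` and `by_label` are made up for this sketch.

```python
import pandas as pd

# The wide table from the question
wide = pd.DataFrame({
    'Group': [1, 2, 3],
    'Ove': [0.7, 0.2, 0.1],
    'Point 1': [0.8, 0.1, 0.1],
})

# Treat every distinct 'Point 1' value as a category label
# and add up 'Ove' within each label.
by_label = wide.groupby('Point 1')['Ove'].sum()
print(by_label)
```

Only two labels come out, 0.1 and 0.8, with `Ove` totals of roughly 0.3 and 0.7, so the second bar ends up split into two strata rather than three group proportions.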

Clearly, the above code doesn't resolve the issue, since pyalluvial doesn't yet support stacked bars the way ggalluvial does (see example #5). Therefore, unless you want to use ggalluvial, your best option IMO is to add the required functionality yourself. I'd start by modifying line #85.
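
If modifying the package is more than you need, one workaround is to skip pyalluvial and draw the figure directly with matplotlib. The sketch below is not pyalluvial functionality and is only a rough approximation of an alluvial plot: it stacks the proportions from the question's original wide table and joins matching segments of neighbouring bars with straight, semi-transparent ribbons. The variable names, bar width and output filename are all arbitrary choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Group': [1, 2, 3],
    'Ove': [0.7, 0.2, 0.1],
    'Point 1': [0.8, 0.1, 0.1],
    'Point 2': [0.6, 0.2, 0.2],
    'Point 3': [0.7, 0.3, 0.0],
})

cols = ['Ove', 'Point 1', 'Point 2', 'Point 3']
width = 0.4  # bar width in x-axis units (arbitrary)

# Top and bottom edge of every group's segment in every bar
tops = df[cols].cumsum()
bottoms = tops - df[cols]

fig, ax = plt.subplots()
for i, group in enumerate(df['Group']):
    color = f'C{i}'
    # Stacked segments: one per bar for this group
    ax.bar(range(len(cols)), df.loc[i, cols], width=width,
           bottom=bottoms.loc[i, cols], color=color,
           label=f'Group {group}')
    # Ribbons: join the right edge of each bar to the left edge of the next
    for j in range(len(cols) - 1):
        x = [j + width / 2, j + 1 - width / 2]
        ax.fill_between(
            x,
            [bottoms.loc[i, cols[j]], bottoms.loc[i, cols[j + 1]]],
            [tops.loc[i, cols[j]], tops.loc[i, cols[j + 1]]],
            color=color, alpha=0.3, linewidth=0,
        )

ax.set_xticks(range(len(cols)))
ax.set_xticklabels(cols)
ax.set_ylabel('Proportion')
ax.legend(title='Group')
fig.savefig('stacked_alluvial.png')
```

The ribbons here are straight-edged; getting the curved flows of a true alluvial plot would mean interpolating the top and bottom edges (for example with a smoothstep curve) before calling `fill_between`.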

micmalti