
Sorry if this is a duplicate; I was not able to find a solution.

I have a data frame:

| sample_ids | perc_A | perc_B | perc_C |
|------------|--------|--------|--------|
| sample 1   | 0.75   | 0.18182| 0.42222|
| sample 2   | 0.66667| 0.24747| 0.15823|
| sample 3   | 0.70213| 0.28176| 0.17925|

With this, I would like to plot a 100% stacked and grouped bar chart, as in the figure below (a similar image taken from GitHub).

[Image: example of a 100% stacked and grouped bar chart]


Detailed explanation based on the provided figure:
Let's say sample 1 is Apples. For bar A, 75% will be in dark purple (legend: True_perc_a) and 25% in light purple (legend: False_perc_a); for bar B, 18.18% will be in dark green (legend: True_perc_b) and 81.82% in light green (legend: False_perc_b); for bar C, 42.22% will be in dark yellow (legend: True_perc_c) and 57.78% in light yellow (legend: False_perc_c). The same applies to sample 2 and sample 3.

I was able to process the data to get the true and false percentages. For example:

df['perc_A'] = (df['perc_A']*100).round(2)
df['perc_F_A'] = (100 - df['perc_A']).round(2)

However, I am having difficulty plotting the figure.

Naomi Sun

1 Answer

Because each bar totals 100%, we can simply set every "False" value to 1 and draw it as a full-height background bar. First, melt the dataframe on the sample_ids column, name the new columns, and multiply the values by 100 to turn them into percents. Next, select the "False" rows by choosing the values in the perc column that contain an F, and plot them with Seaborn with hue set to the perc name. Set the palette to whatever colors you want, and set alpha to 0.5 so the false part is visibly lighter than the true part. Finally, plot the true percents afterwards, which places them in front of the false bars, and you have your stacked bar plot:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

plt.rcParams["figure.figsize"] = (20, 10)

df = pd.DataFrame({'sample_ids': ['sample1', 'sample2', 'sample3'],
                   'perc_A': [0.75, 0.66667, 0.70213],
                   'perc_B': [0.18182, 0.24747, 0.28176],
                   'perc_C': [0.4222, 0.15823, 0.17925]})

# Background bars: every "False" column is set to 1 (100%) so it spans the full height
df[['perc_F_A', 'perc_F_B', 'perc_F_C']] = 1

# Long format: one row per (sample, column), with values converted to percents
meltedDF = df.melt(id_vars=['sample_ids'], var_name='perc', value_name='percent')
meltedDF['percent'] = meltedDF['percent'] * 100

# Rows that belong to the "False" columns (names containing 'F')
is_false = meltedDF.perc.str.contains('F')

# Draw the translucent full-height bars first, then the true percents in front of them
sns.barplot(data=meltedDF[is_false], x='sample_ids', y='percent', hue='perc', palette=['blue', 'green', 'red'], alpha=0.5)
sns.barplot(data=meltedDF[~is_false], x='sample_ids', y='percent', hue='perc', palette=['blue', 'green', 'red'])
plt.show()

Graph:

[Image: resulting bar chart]
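The overlay approach above draws full-height bars behind the true percents. As an alternative sketch (not the method used in this answer), the same picture can be built with real stacking in plain matplotlib, using the `bottom` argument of `Axes.bar`. The colors, bar width, figure size, and legend labels below are illustrative assumptions; the labels follow the `True_perc_a` / `False_perc_a` naming from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sample_ids": ["sample1", "sample2", "sample3"],
    "perc_A": [0.75, 0.66667, 0.70213],
    "perc_B": [0.18182, 0.24747, 0.28176],
    "perc_C": [0.4222, 0.15823, 0.17925],
})

cols = ["perc_A", "perc_B", "perc_C"]
colors = ["purple", "green", "gold"]  # assumed colors, one per column
x = np.arange(len(df))                # one group of bars per sample
width = 0.25                          # assumed bar width

fig, ax = plt.subplots(figsize=(10, 5))
for i, (col, color) in enumerate(zip(cols, colors)):
    true_pct = df[col].to_numpy() * 100
    false_pct = 100 - true_pct
    pos = x + (i - 1) * width         # shift each column within its group
    # True part sits at the bottom of the bar
    ax.bar(pos, true_pct, width, color=color, label=f"True_{col.lower()}")
    # False part is stacked on top of it, in a lighter shade of the same color
    ax.bar(pos, false_pct, width, bottom=true_pct, color=color, alpha=0.4,
           label=f"False_{col.lower()}")

ax.set_xticks(x)
ax.set_xticklabels(df["sample_ids"])
ax.set_ylabel("percent")
ax.set_ylim(0, 100)
ax.legend(ncol=3)
fig.tight_layout()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Because each segment gets its own `label`, the legend text can be set directly here without renaming any columns.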

As an aside, if you do want the true values of your "False" percents, a better way to generate them is:

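df = pd.DataFrame({'sample_ids': ['sample1', 'sample2', 'sample3'],
                   'perc_A': [0.75, 0.66667, 0.70213],
                   'perc_B': [0.18182, 0.24747, 0.28176],
                   'perc_C': [0.4222, 0.15823, 0.17925]})

# Subtract only the numeric columns from 1; no groupby is needed, and the
# string column sample_ids is left out of the arithmetic
df[['perc_F_A', 'perc_F_B', 'perc_F_C']] = 1 - df[['perc_A', 'perc_B', 'perc_C']].to_numpy()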

Output:


   sample_ids   perc_A   perc_B     perc_C   perc_F_A   perc_F_B    perc_F_C
0   sample1     0.75000  0.18182    0.42220  0.25000    0.81818  0.57780
1   sample2     0.66667  0.24747    0.15823  0.33333    0.75253  0.84177
2   sample3     0.70213  0.28176    0.17925  0.29787    0.71824  0.82075
Michael S.
  • Thank you very much, Michael. I have a follow-up question about this plot. Do you have any idea how to i) change the legend text? I tried this code, but it does not work: `plt.legend(labels=['F(A)', 'F(B)', 'F(C)','T(A)', 'T(B)', 'T(C)'])`. It shows all legend colors as black. ii) separate the bars into a few different plots based on the sample id (for example, sample1a_1 and sample1b_2 would be in one plot, while sample2a_1, sample2b_2, and sample2c_3 would be in another)? – Naomi Sun Aug 19 '22 at 14:15
  • Can't you just change your column names to the names that you want in the legend? – Michael S. Aug 19 '22 at 14:17
  • Yes, I can, but I thought it could be changed in a more flexible way, no? – Naomi Sun Aug 19 '22 at 14:21
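
On the legend question in the comments, one possible approach is to keep the handles matplotlib already has and replace only the text. This is a hedged sketch, not something confirmed in the thread: the `rename` mapping and the figure size are assumptions made for illustration. A likely reason the colors came out black is that passing only `labels` lets matplotlib pair them with whichever artists come first on the axes, which may be error-bar lines rather than the colored bars.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    "sample_ids": ["sample1", "sample2", "sample3"],
    "perc_A": [0.75, 0.66667, 0.70213],
    "perc_B": [0.18182, 0.24747, 0.28176],
    "perc_C": [0.4222, 0.15823, 0.17925],
})
df[["perc_F_A", "perc_F_B", "perc_F_C"]] = 1

melted = df.melt(id_vars="sample_ids", var_name="perc", value_name="percent")
melted["percent"] = melted["percent"] * 100
is_false = melted["perc"].str.contains("_F_")

palette = ["blue", "green", "red"]
fig, ax = plt.subplots(figsize=(10, 5))
sns.barplot(data=melted[is_false], x="sample_ids", y="percent", hue="perc",
            palette=palette, alpha=0.5, ax=ax)
sns.barplot(data=melted[~is_false], x="sample_ids", y="percent", hue="perc",
            palette=palette, ax=ax)

# Hypothetical mapping from column names to the legend text the asker wanted
rename = {
    "perc_F_A": "F(A)", "perc_F_B": "F(B)", "perc_F_C": "F(C)",
    "perc_A": "T(A)", "perc_B": "T(B)", "perc_C": "T(C)",
}

# Reuse the existing colored handles and swap only their labels
handles, labels = ax.get_legend_handles_labels()
new_labels = [rename.get(label, label) for label in labels]
ax.legend(handles, new_labels, title="perc")
```

Passing the handles explicitly keeps each label tied to its own colored bar, and the mapping can be edited without touching the dataframe's column names.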