I have the following pandas dataframe:
import random
import pandas as pd
random.seed(42)
so = pd.DataFrame({'x': random.sample(range(1, 100), 40),
                   'group': ['a']*10+['b']*10+['c']*10+['d']*10})
From so, I calculate the percentiles of x:
so_percentiles = so.groupby('group')['x'].describe().reset_index()
where in my case it looks like this:
  group  count  mean        std  min    25%   50%    75%   max
0     a   10.0  41.2  33.773099  4.0  15.75  30.5  70.50  95.0
1     b   10.0  52.6  32.083918  5.0  28.50  60.0  74.50  97.0
2     c   10.0  65.1  31.067847  1.0  55.00  75.0  87.75  96.0
3     d   10.0  48.1  31.014154  7.0  21.25  46.5  68.75  99.0
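If only the two quartiles are needed, a grouped quantile call may be a narrower alternative to describe(). The sketch below is illustrative only: the dataframe `demo` and its values are made up for the example (they are not the data above), and the column labels are renamed by hand to mirror the describe() output.

```python
import pandas as pd

# Hypothetical toy data, chosen so the quartiles are easy to check by hand.
demo = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 10, 20, 30, 40, 50],
    "group": ["a"] * 5 + ["b"] * 5,
})

# quantile() with a list returns one value per (group, quantile) pair;
# unstack() turns the quantile level into columns.
quartiles = (
    demo.groupby("group")["x"]
    .quantile([0.25, 0.75])
    .unstack()
    .rename(columns={0.25: "25%", 0.75: "75%"})
)

print(quartiles)
```

The result is indexed by group, so a single value can be read with something like quartiles.loc["a", "25%"] instead of a query() followed by .values[0].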
For the above data, I want to create a histogram, faceted by group, and on each histogram also plot 2 vertical lines for each group: one line for the 25% percentile and one for the 75% percentile.
So I create the histogram using plotly express:
import plotly.express as px
fig_so = px.histogram(so, x='x', facet_row='group', cumulative=False, histnorm='percent',
                      nbins=100,
                      category_orders={
                          'group': ['a','b','c','d']
                      })
And then I add the lines by doing the following:
for r, g in enumerate(['a','b','c','d']):
    val_25 = so_percentiles.query('group == @g')['25%'].values[0]
    val_75 = so_percentiles.query('group == @g')['75%'].values[0]
    fig_so.add_vline(x=val_25, line_dash="dot", row=r, col="all",
                     annotation_text="25th percentile",
                     annotation_position="bottom left"
                     )
    fig_so.add_vline(x=val_75, line_dash="dot", row=r, col="all",
                     annotation_text="75th percentile",
                     annotation_position="bottom right"
                     )
And finally I do some facet processing:
fig_so.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig_so.update_yaxes(matches=None)
fig_so.show()
The above code produces the faceted histogram with the vertical lines.
The problem is that, for instance, for group d the 25% percentile is 21.25 according to so_percentiles, which is not what the plot is showing.
I am using plotly version 5.5.0.
My question is: how can I match the percentile lines to the correct histogram?
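One possible explanation, offered as an assumption rather than a confirmed diagnosis: the row argument of add_vline is 1-based, and for Plotly Express facets the rows appear to be counted from the bottom of the figure, whereas enumerate yields a 0-based index that runs from the top. Under that assumption, a small helper (the name `facet_rows_bottom_up` is hypothetical) could translate the top-to-bottom group order into the row numbers the figure expects:

```python
def facet_rows_bottom_up(groups_top_to_bottom):
    """Map each facet label to a 1-based row index counted from the bottom.

    Assumes the labels are given in the order they appear top to bottom.
    """
    n = len(groups_top_to_bottom)
    return {g: n - i for i, g in enumerate(groups_top_to_bottom)}


rows = facet_rows_bottom_up(['a', 'b', 'c', 'd'])
print(rows)  # {'a': 4, 'b': 3, 'c': 2, 'd': 1}
```

With such a mapping, the loop could pass row=rows[g] instead of row=r to add_vline. Whether this fully resolves the mismatch on version 5.5.0 has not been verified here.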