I have the following pandas dataframe:

import random
import pandas as pd
random.seed(42)

so = pd.DataFrame({'x': random.sample(range(1, 100), 40),
                    'group': ['a']*10+['b']*10+['c']*10+['d']*10})

From so, I calculate the percentiles of x for each group:

so_percentiles = so.groupby('group')['x'].describe().reset_index()

In my case, so_percentiles looks like this:

  group  count  mean        std  min    25%   50%    75%   max
0     a   10.0  41.2  33.773099  4.0  15.75  30.5  70.50  95.0
1     b   10.0  52.6  32.083918  5.0  28.50  60.0  74.50  97.0
2     c   10.0  65.1  31.067847  1.0  55.00  75.0  87.75  96.0
3     d   10.0  48.1  31.014154  7.0  21.25  46.5  68.75  99.0

For the above data, I want to create a histogram faceted by group and, on each facet, draw two vertical lines: one at that group's 25th percentile and one at its 75th percentile.

I create the histogram using Plotly Express:

import plotly.express as px

fig_so = px.histogram(so, x='x', facet_row='group', cumulative=False, histnorm='percent',
                      nbins=100,
                      category_orders={
                          'group': ['a','b','c','d']
                      })

And then I add the lines by doing the following:

for r, g in enumerate(['a','b','c','d']):

    val_25 = so_percentiles.query('group == @g')['25%'].values[0]
    val_75 = so_percentiles.query('group == @g')['75%'].values[0]

    fig_so.add_vline(x=val_25, line_dash="dot", row=r, col="all",
                     annotation_text="25th percentile",
                     annotation_position="bottom left")
    fig_so.add_vline(x=val_75, line_dash="dot", row=r, col="all",
                     annotation_text="75th percentile",
                     annotation_position="bottom right")

Finally, I shorten the facet labels and unlink the y-axes:

fig_so.for_each_annotation(lambda a: a.update(text=a.text.split("=")[-1]))
fig_so.update_yaxes(matches=None)
fig_so.show()

The above code produces

[Image: histogram of x faceted by group, with dotted vertical lines for the 25th and 75th percentiles]

The problem is that, for group d for instance, the 25th percentile is 21.25 according to so_percentiles, but that is not where the plot draws the line.

I am using plotly version 5.5.0

My question is: how can I draw each group's percentile lines on that group's histogram?

quant
  • It appears that the vlines for the group `b` and `d` percentiles are switched, but I can't understand why this would be happening – Derek O Apr 20 '22 at 12:56
  • @DerekO exactly, this is happening. But I don't understand it either – quant Apr 20 '22 at 13:13
  • I too have confirmed this error. The cause is unknown, but until it is fixed, the only workaround is to use the following: `for r, g in enumerate(['a','d','c','b']):` I would recommend submitting it to the [community](https://community.plotly.com/). – r-beginners Apr 21 '22 at 13:11
  • @r-beginners I did: https://community.plotly.com/t/vertical-lines-on-facetted-histogram-are-not-matching-the-facet/63215 – quant Apr 21 '22 at 13:15
  • I hope this matter is addressed as a priority. +1 – r-beginners Apr 21 '22 at 13:17

0 Answers