
I would like to use Plotly Dash to plot grouped violins with a hierarchical grouping, but it is not apparent to me from the docs how to do this.

For clarity, consider the tips dataset that Plotly provides.

Assume that I have modified the dataset to bin the tips into "low", "med", and "big" tippers.

Then, for each binned-tipper group (low/med/big), I would like to plot a grouped violin of male vs. female.
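Binning can also be done without a row-wise `apply`. A minimal sketch using `pandas.cut`, assuming the same thresholds as `tip_cat` in the code further down (a tip below 10% of the bill is "low", below 15% is "med", otherwise "big"); the toy DataFrame values are made up for illustration:

```python
import numpy as np
import pandas as pd

# Toy data standing in for the tips dataset (made-up values)
df = pd.DataFrame({
    "total_bill": [20.0, 20.0, 20.0, 20.0],
    "tip": [1.0, 2.0, 2.5, 4.0],
    "sex": ["Female", "Male", "Female", "Male"],
})

# Tip as a percentage of the bill
percent = df["tip"] / df["total_bill"] * 100

# right=False makes the bins [-inf, 10), [10, 15), [15, inf),
# matching `percent < 10` -> low, `percent < 15` -> med, else big
df["tip_category"] = pd.cut(
    percent,
    bins=[-np.inf, 10, 15, np.inf],
    labels=["low", "med", "big"],
    right=False,
).astype(str)

print(df["tip_category"].tolist())  # ['low', 'med', 'med', 'big']
```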

It seems that I need to build the data traces array in a form like this:


tip_groups = ['low', 'med', 'big']
data = []

for tgrp in tip_groups:
    for sex in ['male', 'female']:
        # parentheses are required: & binds tighter than ==
        dff = df[(df['tip_group'] == tgrp) & (df['sex'] == sex)]
        data.append({
            'x': dff['tip_group'],
            'y': dff['tip'],
            'legendgroup': sex,
            'scalegroup': tgrp
        })

but this does not give the desired result: I expect to see three groups of two violins each.
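For "three groups of two violins each", the pattern used in Plotly's grouped-violin examples is one trace per inner group (`sex`), each trace holding all rows of that group with the outer group (`tip_category`) on `x`, plus `violinmode: 'group'` in the layout. A minimal sketch with plain dict figures, assuming columns named `tip_category`, `sex` and `tip`; the helper `grouped_violin_traces`, the colour map and the toy data are hypothetical. With `n` outer groups and `m` inner groups this gives `m` traces, so clicking a legend entry hides that inner group across all outer groups:

```python
import pandas as pd


def grouped_violin_traces(df, outer, inner, y, colors):
    """Hypothetical helper: one violin trace per `inner` value.

    The `outer` values go on the x-axis; with violinmode='group',
    Plotly places the traces side by side within each x category.
    """
    traces = []
    for value, sub in df.groupby(inner, sort=True):
        traces.append({
            "type": "violin",
            "x": sub[outer].tolist(),
            "y": sub[y].tolist(),
            "name": str(value),
            "legendgroup": str(value),
            "scalegroup": str(value),
            "line": {"color": colors.get(value, "black")},
        })
    return traces


# Toy data (made-up values)
df = pd.DataFrame({
    "tip_category": ["low", "low", "med", "med", "big", "big"],
    "sex": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "tip": [1.0, 1.5, 2.5, 3.0, 4.0, 5.0],
})

colors = {"Female": "#e6194B", "Male": "#4363d8"}  # hypothetical colour choice
fig = {
    "data": grouped_violin_traces(df, "tip_category", "sex", "tip", colors),
    "layout": {
        "violinmode": "group",
        "yaxis": {"zeroline": False},
        # assumed display order of the outer categories
        "xaxis": {"categoryorder": "array", "categoryarray": ["low", "med", "big"]},
    },
}
print([t["name"] for t in fig["data"]])  # ['Female', 'Male']
```

To render it, passing `fig` to `plotly.offline.iplot` or to `plotly.graph_objects.Figure(fig).show()` should work.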

Here is my current code:


import numpy as np
import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")
def tip_cat(row):
    percent = row.tip / row.total_bill * 100
    if percent < 10:
        return 'low'
    if percent < 15:
        return 'med'    
    return 'big'

df['tip_category'] = df.apply(tip_cat, axis=1)

data = []

sexes = np.unique(df.sex)
categories = np.unique(df.tip_category)

colors = [
    '#e6194B',
    '#3cb44b',
    '#ffe119',
    '#4363d8',
    '#f58231',
    '#911eb4',
    '#42d4f4',
]

for i, sex in enumerate(sexes):
    for j, cat in enumerate(categories):
        dff = df[
            (df['sex'] == sex)
            & (df['tip_category'] == cat)
        ]

        data.append({
            'type': 'violin',
            'x': dff['sex'],
            'y': dff['tip'],
            'legendgroup':  '{}: {}'.format(sex, cat),
            'scalegroup':  '{}: {}'.format(sex, cat),
            'name': '{}: {}'.format(sex, cat),
            'fillcolor': colors[i],
            "line": {
                "color": 'black'
            },
        })

fig = {
    'data': data, 
    'layout': {

    }
}

import plotly_express as px  # this works, but I can't reproduce it with go.Violin
px.violin(df, y="tip", x="sex", color="tip_category", box=True, points="all", hover_data=df.columns)
Asked by SumNeuron; edited by sentence.

1 Answer

If I understood correctly, you need to change two parts of your code:

from:

'x': dff['sex'],

to:

'x': dff['tip_category'],

and

from:

'layout': {

}

to:

"layout": {
    "yaxis": {
        "zeroline": False,
    },
    "violinmode": "group"
}

To sum up:

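import numpy as np
import pandas as pd
from plotly.offline import init_notebook_mode, iplot

init_notebook_mode(connected=True)

df = pd.read_csv("https://raw.githubusercontent.com/plotly/datasets/master/violin_data.csv")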
def tip_cat(row):
    percent = row.tip / row.total_bill * 100
    if percent < 10:
        return 'low'
    if percent < 15:
        return 'med'    
    return 'big'

df['tip_category'] = df.apply(tip_cat, axis=1)

data = []

sexes = np.unique(df.sex)
categories = np.unique(df.tip_category)

colors = [
    '#e6194B',
    '#3cb44b',
    '#ffe119',
    '#4363d8',
    '#f58231',
    '#911eb4',
    '#42d4f4',
]

for i, sex in enumerate(sexes):
    for j, cat in enumerate(categories):
        dff = df[
            (df['sex'] == sex)
            & (df['tip_category'] == cat)
        ]

        data.append({
            'type': 'violin',
            'x': dff['tip_category'],
            'y': dff['tip'],
            'legendgroup':  '{}: {}'.format(sex, cat),
            'scalegroup':  '{}: {}'.format(sex, cat),
            'name': '{}: {}'.format(sex, cat),
            'fillcolor': colors[i],
            "line": {
                "color": 'black'
            },
        })

fig = {
    'data': data,
    "layout": {
        "yaxis": {
            "zeroline": False,
        },
        "violinmode": "group"
    }
}

iplot(fig, filename='violin/grouped', validate=False)

and you get:

[Image: the resulting grouped violin plot]

Answered by sentence.
  • So that now works with `iplot`, but the legend, scale and colors are still sort of hijacked together. Imagine the case where the first grouping (`tip_category`) had `n` groups and the second (`sex`) had `m`: it would be nice if I could see both labels on the x-axis and, for the legend, if I could toggle either of the groups (e.g. turn off `sex='male'` or `category='low'`) – SumNeuron Jul 10 '19 at 17:30
  • 1. I want to wait a bit to see if anyone else chimes in. 2. The question is specifically about multi-level groups. What I tried and what you refined is just a hack of single groups made by concatenating the group strings ;) – SumNeuron Jul 10 '19 at 17:46