I'm trying to make a specific plot using ggplot/plotnine. I'd like to keep it plotnine if possible so everything I've done can stay in the same notebook.

Essentially, I want a box or jitter plot in which the points within each x-axis category are further separated into side-by-side groups by a second variable.

The data I am working with evaluates 4 different models and their percent error for several metrics. The structure of the data is shown below.

Data:

      Model  metric   Percent Error
0     gbr    Lower    46.533009
1     gbr    Lower    22.654213
2     gbr    Lower    17.404358
3     gbr    Lower    5.134485
4     gbr    Lower    4.550838
...   ...    ...      ...
9963  cqrn   Average  5.745320
9964  cqrn   Average  16.465810
9965  cqrn   Average  14.737193
9966  cqrn   Average  81.743560
9967  cqrn   Average  73.008793

Code:

(ggplot(dat, aes(x="metric",
                 y="Percent Error",
                 color="Model")) +
 geom_jitter(width=.25, alpha=.4, show_legend=False) +
 scale_y_log10() +
 labs(title="Error Metrics"))

This code renders: [image]

I want a graph that looks like this (sorry for the crude drawing). This can be done with box plots or jitters, although bonus points for jitter if you can!

[image]

Ronak Shah
mk2080

1 Answer

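This could be achieved by using position = position_jitterdodge(), which first dodges the points by the grouping aesthetic (here color = "Model") within each metric and then jitters them inside each group: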

Note: I slightly changed the example data you provided to make the example more realistic.

import pandas as pd
from plotnine import (aes, geom_jitter, ggplot, labs,
                      position_jitterdodge, scale_y_log10)

# Small sample in which both models appear under each metric
dat = [['gbr', 'Lower', 46.533009], ['gbr', 'Lower', 22.654213], ['gbr', 'Lower', 17.404358],
       ['cqrn', 'Lower', 5.134485], ['cqrn', 'Lower', 4.550838],
       ['gbr', 'Average', 5.745320], ['gbr', 'Average', 16.465810],
       ['cqrn', 'Average', 14.737193], ['cqrn', 'Average', 81.743560], ['cqrn', 'Average', 73.008793]]

dat = pd.DataFrame(dat, columns=['Model', 'metric', 'Percent Error'])

(ggplot(dat, aes(x="metric",
                 y="Percent Error",
                 color="Model")) +
 # Dodge by Model first, then jitter within each dodged group
 geom_jitter(alpha=.4, position=position_jitterdodge(jitter_width=.25)) +
 scale_y_log10() +
 labs(title="Error Metrics"))

[image]

stefan