
Using plotnine in Python, I'd like to add dashed horizontal lines to my plot (a scatterplot, but preferably with an answer compatible with other plot types) representing the mean for every color separately. I'd like to do so without manually computing the mean values myself or adapting other parts of the data (e.g. adding columns for color values).

Additionally, the original plot is generated via a function (make_plot below) and the mean lines are to be added afterwards, yet need to have the same color as the points from which they are derived.

Consider the following as a minimal example:

import pandas as pd
import numpy as np
from plotnine import *


df  = pd.DataFrame( { 'MSE': [0.1,  0.7,  0.5,  0.2, 0.3, 0.4, 0.8, 0.9 ,1.0, 0.4, 0.7, 0.9 ],
                        'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                        'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]  } )

def make_plot(df, var_x, var_y, var_fill):
    # Use the arguments rather than hard-coded column names
    plot = ggplot(df) + aes(x=var_x, y=var_y, fill=var_fill) + geom_point()
    return plot

plot = make_plot(df, 'Number', 'MSE', 'Size')

I'd like to add 4 lines, one for each Size. The exact same can be done in R using ggplot, as shown by this question. Adding geom_line(stat="hline", yintercept="mean", linetype="dashed") to plot, however, results in the error PlotnineError: "'stat_hline' Not in Registry. Make sure the module in which it is defined has been imported.", which I am unable to resolve.

Answers that can resolve the aforementioned issue, or propose another working solution entirely, are greatly appreciated.

Mathijs

1 Answer


You can do it by first computing the per-group means as a list and then using that list inside your function:

import pandas as pd
import numpy as np
from plotnine import *
from random import randint



df  = pd.DataFrame( { 'MSE': [0.1,  0.7,  0.5,  0.2, 0.3, 0.4, 0.8, 0.9 ,1.0, 0.4, 0.7, 0.9 ],
                        'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                        'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]  } )

a = df.groupby(['Size'])['MSE'].mean()  # mean MSE per Size
a = list(a)

def make_plot(df, var_x, var_y, var_fill):
    plot = (ggplot(df) + aes(x='Number', y='MSE', fill='Size') + geom_point()
            + geom_hline(yintercept=a, linetype="dashed"))  # `a` is the list of means defined above
    return plot

plot = make_plot(df, 'Number', 'MSE', 'Size')

which gives:

[Figure: scatterplot with dashed horizontal lines at the group means]

Note that two of the lines coincide (the means are listed in groupby order: L, M, S, XL):

a = [0.6666666666666666, 0.5, 0.4666666666666666, 0.6666666666666666]
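To see which mean belongs to which size, it can help to keep the group labels instead of converting straight to a list. The following is a minimal sketch using only pandas; the variable name `means` is introduced here for illustration and is not part of the code above.

```python
import pandas as pd

df = pd.DataFrame({
    'MSE': [0.1, 0.7, 0.5, 0.2, 0.3, 0.4, 0.8, 0.9, 1.0, 0.4, 0.7, 0.9],
    'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
    'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
})

# groupby sorts the group keys, so the Series index is L, M, S, XL
means = df.groupby('Size')['MSE'].mean()

# Print each size next to its mean; L and XL share the same value,
# which is why only three distinct lines are visible in the plot
for size, value in means.items():
    print(size, value)
```

Keeping the labels also makes it possible to pass the means to plotnine as a DataFrame rather than as a bare list.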

To add different colors to each dashed line, you can do this:

import pandas as pd
import numpy as np
from random import randint
from plotnine import *


df  = pd.DataFrame( { 'MSE': [0.1,  0.7,  0.5,  0.2, 0.3, 0.4, 0.8, 0.9 ,1.0, 0.4, 0.7, 0.9 ],
                        'Size': ['S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL', 'S', 'M', 'L', 'XL'],
                        'Number': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]  } )

# Generate one random hex color per category (Size)
n = df['Size'].nunique()
color = ['#%06X' % randint(0, 0xFFFFFF) for _ in range(n)]

def make_plot(df, var_x, var_y, var_fill):
    means = list(df.groupby(['Size'])['MSE'].mean())
    plot = (ggplot(df) + aes(x='Number', y='MSE', fill='Size') + geom_point()
            + geom_hline(yintercept=means, linetype="dashed", color=color))  # `color` is the list of hex colors built above
    return plot

plot = make_plot(df, 'Number', 'MSE', 'Size')

which returns:

[Figure: the same scatterplot with each dashed mean line in a different random color]

  • It's almost exactly what I need. As listed here, I'd have to pass `a` as an argument to `make_plot`, which wouldn't be too much of a stretch in my case. I'm still searching for a way to color the dashed lines in the same color as the points, as adding `color = "Size" ` does not do the trick. – Mathijs May 28 '22 at 13:16
  • If `b` could either be extracted from the plot itself or be set to whatever default color palette ggplot uses (with a changing number of colors required), then this would be a viable answer. Manually picking colors is not a scalable solution. – Mathijs May 31 '22 at 07:56
  • @Mathijs I updated my answer with a random color generator. It creates a list of colors that matches the length of your list of sizes. – Serge de Gosson de Varennes May 31 '22 at 08:44