Mtcars is a public dataset in R. I'm not sure it's a public dataset in python.

mtcars <- mtcars

I created this boxplot in R and part of what I'm doing is reordering the y-axis with the reorder() function.

ggplot(mtcars, aes(x = mpg, y = reorder(origin, mpg), color = origin)) +
  geom_boxplot() + 
  theme(legend.position = "none") +
  labs(title = "Mtcars", subtitle = "Box Plot") +
  theme(plot.title = element_text(face = "bold")) +
  ylab("country") 

Now in python I have this boxplot that I created with seaborn:

plt.close()

seaborn.boxplot(x="mpg", y="origin", data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()

I'm now trying to reorder it, but the approach I used in R doesn't work here.

plt.close()

seaborn.boxplot(x="mpg", y=reorder("origin", 'mpg'), data=mtcars)
plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()

It's not surprising that it doesn't work, since it's a different language; I do know that! But how would I do this reordering in Python using seaborn? I'm having trouble understanding whether this is even part of the plotting process.

hachiko

1 Answer

You can compute a custom order and feed it to seaborn's boxplot order parameter:

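import matplotlib.pyplot as plt
import seaborn as sns

# seaborn's 'mpg' dataset has the 'mpg' and 'origin' columns used in the question
mtcars = sns.load_dataset('mpg')

# median mpg per origin, highest first
order = mtcars.groupby('origin')['mpg'].median().sort_values(ascending=False)

# the index of the sorted result is the list of origins in plotting order
sns.boxplot(x="mpg", y="origin", data=mtcars, order=order.index)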

plt.suptitle("Mtcars", x=0.125, y=0.97, ha='left', fontweight = 'bold')
plt.title("boxplot", loc = 'left')
plt.show()

NB. order also acts like a filter, so if values are missing or non-existent, they will be omitted from the graph.
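To guard against that filtering effect, you can check an order list against the data before plotting. This is only a minimal sketch: the helper name omitted_levels and the toy DataFrame are made up for illustration and are not part of seaborn or pandas.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'mpg' dataset (values are made up)
df = pd.DataFrame({
    "origin": ["usa", "usa", "japan", "japan", "europe", None],
    "mpg": [18.0, 15.0, 31.0, 33.0, 26.0, 20.0],
})


def omitted_levels(data, column, order):
    """Return the categories present in `column` but absent from `order`.

    Illustrative helper (not a seaborn function): these are the groups
    that would not be drawn if `order` were passed to the plot.
    """
    present = set(data[column].dropna().unique())
    return sorted(present - set(order))


partial_order = ["japan", "usa"]
print(omitted_levels(df, "origin", partial_order))  # ['europe']
```

If the returned list is not empty, those groups are the ones the order list would leave out of the plot.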

output: (image: boxplot with custom order)

mozway
  • Your answer is also pointing out that my original boxplot was sorted by the mean, which is apparently a default in R and maybe median in this case is the better option – hachiko Apr 29 '22 at 18:11
  • @hachiko here you can use any function you want (mean, median, min, max, std…) – mozway Apr 29 '22 at 18:13
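Following up on the comments above, here is a rough sketch of a helper with a pluggable aggregation, defaulting to the mean as R's reorder() does. The name reorder_levels, its parameters, and the toy data are assumptions made for illustration; only groupby, agg and sort_values are actual pandas calls.

```python
import pandas as pd


def reorder_levels(data, column, by, func="mean", ascending=False):
    """Return the categories of `column` sorted by an aggregate of `by`.

    Illustrative helper (not part of pandas or seaborn). `func` can be
    any aggregation accepted by pandas' agg, e.g. "mean", "median",
    "min", "max", "std" or a callable.
    """
    stats = data.groupby(column)[by].agg(func)
    return stats.sort_values(ascending=ascending).index.tolist()


# Hypothetical toy data (values are made up so that mean and median disagree)
cars = pd.DataFrame({
    "origin": ["usa", "usa", "usa", "japan", "japan", "europe", "europe"],
    "mpg": [18.0, 15.0, 45.0, 31.0, 33.0, 26.0, 24.0],
})

print(reorder_levels(cars, "origin", "mpg"))            # ['japan', 'usa', 'europe']
print(reorder_levels(cars, "origin", "mpg", "median"))  # ['japan', 'europe', 'usa']
```

The resulting list can be passed as order= to sns.boxplot, in the same way as order.index in the answer. The default ascending=False puts the largest group at the top of a horizontal seaborn boxplot.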