
I have a DataFrame df with two columns, ACQUISITION_CHANNEL and HOURS_WORKED_CUMULATIVE, which I am plotting as a jittered box plot using the code below. I would like to sort the categories on the x-axis so that they are ordered by median, highest first.

ACQUISITION_CHANNEL  HOURS_WORKED_CUMULATIVE
Referral             34
Job Platform         42
Referral             34
Offline              42
Referral             34
Digital              42

...

import numpy as np
import matplotlib.pyplot as plt

group = 'ACQUISITION_CHANNEL'
column = 'HOURS_WORKED_CUMULATIVE'
# groupby sorts the keys alphabetically, so the boxes come out in alphabetical order
grouped = df.groupby(group)

names, vals, xs = [], [], []

for i, (name, subdf) in enumerate(grouped):
    names.append(name)
    vals.append(subdf[column].tolist())
    # Horizontal jitter around box position i + 1 (boxplot positions start at 1)
    xs.append(np.random.normal(i + 1, 0.1, subdf.shape[0]))

# `labels` is called `tick_labels` from Matplotlib 3.9 onwards
plt.boxplot(vals, labels=names, showfliers=False)

for x, val in zip(xs, vals):
    plt.scatter(x, val, alpha=0.4, c='#1f77b4')

plt.title('Hours Worked by Acquisition Channel')
plt.xlabel('Acquisition Channel')
plt.ylabel('Total Hours Worked')
plt.show()
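One way to get the median ordering while keeping this pure-Matplotlib loop: compute the per-group medians first and iterate the groups in that order instead of groupby's alphabetical order. This is a minimal sketch; the random DataFrame is hypothetical and only stands in for the real df, and the Agg backend line is there only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; omit it in a notebook or GUI session
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

group = 'ACQUISITION_CHANNEL'
column = 'HOURS_WORKED_CUMULATIVE'

# Hypothetical random data standing in for the real df
rng = np.random.default_rng(0)
df = pd.DataFrame({
    group: rng.choice(['Referral', 'Job Platform', 'Offline', 'Digital'], 200),
    column: rng.integers(10, 300, 200),
})

grouped = df.groupby(group)
# Group keys sorted by descending median instead of alphabetically
order = grouped[column].median().sort_values(ascending=False).index

names, vals, xs = [], [], []
for i, name in enumerate(order):
    subdf = grouped.get_group(name)
    names.append(name)
    vals.append(subdf[column].tolist())
    # Jitter the points around box position i + 1 (boxplot positions start at 1)
    xs.append(rng.normal(i + 1, 0.1, len(subdf)))

fig, ax = plt.subplots()
ax.boxplot(vals, showfliers=False)
# Set tick labels explicitly, which avoids the labels/tick_labels keyword rename
ax.set_xticks(range(1, len(names) + 1))
ax.set_xticklabels(names)
for x, val in zip(xs, vals):
    ax.scatter(x, val, alpha=0.4, c='#1f77b4')
ax.set_title('Hours Worked by Acquisition Channel')
ax.set_xlabel('Acquisition Channel')
ax.set_ylabel('Total Hours Worked')
```

Only the iteration order changes; the box positions and jitter are computed exactly as in the loop above.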

(Image: jittered box plot)

BigBen

2 Answers


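You could sort the data by median before plotting it. This question has a good answer; its snippet assumes a wide-format DataFrame df2 with one column per group, so that df2.median() returns one median per column: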

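# df2: wide-format DataFrame, one column per group
meds = df2.median()                              # median of each column
meds.sort_values(ascending=False, inplace=True)  # highest median first
df2 = df2[meds.index]                            # reorder columns to match
df2.boxplot()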
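The question's df is in long format (one row per observation), so it needs reshaping before the snippet above applies. A minimal sketch, assuming the default integer index of df is unique: DataFrame.pivot with only columns= and values= keeps that index, so each row holds a value in exactly one channel column and NaN elsewhere. The toy values are hypothetical.

```python
import pandas as pd

group = 'ACQUISITION_CHANNEL'
column = 'HOURS_WORKED_CUMULATIVE'

# Hypothetical long-format data shaped like the question's df
df = pd.DataFrame({
    group: ['Referral', 'Job Platform', 'Referral', 'Offline',
            'Digital', 'Job Platform', 'Offline', 'Digital'],
    column: [34, 42, 36, 42, 50, 40, 20, 60],
})

# Wide format: one column per channel, NaN where a row belongs to another channel
df2 = df.pivot(columns=group, values=column)

# DataFrame.median skips NaN by default, giving one median per channel
meds = df2.median().sort_values(ascending=False)
df2 = df2[meds.index]
```

After this, df2.boxplot() draws the channels in descending-median order (Digital, Job Platform, Referral, Offline for these toy values).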
DDaly

You could simplify the construction by combining seaborn's boxplot and stripplot:

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

group = 'ACQUISITION_CHANNEL'
column = 'HOURS_WORKED_CUMULATIVE'
# Random sample data for demonstration
df = pd.DataFrame({group: np.random.choice(['Referral', 'Job Platform', 'Offline', 'Digital'], 1000),
                   column: np.random.randint(10, 300, 1000) + np.random.randint(1, 30, 1000) ** 2})

# Category labels sorted by descending median; both plots get the same order
x_order = df.groupby(group)[column].median().sort_values(ascending=False).index
ax = sns.boxplot(data=df, x=group, order=x_order, y=column,
                 color='skyblue', medianprops={'color': 'orange'})
sns.stripplot(data=df, x=group, order=x_order, y=column, jitter=0.4, alpha=0.6, ax=ax)
plt.show()
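x_order is simply the group labels sorted by their median, and passing the same sequence as order= to both seaborn calls is what keeps the boxes and the points in the same slots. A small check of that ordering step on hypothetical toy values whose medians are easy to verify by hand:

```python
import pandas as pd

group = 'ACQUISITION_CHANNEL'
column = 'HOURS_WORKED_CUMULATIVE'

# Hypothetical toy data: medians are 20, 150, 6 and 40
df = pd.DataFrame({
    group: ['Referral', 'Referral', 'Referral', 'Offline', 'Offline',
            'Digital', 'Digital', 'Job Platform'],
    column: [10, 20, 30, 100, 200, 5, 7, 40],
})

# One median per channel, then the labels sorted highest median first
medians = df.groupby(group)[column].median()
x_order = medians.sort_values(ascending=False).index
```

Here x_order comes out as Offline, Job Platform, Referral, Digital; any other per-group statistic (for example .mean()) could be swapped in the same way.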

The stripplot can also be replaced by a swarmplot:

sns.swarmplot(data=df, x=group, order=x_order, y=column, size=3, alpha=0.6, ax=ax)

(Image: combined boxplot and stripplot or swarmplot)

JohanC