
I took the following code from the last example on the statsmodels mosaic documentation page:

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
data = pd.DataFrame({'gender': gender, 'pet': pet})
mosaic(data, ['pet', 'gender'], gap=0.06, title='DataFrame')
plt.show()

However, I'd like the colors to run horizontally, i.e. the female tiles in both the cat and the dog category should share one color, and the male tiles should likewise share another. I would also like to increase the figure size and display the percentage in each tile.

I experimented with the parameters but could not find a way to do it.

Asked by Chuks (edited by JohanC)

1 Answer


The figure size can be set the standard matplotlib way: create the axes with fig, ax = plt.subplots(figsize=(...)) and pass ax to the mosaic() function via its ax= parameter.
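A minimal sketch of that pattern, assuming a recent statsmodels and matplotlib: the axes created with the desired figsize is handed to mosaic(), which draws into it and returns that same figure together with a dictionary of tile rectangles keyed by category tuples.

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

data = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female'],
    'pet': ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'],
})

# Create the figure with the desired size first, then give its axes to mosaic().
fig, ax = plt.subplots(figsize=(12, 4))
returned_fig, rects = mosaic(data, ['pet', 'gender'], gap=0.06, ax=ax)

# mosaic() draws into the given axes and returns that axes' figure, plus a dict
# mapping each tile key, e.g. ('cat', 'female'), to its (x, y, width, height).
print(fig.get_size_inches().tolist())  # [12.0, 4.0]
print(sorted(rects))
plt.show()
```

The rects dictionary can be handy if you want to place extra annotations on specific tiles yourself.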

The color can be changed via the properties= parameter. This is a function that takes a key as input (e.g. ('cat', 'female')) and returns a dictionary of matplotlib Rectangle properties such as facecolor, alpha, hatch, linestyle, ... . The example below colors all cats turquoise (greenish blue) and all dogs sienna (brown). To distinguish male from female, the hatch or alpha could be set differently.
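As a sketch of the hatching/alpha idea, the properties function below keeps the pet colors and additionally hatches and lightens the female tiles. The name pet_gender_props and the chosen hatch pattern and alpha value are illustrative assumptions, not part of statsmodels; the function also assumes the index order ['pet', 'gender'], so each key unpacks as (pet, gender).

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

def pet_gender_props(key):
    """Return Rectangle properties for one tile; key is e.g. ('cat', 'female')."""
    pet, gender = key  # assumes mosaic() was called with ['pet', 'gender']
    return {
        'facecolor': 'turquoise' if pet == 'cat' else 'sienna',
        'hatch': '//' if gender == 'female' else None,  # hatch only the female tiles
        'alpha': 0.6 if gender == 'female' else 1.0,    # and draw them a bit lighter
    }

data = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female'],
    'pet': ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'],
})

fig, ax = plt.subplots(figsize=(12, 4))
mosaic(data, ['pet', 'gender'], gap=0.06, properties=pet_gender_props, ax=ax)
plt.show()
```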

The title can be passed via the title= parameter. The example uses an f-string with the percentage of cats.
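If the title should list the share of every pet category rather than only cats, the percentages can be computed with value_counts(normalize=True). A small sketch on the same data; the resulting string can be passed as title= in exactly the same way.

```python
import pandas as pd

data = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female'],
    'pet': ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'],
})

# Share of each pet category in percent, sorted by label for a stable order.
shares = data['pet'].value_counts(normalize=True).sort_index() * 100
title = ', '.join(f"{pet}: {pct:.1f} %" for pet, pct in shares.items())
print(title)  # cat: 50.0 %, dog: 50.0 %
```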

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

gender = ['male', 'male', 'male', 'female', 'female', 'female']
pet = ['cat', 'dog', 'dog', 'cat', 'dog', 'cat']
third_col = [2, 3, 4, 5, 6, 7]
data = pd.DataFrame({'gender': gender, 'pet': pet, 'third': third_col})

percent_cats = f"cats: {100 * len(data[data['pet'] == 'cat']) / len(data):.1f} %"
props = lambda key: {'color': 'turquoise' if 'cat' in key else 'sienna'}
fig, ax = plt.subplots(figsize=(12, 4))
mosaic(data, ['pet', 'gender'], gap=0.06, title=percent_cats, properties=props, ax=ax)
plt.show()

[Figure: resulting plot]
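The question also asks for the percentage inside each tile. mosaic() has a labelizer= parameter for this: a function that receives the same key as properties= and returns the text drawn in that tile. A sketch, assuming the percentage should be relative to all rows; percent and tile_label are illustrative names, not statsmodels API.

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

data = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female'],
    'pet': ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'],
})

# Percentage of all rows in each (pet, gender) combination, keyed by tuples of
# strings so the keys match the ones mosaic() passes to labelizer/properties.
counts = data.groupby(['pet', 'gender']).size()
percent = {tuple(str(level) for level in key): 100 * n / counts.sum()
           for key, n in counts.items()}

def tile_label(key):
    """Label one tile with its category names and its share of all rows."""
    return "\n".join(key) + f"\n{percent.get(key, 0):.1f} %"

props = lambda key: {'color': 'turquoise' if 'cat' in key else 'sienna'}
fig, ax = plt.subplots(figsize=(12, 4))
mosaic(data, ['pet', 'gender'], gap=0.06, properties=props, labelizer=tile_label, ax=ax)
plt.show()
```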

Here is another example with separate colors for male and female. It swaps the order of the columns and sets horizontal=False, so the first split (by gender) runs along the vertical axis and each gender gets its own horizontal band.

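# Share of the 'third' column's total that belongs to the female rows.
percent_3rd_col = 100 * data[data['gender'] == 'female']['third'].sum() / data['third'].sum()
title = f"percent female: {percent_3rd_col:.1f} %"
props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}
fig, ax = plt.subplots(figsize=(12, 4))  # a fresh figure, since the first one was already shown
mosaic(data, ['gender', 'pet'], horizontal=False, gap=0.06, title=title, properties=props, ax=ax)
plt.show()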

[Figure: female tiles colored equally]

Answered by JohanC
  • Thanks JohanC, this worked. I was wondering: if there is a third column with numbers, like num=[2,3,4,5,6,7], is there a way to include it so that it also gets a separate color and the proportion of these num values is displayed in the tile? – Chuks Oct 11 '20 at 13:46
  • I updated the second example to calculate the percentage via a third column. – JohanC Oct 12 '20 at 15:46
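Regarding the third column raised in the comments: instead of letting mosaic() count rows, one option is to pass it a pandas Series with a MultiIndex holding the per-tile sums of third, so the tile areas follow those sums, and use a labelizer to print each tile's share. This is a sketch under that interpretation (weights and tile_label are illustrative names), not the approach used in the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.graphics.mosaicplot import mosaic

data = pd.DataFrame({
    'gender': ['male', 'male', 'male', 'female', 'female', 'female'],
    'pet': ['cat', 'dog', 'dog', 'cat', 'dog', 'cat'],
    'third': [2, 3, 4, 5, 6, 7],
})

# Sum 'third' per (gender, pet). A MultiIndex Series is taken as the contingency
# table, so tile areas follow these sums instead of the row counts.
weights = data.groupby(['gender', 'pet'])['third'].sum()
total = weights.sum()

def tile_label(key):
    # key is a tuple of strings such as ('female', 'cat')
    return "\n".join(key) + f"\n{100 * weights[key] / total:.1f} %"

props = lambda key: {'color': 'fuchsia' if 'female' in key else 'deepskyblue'}
fig, ax = plt.subplots(figsize=(12, 4))
mosaic(weights, horizontal=False, gap=0.06, properties=props, labelizer=tile_label, ax=ax)
plt.show()
```

With this data the female cat tile gets 12 of 27 units (44.4 %), so it is visibly larger than in the count-based plot.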