
I have a dataset that looks as follows:

import pandas as pd

df = pd.DataFrame({"Sunday":    {'a1':0.1,'a2':0.15,'a4':0.05,'a6':0.1,'b2':0.05,'b3':0.05,'b4':0.2,'c1':0.15,'c4':0.15},
                   "Monday":    {'a2':0.05,'a3':0.15,'a5':0.25,'b1':0.05,'b3':0.1,'b4':0.1,'c3':0.1,'c5':0.05,'c7':0.15},
                   "Tuesday":   {'a1':0.2,'a3':0.15,'a6':0.05,'b2':0.35,'b3':0.05,'c1':0.1,'c4':0.1},
                   "Wednesday": {'a2':0.05,'a3':0.05,'a4':0.35,'a6':0.2,'b1':0.1,'b3':0.05,'b4':0.05,'c1':0.05,'c6':0.1},
                   "Thursday":  {'a1':0.25,'a3':0.05,'a4':0.3,'a5':0.05,'a6':0.1,'b1':0.05,'b4':0.05,'c2':0.1,'c4':0.05},
                   "Friday":    {'a1':0.1,'a2':0.15,'a5':0.1,'a7':0.05,'b2':0.05,'b1':0.15,'b4':0.2,'c3':0.05,'c4':0.05,'c5':0.05,'c7':0.05},
                   "Saturday":  {'a1':0.15,'a3':0.05,'b2':0.05,'b3':0.05,'b4':0.4,'c1':0.1,'c5':0.1,'c7':0.1}
                   })

The keys in the dictionaries for each day are categorical, and they come in three general types, a, b and c, each having some number of subtypes. There are 6 subtypes of the a-type, 4 of the b-type and 7 of the c-type. The values in the dictionaries represent weights (importance). Not every observation (day) needs to have all possible subtypes present. NaN values in each observation (day) should be ignored when the plot is created.
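Before plotting, it can help to check the two properties stated here against the data itself. The following is a minimal plain-Python sketch (no pandas needed); the helper names `check_weights` and `subtypes_per_family` are hypothetical, and it assumes the family of a subtype is simply the first character of its key. Note that Friday contains an 'a7' key, so counting over the data gives 7 a-type subtypes rather than 6.

```python
from collections import Counter

# The question's data as plain dicts: day -> {subtype: weight}
data = {
    "Sunday":    {'a1': 0.1, 'a2': 0.15, 'a4': 0.05, 'a6': 0.1, 'b2': 0.05, 'b3': 0.05, 'b4': 0.2, 'c1': 0.15, 'c4': 0.15},
    "Monday":    {'a2': 0.05, 'a3': 0.15, 'a5': 0.25, 'b1': 0.05, 'b3': 0.1, 'b4': 0.1, 'c3': 0.1, 'c5': 0.05, 'c7': 0.15},
    "Tuesday":   {'a1': 0.2, 'a3': 0.15, 'a6': 0.05, 'b2': 0.35, 'b3': 0.05, 'c1': 0.1, 'c4': 0.1},
    "Wednesday": {'a2': 0.05, 'a3': 0.05, 'a4': 0.35, 'a6': 0.2, 'b1': 0.1, 'b3': 0.05, 'b4': 0.05, 'c1': 0.05, 'c6': 0.1},
    "Thursday":  {'a1': 0.25, 'a3': 0.05, 'a4': 0.3, 'a5': 0.05, 'a6': 0.1, 'b1': 0.05, 'b4': 0.05, 'c2': 0.1, 'c4': 0.05},
    "Friday":    {'a1': 0.1, 'a2': 0.15, 'a5': 0.1, 'a7': 0.05, 'b2': 0.05, 'b1': 0.15, 'b4': 0.2, 'c3': 0.05, 'c4': 0.05, 'c5': 0.05, 'c7': 0.05},
    "Saturday":  {'a1': 0.15, 'a3': 0.05, 'b2': 0.05, 'b3': 0.05, 'b4': 0.4, 'c1': 0.1, 'c5': 0.1, 'c7': 0.1},
}


def check_weights(data, tol=1e-9):
    """Return the days whose weights do not sum to 1 (within a float tolerance)."""
    return [day for day, weights in data.items()
            if abs(sum(weights.values()) - 1.0) > tol]


def subtypes_per_family(data):
    """Count distinct subtypes per family letter, taken across all days."""
    subtypes = {key for weights in data.values() for key in weights}
    return Counter(key[0] for key in subtypes)


print(check_weights(data))                              # []
print(dict(sorted(subtypes_per_family(data).items())))  # {'a': 7, 'b': 4, 'c': 7}
```

Missing subtypes are simply absent keys here; once the dicts are loaded into a DataFrame they become NaN, which is what the answer below fills with 0.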

I would like to visualize this data using a discrete colormap, similarly to how it was done in this post: heatmap-like plot, but for categorical variables in seaborn. Their solution is very elegant, but my problem is slightly more complicated, as I would also like to reflect the weight of each subtype by the height of the rectangle that represents it (shown on the y-axis). The weights for each day always sum to one. On the x-axis, I would like to have the weekdays shown. All subtypes, with the colors assigned to them, should be shown on the colorbar on the right with their corresponding code names: 'a1',...,'a6', 'b1', ...,'b4','c1',...,'c7'.

Finally, I would like to use different colormaps to color different subtypes: for example, blues for a-types, greens for b-types and reds for c-types.
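One way to get a separate colormap per family is to build an explicit subtype-to-color mapping up front. This is a sketch, not a library feature: the `FAMILY_CMAPS` table and `subtype_colors` helper are hypothetical, the family is assumed to be the key's first character, and the `max_shade=0.8` cut-off (which skips the near-white end of each colormap) is an arbitrary choice. It uses `matplotlib.colormaps`, available in matplotlib 3.5 and later.

```python
import matplotlib

# Hypothetical family -> colormap table: blues for a, greens for b, reds for c.
# The "_r" suffix reverses each colormap, so shade 0.0 is the darkest color.
FAMILY_CMAPS = {'a': 'Blues_r', 'b': 'Greens_r', 'c': 'Reds_r'}


def subtype_colors(subtypes, family_cmaps=FAMILY_CMAPS, max_shade=0.8):
    """Map each subtype name to an RGBA tuple from its family's colormap.

    Within a family, shades are spread evenly over [0, max_shade], so
    members get distinct colors and the lightest end of the map is avoided.
    """
    colors = {}
    for family, cmap_name in family_cmaps.items():
        members = sorted(s for s in subtypes if s[0] == family)
        cmap = matplotlib.colormaps[cmap_name]
        for i, name in enumerate(members):
            shade = max_shade * i / max(len(members) - 1, 1)
            colors[name] = cmap(shade)
    return colors


subtypes = ['a1', 'a2', 'a3', 'a4', 'a5', 'a6', 'a7', 'b1', 'b2', 'b3', 'b4',
            'c1', 'c2', 'c3', 'c4', 'c5', 'c6', 'c7']
colors = subtype_colors(subtypes)
```

A list such as `[colors[name] for name in columns]`, in the same order as the plotted columns, can then be passed to a plotting call's `color` argument. Plain `sorted()` works for these keys but would misorder names like 'a10'.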

I would like to do this using the Python Seaborn package, but if you can suggest a better solution using a different package, I don't mind using it instead.

I would appreciate any suggestions. Thank you.

carpediem

1 Answer

To create a stacked bar graph with seaborn, you seem to need to plot bars of cumulative sums on top of each other (see e.g. this blog post). That gets quite complex with 18 types.
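The "summed bars" idea can also be written explicitly with plain matplotlib, where `Axes.bar` accepts a `bottom` argument: each segment starts at the running total of the segments already drawn below it. The sketch below is illustrative only; `stacked_bars` is a hypothetical helper, and the two-day table is a made-up example, not the question's data.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt


def stacked_bars(ax, days, table, colors=None):
    """Draw one stacked bar per day and return the final bar heights.

    `table` maps subtype -> list of weights, one per day (0 where absent).
    Each segment is drawn with bottom = running total of earlier segments.
    """
    bottom = np.zeros(len(days))
    for name, weights in table.items():
        weights = np.asarray(weights, dtype=float)
        ax.bar(days, weights, bottom=bottom, label=name,
               color=None if colors is None else colors[name])
        bottom += weights
    return bottom


days = ['Sunday', 'Monday']
table = {'a1': [0.1, 0.0], 'a2': [0.15, 0.05], 'b4': [0.75, 0.95]}
fig, ax = plt.subplots()
totals = stacked_bars(ax, days, table)
ax.legend(bbox_to_anchor=(1.02, 1), loc='upper left')
```

With 18 subtypes this loop stays short, which is roughly what pandas' `stacked=True` does for you.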

With pandas plot, things are a bit easier, although some manipulation is needed:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"Sunday":    {'a1':0.1,'a2':0.15,'a4':0.05,'a6':0.1,'b2':0.05,'b3':0.05,'b4':0.2,'c1':0.15,'c4':0.15},
                   "Monday":    {'a2':0.05,'a3':0.15,'a5':0.25,'b1':0.05,'b3':0.1,'b4':0.1,'c3':0.1,'c5':0.05,'c7':0.15},
                   "Tuesday":   {'a1':0.2,'a3':0.15,'a6':0.05,'b2':0.35,'b3':0.05,'c1':0.1,'c4':0.1},
                   "Wednesday": {'a2':0.05,'a3':0.05,'a4':0.35,'a6':0.2,'b1':0.1,'b3':0.05,'b4':0.05,'c1':0.05,'c6':0.1},
                   "Thursday":  {'a1':0.25,'a3':0.05,'a4':0.3,'a5':0.05,'a6':0.1,'b1':0.05,'b4':0.05,'c2':0.1,'c4':0.05},
                   "Friday":    {'a1':0.1,'a2':0.15,'a5':0.1,'a7':0.05,'b2':0.05,'b1':0.15,'b4':0.2,'c3':0.05,'c4':0.05,'c5':0.05,'c7':0.05},
                   "Saturday":  {'a1':0.15,'a3':0.05,'b2':0.05,'b3':0.05,'b4':0.4,'c1':0.1,'c5':0.1,'c7':0.1}
                   })
df.fillna(0, inplace=True)  # replace NA with zeros
df2 = df.T  # switch rows and columns
df2 = df2.reindex(sorted(df2.columns), axis=1)  # reorder the columns
types = df2.columns
num_type = {letter: len([t for t in types if t[0] == letter]) for letter in 'abc'}
df2.plot.bar(stacked=True, rot=0, figsize=(10, 5),
             color=[plt.cm.Blues_r(i / 7) for i in range(num_type['a'])]
                   + [plt.cm.Greens_r(i / 7) for i in range(num_type['b'])]
                   + [plt.cm.Reds_r(i / 7) for i in range(num_type['c'])])
plt.legend(bbox_to_anchor=(1.02, 1), loc='upper left') # legend outside the main plot
plt.tight_layout() # fit legend and labels
plt.show()

(Image: resulting stacked bar plot)

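PS: Getting a seaborn plot requires some further manipulation, such as converting the index to a named column (e.g. 'type') and converting the data to long form. Note that `sns.barplot` with `hue` places the bars of the different types side by side (dodged) rather than stacking them.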

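import numpy as np
import seaborn as sns

# continues from the df above (rows: types, columns: days)
df.fillna(0, inplace=True)
df.index = df.index.set_names(['type'])  # name the index so reset_index creates a 'type' column
df.reset_index(inplace=True)
types = sorted(np.unique(df['type']))
df_long = df.melt(var_name='day', value_name='weight', id_vars='type')  # one row per (type, day)
sns.barplot(x='day', y='weight', hue='type', hue_order=types, data=df_long)
plt.show()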
JohanC
  • Thank you so much, this is exactly what I needed. It's just perfect! Just one more question... is there a way to make the order of items identical in both the graph and the colorbar on the side? – carpediem Jul 28 '20 at 11:40
  • The easiest way would be to invert the y-axis with `plt.ylim(1, 0)`. – JohanC Jul 28 '20 at 11:43
  • Right, makes sense. Thank you. I really appreciate your help. – carpediem Jul 28 '20 at 11:45
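Another option that keeps the y-axis the right way up is to reverse the legend entries instead, so the top segment of each bar is listed first. A minimal sketch, using a made-up two-column `df2` in place of the answer's transposed data; `get_legend_handles_labels` is a standard matplotlib `Axes` method.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to open a window
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the answer's df2 (rows: days, columns: subtypes)
df2 = pd.DataFrame({'a1': [0.4, 0.2], 'b1': [0.6, 0.8]},
                   index=['Sunday', 'Monday'])
ax = df2.plot.bar(stacked=True, rot=0)

# Reverse the legend entries so their order matches the visual stacking
# (last-drawn, top-most segment first) without flipping the y-axis.
handles, labels = ax.get_legend_handles_labels()
ax.legend(handles[::-1], labels[::-1], bbox_to_anchor=(1.02, 1), loc='upper left')
plt.tight_layout()
```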