
I have a dictionary of clusters, and each cluster contains different labels.

The dictionary looks like this:
demo_dict = {0: [b'3.0',b'3.0', b'3.0', b'5.0',b'5.0',b'5.0', b'6.0', b'6.0'],
 1: [b'2.0', b'2.0', b'3.0', b'7.0',b'7.0'],
 2: [b'1.0', b'4.0', b'8.0', b'7.0',b'7.0']}

To draw the plot, I am using the following code:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

comp = demo_dict
df = pd.DataFrame.from_dict(comp, orient='index')
df.index.rename('Clusters', inplace=True)

stacked = df.stack().reset_index()
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

sns.scatterplot(data=stacked, x='Clusters', y='Labels')
plt.show()

However, the code above does not draw all the points. It only shows which clusters contain which labels, because repeated labels within a cluster are drawn on top of each other. I want every point of every cluster to be visible.
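A quick way to see how many points disappear is to count the rows of `stacked` that share the same (Clusters, Labels) position, since each such row is drawn exactly on top of an earlier one. This is a minimal sketch on the `demo_dict` above; the names `hidden`, `n_total` and `n_visible` are illustrative only, and `dropna()` is added so that the padding cells of the shorter clusters are removed regardless of how the installed pandas version's `stack()` treats missing values.

```python
import pandas as pd

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

df = pd.DataFrame.from_dict(demo_dict, orient='index')
df.index.rename('Clusters', inplace=True)
# dropna() removes the empty cells that pad the shorter clusters
stacked = df.stack().dropna().reset_index()
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

# A row is "hidden" when an earlier row already sits at the same (x, y) position
hidden = stacked.duplicated(subset=['Clusters', 'Labels'])
n_total = len(stacked)
n_visible = n_total - int(hidden.sum())
print(f"{n_total} points, but only {n_visible} distinct positions")
```

With the demo data, 18 labels collapse onto 10 positions, which is why the plain scatterplot looks incomplete.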


Is there something I am missing in this code to show all the points? Note: I have also tried stripplot and swarmplot.

robinHood013
  • You could try the approaches from [How to make jitterplot on matplolib python](https://stackoverflow.com/questions/60229586/how-to-make-jitterplot-on-matplolib-python) to add jitter to the points after they have been drawn by `sns.scatterplot`. It all depends on how many points fall together and what exactly you want to show. You might also need a smaller dot size or some alpha (transparency). – JohanC Mar 28 '21 at 10:52

2 Answers


With groupby, you can group by two columns at once. The resulting counts can then be displayed as a heatmap:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

demo_dict = {}
for i in range(40):
    demo_dict[i] = np.random.choice([b'1.0', b'2.0', b'3.0', b'4.0', b'5.0', b'6.0', b'7.0', b'8.0'],
                                    np.random.randint(10, 30))
df = pd.DataFrame.from_dict(demo_dict, orient='index')
df.index.rename('Clusters', inplace=True)

stacked = df.stack().reset_index()
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

grouped = stacked.groupby(['Labels', 'Clusters']).agg('count').unstack()

fig = plt.figure(figsize=(15, 4))
ax = sns.heatmap(data=grouped, annot=True, cmap='rocket_r', cbar_kws={'pad': 0.01})
ax.set_xlabel('')
ax.tick_params(axis='y', labelrotation=0)
plt.tight_layout()
plt.show()

[Image: heatmap of groupby counts]
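For a small example such as `demo_dict`, the same label-by-cluster counts table can also be built with `pd.crosstab`. This is a swapped-in shortcut for the `groupby(...).agg('count').unstack()` chain, not what the answer above uses: it puts the clusters directly in the columns (no extra 'Lable' column level) and fills absent label/cluster combinations with 0 instead of NaN. A minimal sketch, rebuilding `stacked` from the question's data:

```python
import pandas as pd

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

df = pd.DataFrame.from_dict(demo_dict, orient='index')
df.index.rename('Clusters', inplace=True)
stacked = df.stack().dropna().reset_index()  # dropna() removes the padding cells
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

# Rows: labels, columns: clusters, cells: how often the label occurs in that cluster
counts = pd.crosstab(stacked['Labels'], stacked['Clusters'])
print(counts)
```

The result can be passed to `sns.heatmap(data=counts, annot=True)` in the same way as `grouped`.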

An alternative is to show the counts as marker sizes in a scatterplot:

grouped = stacked.groupby(['Labels', 'Clusters']).agg('count').reset_index()
fig = plt.figure(figsize=(15, 4))
ax = sns.scatterplot(data=grouped, x='Clusters', y='Labels', size='Lable', color='orchid')
for h in ax.legend_.legend_handles:  # named legendHandles before Matplotlib 3.7
    h.set_color('orchid')  # the default color in the sizes legend is black
ax.margins(x=0.01)  # less whitespace
# place the legend outside the axes
ax.legend(handles=ax.legend_.legend_handles, title='Counts:', bbox_to_anchor=(1.01, 1.02), loc='upper left')
plt.show()

[Image: scatterplot with sizes]
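In the sizes plot, the counts end up in the column still called 'Lable' (the renamed former column index), which is why `size='Lable'` is passed. A possible tweak, assuming a more descriptive column name is preferred, is `groupby(...).size()` followed by `reset_index(name='count')`; `size='count'` could then be used in `sns.scatterplot` instead. A minimal sketch on the question's demo data:

```python
import pandas as pd

demo_dict = {0: [b'3.0', b'3.0', b'3.0', b'5.0', b'5.0', b'5.0', b'6.0', b'6.0'],
             1: [b'2.0', b'2.0', b'3.0', b'7.0', b'7.0'],
             2: [b'1.0', b'4.0', b'8.0', b'7.0', b'7.0']}

df = pd.DataFrame.from_dict(demo_dict, orient='index')
df.index.rename('Clusters', inplace=True)
stacked = df.stack().dropna().reset_index()  # dropna() removes the padding cells
stacked.rename(columns={'level_1': 'Lable', 0: 'Labels'}, inplace=True)

# One row per (label, cluster) pair, with the number of occurrences in 'count'
grouped = stacked.groupby(['Labels', 'Clusters']).size().reset_index(name='count')
print(grouped)
```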

You could also try the approach from "How to make jitterplot on matplolib python", optionally using different jitter offsets in the x and y directions. With your data, it could look as follows:

def jitter_dots(dots):
    # shift every dot by a small random amount so overlapping dots become visible
    offsets = dots.get_offsets()
    jittered_offsets = offsets.copy()  # work on a copy instead of the original array
    jittered_offsets[:, 0] += np.random.uniform(-0.3, 0.3, offsets.shape[0])  # x
    jittered_offsets[:, 1] += np.random.uniform(-0.3, 0.3, offsets.shape[0])  # y
    dots.set_offsets(jittered_offsets)

ax = sns.scatterplot(data=stacked, x='Clusters', y='Labels')
jitter_dots(ax.collections[0])

[Image: scatterplot with jitter]
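The jitter idea can be checked without seaborn. This sketch assumes the non-interactive Agg backend and uses plain matplotlib `ax.scatter` on cluster 0 of the demo data (labels converted to numbers). `jitter_dots_seeded` is a hypothetical variant of `jitter_dots` with an adjustable width and a seeded random generator, so the result is reproducible. Before jittering, the eight dots occupy only three positions; afterwards each dot has its own position, and none has moved more than 0.3 in either direction.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless backend, no window is opened
import matplotlib.pyplot as plt
import numpy as np


def jitter_dots_seeded(dots, width=0.3, rng=None):
    """Hypothetical variant of jitter_dots: shift every dot by up to +/- width."""
    rng = np.random.default_rng() if rng is None else rng
    offsets = np.asarray(dots.get_offsets(), dtype=float).copy()
    offsets += rng.uniform(-width, width, size=offsets.shape)  # x and y independently
    dots.set_offsets(offsets)


# Cluster 0 from demo_dict, as numbers: eight labels on only three positions
x = np.zeros(8)
y = np.array([3.0, 3.0, 3.0, 5.0, 5.0, 5.0, 6.0, 6.0])

fig, ax = plt.subplots()
dots = ax.scatter(x, y)
before = np.asarray(dots.get_offsets(), dtype=float).copy()
jitter_dots_seeded(dots, width=0.3, rng=np.random.default_rng(0))
after = np.asarray(dots.get_offsets(), dtype=float)

n_before = len(np.unique(before, axis=0))  # distinct positions before jitter
n_after = len(np.unique(after, axis=0))    # distinct positions after jitter
print(n_before, n_after)
plt.close(fig)
```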

Here is how it could look with 8 different colors, alternating per cluster:

ax = sns.scatterplot(data=stacked, x='Clusters', y='Labels',
                     hue=stacked['Clusters'] % 8, palette='Dark2', legend=False)
jitter_dots(ax.collections[0])
ax.margins(x=0.02)
sns.despine()

[Image: scatterplot with colors per column]
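The `% 8` trick works because the cluster numbers are consecutive integers: taking them modulo 8 (the number of colors in the 'Dark2' palette) gives codes 0 to 7 that repeat every 8 clusters, so two neighbouring columns never share a color. A small numpy check of that property, assuming 40 consecutive cluster ids as in the random example above:

```python
import numpy as np

n_clusters = 40                    # as in the random example above
clusters = np.arange(n_clusters)
hue_codes = clusters % 8           # 8 = number of colors in the 'Dark2' palette

# Neighbouring clusters never get the same color code...
neighbours_differ = bool(np.all(hue_codes[1:] != hue_codes[:-1]))
# ...while the same code comes back every 8 clusters
repeats_every_8 = bool(np.all(hue_codes[8:] == hue_codes[:-8]))
print(hue_codes[:10], neighbours_differ, repeats_every_8)
```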

JohanC
  • I appreciate your answer. Regarding your first suggestion, I am not looking for a histogram. As for the second one, I don't need to show the counts explicitly. I need a simple scatter plot that shows each cluster with as many dots as it has labels. – robinHood013 Mar 28 '21 at 14:05
  • It seems to work the way I want. Could you also tell me how to change the color of the dots for each cluster? – robinHood013 Mar 28 '21 at 15:34
  • Yes, I added hue='Clusters' and now it gives me exactly what I want. Thanks, man. – robinHood013 Mar 28 '21 at 15:41
  • Nice trick at the end with the modulo hues. – tdy Mar 28 '21 at 18:10

If I understand correctly, you can use a swarmplot (or the similar stripplot):

sns.swarmplot(data=stacked, x='Clusters', y='Labels')

[Image: swarm plot]

tdy
  • Here I posted the demo dict, but in reality I have a dictionary where each cluster contains at least 40 labels. I have already tried swarmplot and stripplot there, but they still did not show all the points. – robinHood013 Mar 28 '21 at 06:16