
I have a DataFrame in which both the columns and the rows can be considered categories. I want to plot the values of each row on a scatter plot, with the row categories on the y-axis, the values on the x-axis, and the column categories shown as differently colored dots. Preferred plotting library: Plotly or seaborn.

Simulated data

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)), 
                  columns=list('ABCD'), index=list('PQRST'))
print(df)  # the values are random, so your output will differ
#     A   B   C   D
# P  21  95  91  90
# Q  21  12   9  68
# R  24  68  10  82
# S  81  14  80  39
# T  53  17  19  77

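# current attempt: one series per column, plotted against the index,
# so the row categories P..T end up on the x-axis
df.plot(marker='o', linestyle='')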

Desired plot: similar to the image below, but with the x-axis and y-axis switched.

[image: example of the desired plot]

rahul-ahuja

2 Answers


In my opinion, the way you have structured your DataFrame (the index holds the categorical y-values and each column is a color) will make it pretty inconvenient to access your data for plotting.

Instead, I think you can make your life easier by having one column for the values, one column for the categories P, Q, R, S, T, and a final column for the categories A, B, C, D that will correspond to differently colored points.
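A minimal sketch of that restructuring, applied to the wide frame from the question with pandas `melt`. It assumes the values printed in the question (hard-coded so the result is reproducible) and names the three columns `category`, `color` and `value` to match the layout described above.

```python
import pandas as pd

# Wide frame from the question: rows P..T and columns A..D are both categories.
# Values are hard-coded from the printed output so the result is reproducible.
wide = pd.DataFrame(
    [[21, 95, 91, 90],
     [21, 12, 9, 68],
     [24, 68, 10, 82],
     [81, 14, 80, 39],
     [53, 17, 19, 77]],
    columns=list('ABCD'), index=list('PQRST'))

# Turn the index into a 'category' column, then stack A..D into two columns:
# 'color' (the former column name) and 'value' (the cell value).
long_df = (wide.rename_axis('category')
               .reset_index()
               .melt(id_vars='category', var_name='color', value_name='value'))

print(long_df.head())
```

Each of the 20 cells becomes one row, so the frame can be passed to a `px.scatter` call with `x='value'`, `y='category'`, `color='color'`.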

For data visualization, I would recommend Plotly Express, since I think the documentation is excellent, and it's nice that the plots are interactive. For example, there's documentation on setting colors using column names, which I have done in my code below (and which is one of the reasons I recommended structuring your DataFrame differently).

import numpy as np
import pandas as pd
import plotly.express as px

np.random.seed(42)

df = pd.DataFrame({
    'value':np.random.randint(0, 100, size=20),
    'category':['P','Q','R','S','T']*4,
    'color':['A','B','C','D']*5
})
df = df.sort_values(by='category')

fig = px.scatter(df, x='value', y='category', color='color')

## make the marker size larger than the default
fig.update_traces(marker=dict(size=14))
fig.show()


Derek O

With Plotly set as the plotting backend for pandas, all you need to do is reshape your DataFrame from wide to long format with pd.melt() and run:

df.plot(kind='scatter', x='value', y='index', color='variable')


Complete code:

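import numpy as np
import pandas as pd
pd.options.plotting.backend = "plotly"  # use Plotly instead of matplotlib for DataFrame.plot
df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)),
                  columns=list('ABCD'), index=list('PQRST'))
# wide -> long: columns 'index' (P..T), 'variable' (A..D) and 'value'
df = pd.melt(df.reset_index(), id_vars=['index'], value_vars=df.columns)
fig = df.plot(kind='scatter', x='value', y='index', color='variable')
fig.show()  # needed outside a notebook to display the figure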
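The question also listed seaborn as an option. A minimal sketch of the same plot with `seaborn.scatterplot`, assuming seaborn (0.11 or later) and matplotlib are installed; it reuses the long-format frame from `pd.melt` and maps `hue` to the former column names. The marker size `s=100` and the relabelled y-axis are illustrative choices, not part of the answer above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.DataFrame(np.random.randint(0, 100, size=(5, 4)),
                  columns=list('ABCD'), index=list('PQRST'))

# Wide -> long, as in the answer: columns 'index', 'variable', 'value'
long_df = pd.melt(df.reset_index(), id_vars=['index'], value_vars=list('ABCD'))

# One dot per cell: value on the x-axis, row category on the y-axis,
# color chosen by the original column name (A..D)
ax = sns.scatterplot(data=long_df, x='value', y='index', hue='variable', s=100)
ax.set_ylabel('category')
plt.show()
```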
vestland