
I have a scatterplot and I want to color each point based on another value (naively set with np.random.random() in this case).

Is there a way to use seaborn to map a continuous value for each point (not directly associated with the data being plotted) to a color along a continuous gradient?

Here's my code to generate the data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})

%matplotlib inline
np.random.seed(0)

# Iris dataset
DF_data = pd.DataFrame(load_iris().data, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       columns = load_iris().feature_names)

Se_targets = pd.Series(load_iris().target, 
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])], 
                       name = "Species")

# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data), 
                           index = DF_data.index,
                           columns = DF_data.columns)

# Sklearn for Principal Component Analysis
# Dims
m = DF_standard.shape[1]
K = 2

# PCA (How I tend to set it up)
Mod_PCA = decomposition.PCA(n_components=m)
DF_PCA = pd.DataFrame(Mod_PCA.fit_transform(DF_standard), 
                      columns=["PC%d" % k for k in range(1,m + 1)]).iloc[:,:K]
# Plot
fig, ax = plt.subplots()
ax.scatter(x=DF_PCA["PC1"], y=DF_PCA["PC2"], color="k")
ax.set_title("No Coloring")


Ideally, I wanted to do something like this:

# Color classes
cmap = {obsv_id:np.random.random() for obsv_id in DF_PCA.index}

# Plot
fig, ax = plt.subplots()
ax.scatter(x=DF_PCA["PC1"], y=DF_PCA["PC2"], color=[cmap[obsv_id] for obsv_id in DF_PCA.index])
ax.set_title("With Coloring")

# ValueError: to_rgba: Invalid rgba arg "0.2965562650640299"
# to_rgb: Invalid rgb arg "0.2965562650640299"
# cannot convert argument to rgb sequence

but matplotlib didn't accept the continuous values.

I want to use a color palette like:

sns.palplot(sns.cubehelix_palette(8))


I also tried something like the line below, but it can't work because it doesn't know which values I used in my cmap dictionary above:

ax.scatter(x=DF_PCA["PC1"], y=DF_PCA["PC2"], cmap=sns.cubehelix_palette(as_cmap=True))
O.rka

2 Answers

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

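# Random x/y positions plus a third continuous value z for each point
x, y, z = np.random.rand(3, 100)
cmap = sns.cubehelix_palette(as_cmap=True)

f, ax = plt.subplots()
# Pass the continuous values via `c` (not `color`) so they are mapped through cmap
points = ax.scatter(x, y, c=z, s=50, cmap=cmap)
f.colorbar(points)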

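To apply the same idea to the question's per-observation dictionary, one option is to align the values to the DataFrame index and pass them through `c`. The following is a minimal sketch, not the answerer's own code. It assumes a random stand-in for `DF_PCA` (not real PCA scores), the hypothetical names `values` and `c`, and the non-interactive Agg backend in place of `%matplotlib inline`:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

# Stand-in for the question's DF_PCA (hypothetical random data, not real PCA scores)
DF_PCA = pd.DataFrame(rng.normal(size=(150, 2)), columns=["PC1", "PC2"])

# One continuous value per observation, keyed by index label as in the question
values = {obsv_id: rng.random() for obsv_id in DF_PCA.index}

# Align the values to the row order of DF_PCA
c = pd.Series(values).reindex(DF_PCA.index)

fig, ax = plt.subplots()
# `c` takes numbers and maps them through `cmap`; `color` expects color specs
points = ax.scatter(DF_PCA["PC1"], DF_PCA["PC2"], c=c,
                    cmap=sns.cubehelix_palette(as_cmap=True))
fig.colorbar(points, ax=ax)
ax.set_title("With Coloring")
```

Reindexing by `DF_PCA.index` keeps each value attached to its own observation even if the dictionary was built in a different order.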

mwaskom
  • With the colorbar too! Yes. That was exactly what I was looking for. Thanks a lot ヾ(⌐■_■)ノ♪ @mwaskom – O.rka Sep 27 '16 at 23:28
  • This is great, thanks. Is there a sensible way to do this via seaborn's `hue` parameter? I tried it but the resulting legend makes little sense as the hue parameter seems to treat every value of the continuous variable as a categorical level. – user2428107 Mar 30 '17 at 04:58
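On the `hue` question, one possible approach is to pass a colormap as `palette`, suppress the legend, and draw a colorbar from a separate `ScalarMappable` that shares the same normalization. This is a sketch under assumptions: it requires a seaborn version that provides `sns.scatterplot` (0.9 or later), and the DataFrame `df` with columns `x`, `y`, `z` is hypothetical random data:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

rng = np.random.default_rng(0)

# Hypothetical data: x/y positions and a continuous value z
df = pd.DataFrame({"x": rng.random(100), "y": rng.random(100), "z": rng.random(100)})

cmap = sns.cubehelix_palette(as_cmap=True)
norm = Normalize(vmin=df["z"].min(), vmax=df["z"].max())

fig, ax = plt.subplots()
# A numeric hue with a colormap palette is treated as continuous;
# legend=False drops the legend so a colorbar can replace it
sns.scatterplot(data=df, x="x", y="y", hue="z", hue_norm=norm,
                palette=cmap, legend=False, ax=ax)

# The colorbar uses the same norm and cmap as the points
sm = ScalarMappable(norm=norm, cmap=cmap)
sm.set_array([])  # needed by older matplotlib versions
fig.colorbar(sm, ax=ax)
```

Sharing one `Normalize` instance between `hue_norm` and the colorbar keeps the point colors and the colorbar scale consistent.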
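# Assumes np, plt, sns, and DF_PCA from the question are already defined
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# One continuous value per observation (random here, as in the question)
cmap = {obsv_id:np.random.random() for obsv_id in DF_PCA.index}

# Map each value to an RGBA color on the cubehelix gradient
values = list(cmap.values())
sm = ScalarMappable(norm=Normalize(vmin=min(values), vmax=max(values)),
                    cmap=sns.cubehelix_palette(as_cmap=True))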

# Plot
fig, ax = plt.subplots()
ax.scatter(x=DF_PCA["PC1"], y=DF_PCA["PC2"], color=[sm.to_rgba(cmap[obsv_id]) for obsv_id in DF_PCA.index])
ax.set_title("With Coloring")

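For reference, `ScalarMappable.to_rgba` as used above amounts to normalizing each value into [0, 1] and looking the result up in the colormap. A small sketch with made-up values (the names `values`, `rgba_from_sm`, and `rgba_direct` are illustrative only):

```python
import numpy as np
import seaborn as sns
from matplotlib.cm import ScalarMappable
from matplotlib.colors import Normalize

# Made-up continuous values standing in for the per-observation numbers
values = np.array([0.1, 0.4, 0.9])

cmap = sns.cubehelix_palette(as_cmap=True)
norm = Normalize(vmin=values.min(), vmax=values.max())
sm = ScalarMappable(norm=norm, cmap=cmap)

# Two routes to the same RGBA colors
rgba_from_sm = sm.to_rgba(values)   # normalize and look up in one call
rgba_direct = cmap(norm(values))    # normalize, then call the colormap
```

This is also what `ax.scatter(..., c=values, cmap=cmap)` does internally, which is why passing `c` and `cmap` avoids the explicit imports.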

O.rka
  • If anyone has an easier way where you don't have to import `ScalarMappable` and `Normalize` then I will definitely select that as correct. – O.rka Sep 27 '16 at 22:23