
I am trying to display a scatterplot of a dataset that I reduced to two dimensions with the PCA class from sklearn. The transformed data is returned as follows:

array([[ -3.18592855e+04,  -2.13479310e+00],
       [ -3.29633003e+04,   1.40801796e+01],
       [ -3.25352942e+04,   7.36921088e+00],
...

I expected that the following code would work:

import pylab
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(instances)
pca_2d = pca.transform(instances)

fig = plt.figure(figsize=(8,3))
plt.scatter(pca_2d[0],pca_2d[1])
plt.show()

But this returned an incorrect figure only displaying the first two values. What do I need to change to get this up and running?

hY8vVpf3tyR57Xib
Do this: `plt.scatter(pca_2d[:, 0], pca_2d[:, 1])`, which plots the first feature (column 0) on the x-axis and the second (column 1) on the y-axis. – Gerges Jun 24 '17 at 21:34

1 Answer


You passed the first two rows of pca_2d (pca_2d[0] and pca_2d[1]) instead of its two columns, so the scatterplot was built from only two points.

Index the columns instead:

import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
import numpy as np

instances = np.array([[ 1,  2],
                      [ 3,  4],
                      [ 5,  6]])
pca = PCA(n_components=2).fit(instances)
pca_2d = pca.transform(instances)

fig = plt.figure(figsize=(8,3))
plt.scatter(pca_2d[:,0],pca_2d[:,1])
plt.show()

This correctly plots all 3 points:

[Image: scatterplot]
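To see why the original call produced only two points, it may help to compare the shapes of the two kinds of indexing. The sketch below uses a small made-up array as a stand-in for the PCA output (the values and the 5-sample size are assumptions for illustration, not the asker's data):

```python
import numpy as np

# Hypothetical stand-in for the PCA output: 5 samples, 2 components
pca_2d = np.arange(10, dtype=float).reshape(5, 2)

# Row indexing: each result holds the 2 components of ONE sample
first_row = pca_2d[0]
second_row = pca_2d[1]

# Column indexing: each result holds ONE component for ALL samples
x = pca_2d[:, 0]
y = pca_2d[:, 1]

# Transposing gives the same two columns, so they can be unpacked directly
x_alt, y_alt = pca_2d.T

print(first_row.shape, x.shape)  # (2,) (5,)
```

Passing `first_row` and `second_row` to `plt.scatter` therefore draws two points regardless of how many samples there are, while `x` and `y` give one point per sample.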

glegoux