
I have been following this tutorial for the iris flower dataset.

I am trying to create a 2D PCA plot for my own data, but I can't figure out what to change.

This is my code:

    data_df = pd.DataFrame.from_csv("fvectors.csv")
    data_df = data_df.reindex(np.random.permutation(data_df.index))
    X = np.array(data_df[features].values)

    data_df2 = pd.DataFrame.from_csv("fvectors.csv")
    y = np.array(data_df2[features1].replace("Circle",0).replace("Triangle",1)
                 .replace("Square",2).replace("Parallelogram",3)
                 .replace("Rectangle",4).replace("Pentagon",5)
                 .replace("Seal",6).values.tolist())
    target_names = data_df.Label

    plt.figure()
    colors = ['navy', 'turquoise', 'darkorange']
    lw = 2
    pca = PCA(n_components=2)
    X_r = pca.fit(X).transform(X)

    for color, i, target_name in zip(colors, [0, 1, 2], target_names):
        plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw,
                    label=target_name)

    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title('PCA of 2D Shape Dataset')

My fvectors.csv is this:

My labels.csv is this:

I am getting this error:

plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw, label=target_name)
IndexError: too many indices for array

I know I am probably doing something stupid. Does anyone know how to fix this error, and what else in my code needs to change so it works with my data?

Thanks

Thom Elliott
  • Check the contents of `data_df`. Seems like there is no column `target_names`. – pbreach Mar 28 '17 at 18:02
  • @pbreach I changed it to "label", which is the column heading, and now I get this: `plt.scatter(X_r[y == i, 0], X_r[y == i, 1], color=color, alpha=.8, lw=lw, label=target_name)` IndexError: too many indices for array – Thom Elliott Mar 28 '17 at 18:18
  • OK, great. As for this error, it seems like `X_r` is a 1D array derived from `X`. What is `features`? It seems like it needs to be a list of column names but might only be one. – pbreach Mar 28 '17 at 18:24
  • @pbreach features = ["Number of Sides", "Standard Deviation of Number of Sides/Perimeter", "Standard Deviation of the Angles", "Largest Angle"] – Thom Elliott Mar 28 '17 at 18:32
  • 1
    Hmm okay well you could try changing `X = np.array(data_df[features].values) ` to simply `X = data_df[features].values` which will already be an array. What do you get from doing `X.shape` and `X_r.shape`? – pbreach Mar 28 '17 at 18:45
  • @pbreach I still get the same error – Thom Elliott Mar 28 '17 at 18:48
  • In that case I will refer you to the answer of a similar question [here](http://stackoverflow.com/a/28042958/2593236). Best of luck – pbreach Mar 28 '17 at 18:55
  • As @pbreach said, show your X.shape and X_r.shape – Vivek Kumar Mar 29 '17 at 06:11
  • @VivekKumar `X.shape = (7, 4)`, `X_r.shape = (7, 2)` – Thom Elliott Mar 29 '17 at 15:01
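
Since `X_r` is reported as `(7, 2)`, the projection itself looks fine, so the label array `y` is worth checking next. The following is only a sketch under an assumption the thread does not confirm: that `features1` is a one-element list such as `["Label"]`, which would make `y` a column of shape `(7, 1)`. A 2-D boolean mask counts as two indices, so `X_r[y == i, 0]` would then ask for three indices on a 2-D array. The random `X_r`, the integer labels, and the helper `select_class` are made up for illustration; only the shapes come from the comments above.

```python
import numpy as np

# Stand-in for the PCA output: 7 samples, 2 components (the shape reported above).
rng = np.random.default_rng(0)
X_r = rng.normal(size=(7, 2))

# ASSUMPTION: the labels arrive as a column of shape (7, 1), as would happen
# if features1 were a one-element list of column names. Values are made up.
y_2d = np.arange(7).reshape(-1, 1)


def select_class(points, labels, class_id):
    """Return the two projected coordinates of the rows labelled class_id."""
    mask = labels == class_id  # same shape as labels
    return points[mask, 0], points[mask, 1]


# With a (7, 1) mask the indexing fails with an IndexError.
try:
    select_class(X_r, y_2d, 0)
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print("2-D labels:", error_message)

# Flattening the labels to shape (7,) gives a 1-D mask that selects rows.
y = y_2d.ravel()
xs, ys = select_class(X_r, y, 0)
print("1-D labels:", y.shape, xs, ys)
```

If `y.shape` turns out to be `(7, 1)` in the real script, calling `.ravel()` on `y` may be enough to get past the `IndexError`. Three other things in the posted code are likely to affect the plot afterwards: `X` is built from the shuffled `data_df` while `y` comes from the unshuffled `data_df2`, so rows and labels no longer line up; `colors` and `[0, 1, 2]` cover only three of the seven classes; and `target_names = data_df.Label` holds one label per row rather than one name per class.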

0 Answers