I'm using sklearn.decomposition.PCA to pre-process some training data for a machine learning model. There are 247 data points with 4095 dimensions, imported from a csv file using pandas. I then scale the data with

training_data = StandardScaler().fit_transform(training[:,1:4096])

before calling the PCA algorithm to obtain the variance for each dimension:

pca = PCA(n_components)
pca.fit(training_data)

The output is a vector of length 247, but it should have length 4095 so that I can work out the variance of each dimension, not the variance of each data point.
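The same thing happens with random data of the same shape, so it doesn't seem to be anything specific to my csv files (the array name below is just a placeholder):

import numpy as np
from sklearn.decomposition import PCA

# Same shape as my training set: 247 samples, 4095 features
dummy_data = np.random.randn(247, 4095)

pca = PCA()
pca.fit(dummy_data)

# This prints (247,), not (4095,)
print(pca.explained_variance_.shape)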
My code looks like:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# ID Number = [0]
# features = [1:4096]
test = np.array(pd.read_csv("testing.csv", sep=','))
training = np.array(pd.read_csv("training.csv", sep=','))

# Standardise the features (zero mean, unit variance per column)
training_data = StandardScaler().fit_transform(training[:,1:4096])
test_data = StandardScaler().fit_transform(test[:,1:4096])
training_labels = training[:,4609]

# Fit PCA with the default number of components
pca = PCA()
pca.fit(training_data)
pca_variance = pca.explained_variance_
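Checking the result straight after that code confirms what I described above:

# One entry per data point, not per feature
print(pca_variance.shape)    # (247,) on my data
print(pca.n_components_)     # 247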
I have tried taking the transpose of training_data, but this didn't change the output. I have also tried changing n_components in the argument of PCA, but it insists that there can only be 247 components.
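In case it helps, this is roughly what those two attempts looked like (the variable names here are just for illustration):

# Attempt 1: fit on the transpose (4095 x 247); explained_variance_ still has 247 entries
pca_t = PCA()
pca_t.fit(training_data.T)
print(pca_t.explained_variance_.shape)

# Attempt 2: explicitly ask for 4095 components; scikit-learn raises a ValueError
# because n_components cannot be larger than min(n_samples, n_features) = 247
pca_full = PCA(n_components=4095)
pca_full.fit(training_data)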
This may be a stupid question, but I'm very new to this sort of data processing. Thank you.