I have a list of 10 elements. I chose 5 of them as features, converted the selection to an array, and then applied PCA with n_components=2:

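from numpy import array
from sklearn.decomposition import PCA

idx = [0, 2, 3, 7, 8]  # indices of the 5 selected features (a list keeps the order fixed)
Input = array([Input[x] for x in idx]).reshape((1, -1))  # one sample, 5 features
print(Input.shape)  # prints (1, 5)

pca = PCA(n_components=2).fit(Input)  # fitted on a single sample
Input = pca.transform(Input)
print(Input.shape)  # prints (1, 1)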

Why is the shape of my input (1, 1) and not (1, 2) after PCA with n_components=2?

mkrieger1
april

1 Answer

Yes, it's the expected behaviour.

As per the documentation for PCA:

actual n_components = min(n_samples, specified n_components)

Here you have n_samples = 1 (as you show in Input.shape) so the actual n_components returned by PCA is 1.
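
A minimal sketch of the usual way around this, assuming you have a training set with many samples: fit the PCA on that set and only call `transform()` on the single sample. The random data, the array names (`X_train_raw`, `new_raw`) and the sample counts below are placeholders for illustration, not taken from the question. Note also that, as far as I know, recent scikit-learn releases reject a fit where n_components exceeds the number of samples with an error instead of silently returning fewer components.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Placeholder training data: 100 samples with 10 raw values each.
X_train_raw = rng.normal(size=(100, 10))

idx = [0, 2, 3, 7, 8]          # the 5 selected feature indices
X_train = X_train_raw[:, idx]  # shape (100, 5)

# Fit on many samples, so 2 components can actually be estimated.
pca = PCA(n_components=2).fit(X_train)

# A single new sample is only transformed, never used for fitting.
new_raw = rng.normal(size=10)
new_sample = new_raw[idx].reshape(1, -1)  # shape (1, 5)
reduced = pca.transform(new_sample)
print(reduced.shape)  # (1, 2)
```

The same fitted `pca` object has to be reused for every new sample so that the projected features match what a downstream model was trained on.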

Vivek Kumar
  • Thanks for the reply. What do you suggest if I have a saved SVM model that I want to apply to this sample after PCA, and that saved SVM model expects 2 features? (I could not use a Pipeline to save my PCA and SVM models together, since I had to add extra features between these two models, after PCA and before SVM.) – april Apr 25 '18 at 01:50
  • @april I am not able to understand you properly. Please make a new question or edit this one with more details about what you want to do. In your data, do you have only 1 sample? If not, you will not get this error. You can use FeatureUnion and Pipeline to combine the features after PCA and before SVM – Vivek Kumar Apr 25 '18 at 05:20
  • It does not make sense to use a different PCA between training and testing, in particular if your test set has one sample. If you change your projection, the features will not be the same, and your model will not work correctly – Eolmar Apr 25 '18 at 05:20
  • @Eolmar I am not suggesting using a different PCA here. During `transform()` a single sample can be provided without any issue. The issue here is about `fit()`. If you provide only a single sample during training, only a single component will be returned. – Vivek Kumar Apr 25 '18 at 05:30
  • @VivekKumar, I agree with you. This is more than a code issue, as doing a PCA on one observation is strange. – Eolmar Apr 25 '18 at 06:04
  • @Eolmar So what did you mean by `"It does not make sense to use a different PCA between training and testing"`? – Vivek Kumar Apr 25 '18 at 06:07
  • @VivekKumar I was replying to the first comment by april. Sorry for not making it clear. Your suggestion of using a Pipeline is probably the correct way to do it – Eolmar Apr 25 '18 at 06:13
  • Thanks for the comments and suggestion, I'll try FeatureUnion. Also, yes, this single sample is user data for my system. – april Apr 25 '18 at 09:06
  • @april No, what I am saying is, do you only have a single sample for training the model? – Vivek Kumar Apr 25 '18 at 09:07
  • No, for the training process I have more than 100 samples, with 20% dedicated to testing – april Apr 25 '18 at 09:11
  • @april Ok, post some samples in the question and I can show an example of FeatureUnion – Vivek Kumar Apr 25 '18 at 09:15
  • Thank you for your help. I'll work on my code and will create a new question for that (will link it here later) – april Apr 25 '18 at 09:21
  • @VivekKumar I created a new topic about this, if you are interested to look at it: https://stackoverflow.com/questions/50101117/typeerror-grid-seach thanks – april Apr 30 '18 at 13:02
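
The FeatureUnion and Pipeline suggestion from the comments could look roughly like the sketch below. It is only an illustration under assumptions: the extra feature (a per-row mean computed by the hypothetical helper `row_mean`), the random training data and the alternating labels are made up, since the thread does not say what the extra features are.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def row_mean(X):
    # Hypothetical extra feature: the mean of each sample's values.
    return X.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X_train = rng.normal(size=(100, 5))  # placeholder: 100 samples, 5 features
y_train = np.arange(100) % 2         # placeholder labels, two classes

# Concatenate the 2 PCA components with the extra feature column.
features = FeatureUnion([
    ("pca", PCA(n_components=2)),
    ("extra", FunctionTransformer(row_mean)),
])

# PCA, the extra features and the SVM are fitted together as one estimator.
model = Pipeline([("features", features), ("svm", SVC())])
model.fit(X_train, y_train)

# A single user sample goes through the already fitted steps.
new_sample = rng.normal(size=(1, 5))
combined = model.named_steps["features"].transform(new_sample)
prediction = model.predict(new_sample)
print(combined.shape)    # (1, 3)
print(prediction.shape)  # (1,)
```

Because everything lives in one estimator, the PCA fitted on the training data is automatically the one applied to each new sample before it reaches the SVM.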