
I am new to machine learning and I am trying to do unsupervised learning with k-means clustering (even though I have read that k-means does not work very well with categorical data). I encoded my categorical variables and tried to apply kernel PCA, since I have a categorical feature (gender). I noticed that the kernel parameter accepts several values: 'linear', 'poly', 'rbf', 'sigmoid', 'cosine' and 'precomputed'.

I searched on the internet but could not find a proper explanation of these. I am also not sure whether kernels are used the same way in PCA and in SVM. Can anyone explain what they are, when each should be used, and/or how to choose the right one for a given dataset? Since we cannot visualize a dataset with more than 3 dimensions, how do we judge its shape in order to choose the right parameter? Part of the code is below, just to show where the parameter is used:

# Applying Kernel PCA 
from sklearn.decomposition import KernelPCA
kpca = KernelPCA(n_components = 2, kernel = 'linear')
X = kpca.fit_transform(X)
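For reference, here is a self-contained sketch of the kind of pipeline the question describes: encode the categorical column, reduce with KernelPCA, then cluster with k-means. The DataFrame, its column names (`age`, `income`, `gender`) and `n_clusters=3` are hypothetical stand-ins, since the original data is not shown. Note that after one-hot encoding every row is a plain numeric vector, which is all the predefined kernels can work with.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: two numeric columns plus a categorical "gender" column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=100),
    "income": rng.normal(50_000, 15_000, size=100),
    "gender": rng.choice(["F", "M"], size=100),
})

# Scale the numeric columns and one-hot encode the categorical one,
# so each row becomes a purely numeric vector.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "income"]),
    ("cat", OneHotEncoder(), ["gender"]),
])

pipeline = Pipeline([
    ("prep", preprocess),
    ("kpca", KernelPCA(n_components=2, kernel="linear")),
    ("kmeans", KMeans(n_clusters=3, n_init=10, random_state=0)),
])

# One cluster label per row of df.
labels = pipeline.fit_predict(df)
```

Wrapping the steps in a `Pipeline` keeps the encoding, projection and clustering together, so the same transformations are applied consistently if new rows are passed to `pipeline.predict`.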

Thank you in advance.

Beg

1 Answer


None of these predefined kernels supports mixed (categorical plus numeric) data either. They are all kernels on numeric vectors.
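To make "vector kernels" concrete, the sketch below writes out the five built-in kernels as formulas on numeric vectors and checks them against scikit-learn's `pairwise_kernels`, which implements the same kernels KernelPCA uses. The values of `gamma`, `degree` and `coef0` are arbitrary choices for illustration. `'precomputed'` is not a formula at all: it means you pass a kernel matrix you computed yourself instead of the raw data.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_kernels

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 numeric vectors with 3 features each

gamma, degree, coef0 = 0.5, 3, 1.0  # arbitrary illustrative parameters

# Hand-written versions of the predefined kernels, using scikit-learn's conventions.
norms = np.linalg.norm(X, axis=1)
manual = {
    "linear": X @ X.T,                                   # <x, y>
    "poly": (gamma * X @ X.T + coef0) ** degree,         # (gamma <x, y> + coef0)^degree
    "rbf": np.exp(-gamma * ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)),
    "sigmoid": np.tanh(gamma * X @ X.T + coef0),         # tanh(gamma <x, y> + coef0)
    "cosine": (X @ X.T) / np.outer(norms, norms),        # <x, y> / (|x| |y|)
}

matches = {}
for name, K_manual in manual.items():
    # filter_params=True drops parameters a given kernel does not accept.
    K_sklearn = pairwise_kernels(X, metric=name, filter_params=True,
                                 gamma=gamma, degree=degree, coef0=coef0)
    matches[name] = np.allclose(K_manual, K_sklearn)

print(matches)
```

Every formula needs inner products or distances between numeric feature vectors, which is why categorical columns have to be encoded (for example one-hot) before any of these kernels can be applied.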

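A linear kernel should give the same result as ordinary (non-kernel) PCA, up to the sign of each component, just much slower, because kernel PCA builds and decomposes an n×n kernel matrix over all samples.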
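A quick way to check the claim about the linear kernel is to compare the two projections on random data. This is a sketch on synthetic data; `eigen_solver="dense"` and `svd_solver="full"` are set only so both use exact decompositions. Since each principal component is defined only up to sign, the comparison is on absolute values.

```python
import numpy as np
from sklearn.decomposition import PCA, KernelPCA

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 5))  # synthetic data: 50 samples, 5 features

pca_scores = PCA(n_components=2, svd_solver="full").fit_transform(X)
kpca_scores = KernelPCA(n_components=2, kernel="linear",
                        eigen_solver="dense").fit_transform(X)

# Components may differ in sign, so compare magnitudes element-wise.
same = np.allclose(np.abs(pca_scores), np.abs(kpca_scores), atol=1e-8)
print(same)
```

Plain PCA works on the 5×5 covariance structure here, while KernelPCA works on the 50×50 centered kernel matrix; with many samples that difference is where the slowdown comes from.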

There is not much relationship to SVM beyond the shared use of kernels. Kernels like rbf make much more sense when you can tune their hyperparameters (such as gamma) against a score in a supervised classification task. Since choosing such parameters is hard without labels, making good use of KernelPCA is difficult except for toy problems.
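For contrast, this is roughly what hyperparameter optimization looks like when labels are available: KernelPCA sits in a pipeline in front of a classifier, and cross-validated accuracy decides between kernels and `gamma` values. The `make_circles` toy data, the `LogisticRegression` classifier and the grid values are all illustrative assumptions. In an unsupervised k-means setting there is no such label-based score to optimize, which is what makes the choice hard.

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy labelled data: two concentric circles, not linearly separable.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

pipe = Pipeline([
    ("kpca", KernelPCA(n_components=2)),
    ("clf", LogisticRegression()),
])

# gamma only matters for rbf (and poly/sigmoid), so it is tuned only there.
param_grid = [
    {"kpca__kernel": ["linear"]},
    {"kpca__kernel": ["rbf"], "kpca__gamma": [0.1, 1.0, 10.0]},
]

# Cross-validated accuracy of the classifier picks the kernel settings.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Here the classifier's accuracy is the objective; without labels you would need some other criterion, and none of the built-in options provides one.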

Has QUIT--Anony-Mousse