
I am trying to perform PCA in a Spark application using the PySpark API in a Python script. I am doing it this way:

pca = PCA(k=3, inputCol="features", outputCol="pcaFeatures")
PCAmodel = pca.fit(data)

When I run those two lines of code in the PySpark shell, they work and return good results, but in the application script I get this type of error:

PCA() got an unexpected keyword argument 'k'

PS: In both cases I am using Spark 2.2.0.

Where is the problem? Why does it work in the PySpark shell but not in the application?


3 Answers


You probably imported from ml in one case:

from pyspark.ml.feature import PCA

and mllib in the other:

from pyspark.mllib.feature import PCA
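
For comparison, a minimal sketch of how the two classes differ (the aliases here are mine, added just to show both side by side):

from pyspark.ml.feature import PCA as PCA_ml
from pyspark.mllib.feature import PCA as PCA_mllib

# ml: DataFrame-based API, configured entirely through keyword parameters
pca_df = PCA_ml(k=3, inputCol="features", outputCol="pcaFeatures")

# mllib: RDD-based API; the constructor takes only k,
# and fit() expects an RDD of vectors instead of a DataFrame
pca_rdd = PCA_mllib(3)
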
  • No, this is not my case; in both cases I'm using `from pyspark.ml.feature import PCA` –  Nov 15 '17 at 16:04

Are you sure you have not also imported PCA from scikit-learn, after you imported it from PySpark in your application script?

spark.version
# u'2.2.0'

from pyspark.ml.feature import PCA
from sklearn.decomposition import PCA  # shadows the PySpark PCA imported above

# PySpark syntax with the scikit-learn PCA class
pca = PCA(k=3, inputCol="features", outputCol="pcaFeatures")
# TypeError: __init__() got an unexpected keyword argument 'k'

Reversing the order of the imports will not produce the error (not shown), since then the name `PCA` refers to the PySpark class.
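
One quick way to confirm this kind of shadowing in your script is to inspect what the name `PCA` is actually bound to just before the call (a diagnostic sketch; the exact module string depends on your scikit-learn version):

from pyspark.ml.feature import PCA
from sklearn.decomposition import PCA

# The last import wins: the name PCA now refers to the scikit-learn class
print(PCA.__module__)  # e.g. 'sklearn.decomposition.pca', not 'pyspark.ml.feature'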

desertnaut

Try renaming the classes as you import them:

from pyspark.ml.feature import PCA as PCAML
from sklearn.decomposition import PCA as PCASK

pca_ml = PCAML(k=3, inputCol="features", outputCol="pcaFeatures")

There should then be no confusion about which one you are calling.
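
For completeness, a minimal sketch of how each aliased class would then be used; `data` is the DataFrame from the question, while `X` (a NumPy array) is an assumed input for the scikit-learn side:

from pyspark.ml.feature import PCA as PCAML
from sklearn.decomposition import PCA as PCASK

# PySpark: fit on a DataFrame with a vector column named "features"
pca_ml = PCAML(k=3, inputCol="features", outputCol="pcaFeatures")
model = pca_ml.fit(data)
reduced = model.transform(data)

# scikit-learn: fit on a local 2-D array (X is an assumed input)
pca_sk = PCASK(n_components=3)
X_reduced = pca_sk.fit_transform(X)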

MisterJT