Why does StringIndexer have no outputCols?

Question

I am using Apache Zeppelin. My Anaconda version is conda 4.8.4, and my Spark version is:

%spark2.pyspark
spark.version
u'2.3.1.3.0.1.0-187'

When I run my code, it throws the following error:

Exception AttributeError: "'StringIndexer' object has no attribute '_java_obj'" in <object repr() failed> ignored
Fail to execute line 4: indexerFeatures = StringIndexer(inputCols=catColumns, outputCols=catIndexedColumns, handleInvalid="keep")
Traceback (most recent call last):
  File "/tmp/zeppelin_pyspark-66369397479549554.py", line 375, in <module>
    exec(code, _zcUserQueryNameSpace)
  File "<stdin>", line 4, in <module>
  File "/usr/hdp/current/spark2-client/python/lib/pyspark.zip/pyspark/__init__.py", line 105, in wrapper
    return func(self, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'outputCols'

I ran the same code in Databricks and everything worked fine. I also checked the imported StringIndexer with the help() function, and its signature did not include the outputCols argument.

`inputCols` and `outputCols` parameters are available for spark 3.x — AdibP, Dec 30 '21 at 01:05

过过招 · Accepted Answer · 2021-12-30T01:08:39.433

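It should be outputCol, not outputCols. In Spark 2.3.1, StringIndexer accepts only the single-column parameters inputCol and outputCol; the plural inputCols and outputCols were added in Spark 3.0, which is why the same code works on Databricks. On 2.3.1 you need one StringIndexer per column.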

For spark 2.3.1, you can refer to: https://spark.apache.org/docs/2.3.1/api/python/pyspark.ml.html#pyspark.ml.feature.StringIndexer

class pyspark.ml.feature.StringIndexer(inputCol=None, outputCol=None, handleInvalid='error', stringOrderType='frequencyDesc')
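As a minimal sketch of the per-column workaround on Spark 2.3.1, the code below builds one StringIndexer per categorical column and chains them in a Pipeline. The helper names (`supports_multi_cols`, `indexed_names`, `build_indexer_pipeline`) and the `_idx` suffix are hypothetical, introduced only for illustration; `catColumns` and `catIndexedColumns` are the lists from the question, and `df` is assumed to be your DataFrame.

```python
def supports_multi_cols(spark_version):
    """Return True if StringIndexer accepts inputCols/outputCols (Spark 3.0+)."""
    # Vendor builds report strings like '2.3.1.3.0.1.0-187'; only the major part matters here.
    major = int(spark_version.split(".")[0])
    return major >= 3


def indexed_names(cat_columns, suffix="_idx"):
    """Derive one output column name per input column (the suffix is an assumption)."""
    return [c + suffix for c in cat_columns]


def build_indexer_pipeline(cat_columns, cat_indexed_columns):
    """Build a Pipeline with one single-column StringIndexer per categorical column."""
    # Imported lazily so the two helpers above can be used without Spark installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import StringIndexer

    stages = [
        StringIndexer(inputCol=in_col, outputCol=out_col, handleInvalid="keep")
        for in_col, out_col in zip(cat_columns, cat_indexed_columns)
    ]
    return Pipeline(stages=stages)


# Usage inside a %spark2.pyspark paragraph (needs an active Spark session):
# pipeline = build_indexer_pipeline(catColumns, catIndexedColumns)
# indexed_df = pipeline.fit(df).transform(df)
```

Wrapping the indexers in a Pipeline keeps the fit and transform steps to a single call each, which is roughly what the multi-column form does for you in Spark 3.x.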

edited Dec 30 '21 at 01:08

answered Dec 30 '21 at 01:00

过过招
