IllegalArgumentException: u'requirement failed: Invalid initial capacity' in Spark on Google DataProc

Question

I am currently trying to run an ML decision tree on a large dataset (30 million observations, 13 variables) in Spark 2.0.0 on Google DataProc. When I execute:

from pyspark.ml.feature import StringIndexer
labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel").fit(data)

I receive the following error:

IllegalArgumentException: u'requirement failed: Invalid initial capacity'

I could not find much information about this error online. Can somebody explain what the problem is and how I can resolve it?

Answer 1 (score 1, answered Aug 27 '16 at 08:00)


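The error occurred because the input DataFrame (data) was defined but empty: it contained no rows, so StringIndexer had no values in the Target column to index. Make sure data actually contains rows before calling fit.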
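A minimal sketch of a guard that fails early with a clearer message, assuming data is a PySpark DataFrame. The helper name require_non_empty is hypothetical, not part of Spark; it relies only on DataFrame.head(n), which returns a list of at most n rows. Checking head(1) instead of calling count() avoids scanning all 30 million rows just to test for emptiness.

```python
def require_non_empty(df, name="data"):
    """Return df unchanged if it has at least one row; otherwise raise ValueError.

    Hypothetical helper (not part of Spark). Works with any object exposing
    head(n) -> list, such as a PySpark DataFrame.
    """
    # head(1) fetches at most one row, so this avoids a full count() over the data
    if len(df.head(1)) == 0:
        raise ValueError(
            "DataFrame '{}' is empty; check the loading/filtering steps "
            "before fitting StringIndexer".format(name)
        )
    return df
```

Usage, assuming the question's data and Target column: labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel").fit(require_non_empty(data))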


Stijn

