I am currently trying to run an ML decision tree on a large dataset (30 million observations, 13 variables) in Spark 2.0.0 on Google Cloud Dataproc. When I execute:

labelIndexer = StringIndexer(inputCol="Target", outputCol="indexedLabel").fit(data)

I receive the following error:

IllegalArgumentException: u'requirement failed: Invalid initial capacity'

I could not find much information about this error online. Can somebody explain what the problem is and how I can resolve it?

Stijn

1 Answer

The error occurred because the input DataFrame (data) was defined but empty.
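A minimal sketch of a guard against this, assuming PySpark and the Target/indexedLabel column names from the question. The helper names is_empty and fit_label_indexer are hypothetical, not part of Spark. The sketch checks that the DataFrame has at least one row before fitting StringIndexer, so an empty input fails with an explicit message instead of the opaque "Invalid initial capacity" requirement failure.

```python
def is_empty(df):
    """Return True if the DataFrame has no rows.

    take(1) fetches at most one row, which is much cheaper than count()
    on a 30-million-row dataset. DataFrame.isEmpty() is not available in
    PySpark 2.0, so take(1) is used for compatibility.
    """
    return len(df.take(1)) == 0


def fit_label_indexer(data, input_col="Target", output_col="indexedLabel"):
    """Fit a StringIndexer, failing fast with a readable error on empty input."""
    if is_empty(data):
        raise ValueError(
            "Input DataFrame is empty; check the load/filter steps that "
            "produced it before fitting StringIndexer."
        )
    # Imported here so is_empty() can be used without a Spark installation.
    from pyspark.ml.feature import StringIndexer
    return StringIndexer(inputCol=input_col, outputCol=output_col).fit(data)
```

Usage: replace the original call with labelIndexer = fit_label_indexer(data). If the error is raised, the problem lies upstream (for example, a wrong input path or a filter that removes every row), not in the decision tree itself.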

Stijn