Hi, I am trying to fit a MultilayerPerceptronClassifier with the PySpark 2.4.3 machine learning library (pyspark.ml), but every time I call fit I get the following error:
Py4JJavaError: An error occurred while calling o4105.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 784.0 failed 4 times, most recent failure: Lost task 0.3 in stage 784.0 (TID 11663, hdpdncwy87013.dpp.acxiom.net, executor 1): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$org$apache$spark$ml$feature$OneHotEncoderModel$$encoder$1: (double, int) => struct,values:array>)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
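This is the code I am running:
from pyspark.ml.classification import MultilayerPerceptronClassifier
from pyspark.ml.feature import MinMaxScaler, VectorAssembler

# Load the CSV with a header row and inferred column types
df = sqlContext.read.format("csv").options(header='true', sep=",", inferSchema='true').load(location)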
# Use every column except the label as an input feature
exclude = ["Target"]
inputs = [column for column in df.columns if column not in exclude]
# Assemble the input columns into a single vector column
vectorAssembler = VectorAssembler(inputCols=inputs, outputCol='Features')
vdf = vectorAssembler.transform(df)
vdf = vdf.select(['Features'] + exclude)
# Feature Scaling
scaler = MinMaxScaler(inputCol="Features", outputCol="scaledFeatures")
scalerModel = scaler.fit(vdf)
scaledData = scalerModel.transform(vdf)
# train-test split
splits = scaledData.randomSplit([0.7, 0.3], seed=2020)
train_df = splits[0]
test_df = splits[1]
# Network layout: input layer, three hidden layers of 3 units each, 5 output classes
layers = [len(inputs), 3, 3, 3, 5]
mlpc = MultilayerPerceptronClassifier(labelCol="Target", featuresCol="scaledFeatures", layers=layers,
                                      blockSize=128, stepSize=0.03, seed=2020, maxIter=1000)
model = mlpc.fit(train_df)
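Since the trace points at OneHotEncoderModel, which my code never calls directly, one thing that seems worth ruling out is the label column. Below is a minimal sketch of such a check. It assumes (not confirmed for this case) that the classifier needs the labels to be integer indices from 0 to 4 when the last layer has 5 units. `out_of_range_labels` is a hypothetical helper, not part of PySpark, and the example values are made up.

```python
def out_of_range_labels(labels, num_classes):
    """Return label values that are missing, non-integral, or outside [0, num_classes)."""
    bad = []
    for value in labels:
        # None covers null labels; is_integer() is False for 1.5, NaN and inf
        if value is None or not float(value).is_integer():
            bad.append(value)
        elif not 0 <= int(value) < num_classes:
            bad.append(value)
    return bad


# Hypothetical usage against the DataFrame from the question (needs a Spark session):
# distinct = [row["Target"] for row in train_df.select("Target").distinct().collect()]
# print(out_of_range_labels(distinct, layers[-1]))

# Made-up label values, only to show the behaviour of the helper
example = [1, 2, 3, 4, 5]
print(out_of_range_labels(example, 5))  # [5]
```

An empty list would mean the labels already fit the assumed range; anything else lists the values to inspect.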
Does anyone have an idea what causes this? The data has 1902 input features and 5 classes to predict. Thank you in advance.