I get the following error when training a Naive Bayes classifier:
File "/home/juande/Desktop/spark-1.3.0-bin-hadoop2.4/python/pyspark/mllib/classification.py", line 372, in train
return NaiveBayesModel(labels.toArray(), pi.toArray(), numpy.array(theta))
ValueError: invalid __array_struct__
It happens when I train the model with these lines:
dataframe = dataframe.map(lambda x: LabeledPoint(sections_to_number[x[4]], tf.transform([x[0], x[1], x[2], x[3]])))
model = NaiveBayes.train(dataframe, 1.0)
Here sections_to_number is a dictionary that maps section names (strings) to float numbers, for example sports -> 0, weather -> 1, and so on.
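To make the shape of that mapping concrete, here is a minimal sketch. Only the sports and weather entries come from the description above. The float literals and the variable named section are assumptions for illustration, and the real dictionary may hold more entries or differently typed values.

```python
# Hypothetical mapping: only "sports" and "weather" are taken from the question;
# writing the values as Python floats is an assumption.
sections_to_number = {
    "sports": 0.0,
    "weather": 1.0,
}

# Look up the numeric label for one section name (illustrative variable).
section = "weather"
label = sections_to_number[section]

# Print the label together with its Python type.
print(label, type(label).__name__)
```

Printing the type next to the value shows exactly what kind of object ends up as the label passed to LabeledPoint.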
However, if I pass a constant number as the label instead of looking it up in sections_to_number, I do not get any error:
dataframe = dataframe.map(lambda x: LabeledPoint(10.0, tf.transform([x[0], x[1], x[2], x[3]])))
model = NaiveBayes.train(dataframe, 1.0)
Am I missing something? Thanks