I have a small question about forecasting with Spark ML and Naive Bayes.
I have a very simple dataset: a timestamp representing a day, and the number of pancakes sold that day:
dataSetPancakes.show();
+----------+-----+
| time|label|
+----------+-----+
|1622505600| 1|
|1622592000| 0|
|1622678400| 3|
|1622764800| 1|
|1622851200| 1|
|1622937600| 1|
|1623024000| 1|
|1623110400| 2|
|1623196800| 2|
|1623283200| 0|
+----------+-----+
only showing top 10 rows
The goal is simple: I just want to predict how many pancakes will be sold tomorrow, the day after, and so on.
To do that, I tried the Naive Bayes model, following the tutorial at https://spark.apache.org/docs/latest/ml-classification-regression.html#naive-bayes. This is what I wrote:
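import org.apache.spark.ml.classification.NaiveBayes;
import org.apache.spark.ml.classification.NaiveBayesModel;
import org.apache.spark.ml.feature.VectorAssembler;
import org.apache.spark.ml.linalg.DenseVector;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
// Wrap the single "time" column into the "features" vector column
VectorAssembler vectorAssembler = new VectorAssembler().setInputCols(new String[]{"time"}).setOutputCol("features");
Dataset<Row> vectorData = vectorAssembler.transform(dataSetPancakes);
NaiveBayes naiveBayes = new NaiveBayes();
NaiveBayesModel model = naiveBayes.fit(vectorData);
Dataset<Row> predictions = model.transform(vectorData);
predictions.show();
model.predict(new DenseVector(new double[]{getTomorrowTimestamp()}));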
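For completeness, getTomorrowTimestamp() is my own helper and is not shown above. A minimal sketch of what it could look like, assuming the time column holds epoch seconds at midnight UTC (the values in the table are consistent with that: 1622505600 is 2021-06-01T00:00:00Z). The class name and the LocalDate parameter are only for illustration:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;

final class PancakeTimestamps {
    // Epoch seconds at 00:00 UTC of the given day (assumed format of the "time" column).
    static long startOfDayEpochSeconds(LocalDate day) {
        return day.atStartOfDay(ZoneOffset.UTC).toEpochSecond();
    }

    // Hypothetical stand-in for getTomorrowTimestamp(): the day after "today".
    static long getTomorrowTimestamp(LocalDate today) {
        return startOfDayEpochSeconds(today.plusDays(1));
    }

    public static void main(String[] args) {
        // First row of the dataset: 2021-06-01
        System.out.println(startOfDayEpochSeconds(LocalDate.of(2021, 6, 1))); // prints 1622505600
        // The day after the last row shown (2021-06-10)
        System.out.println(getTomorrowTimestamp(LocalDate.of(2021, 6, 10))); // prints 1623369600
    }
}
```

In the real job the argument would be LocalDate.now(ZoneOffset.UTC).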
I do get results, for example:
-RECORD 0--------------------------------------------------------------------------------------------------------------
time | 1622505600
label | 1
features | [1.6225056E9]
rawPrediction | [-0.9400072584914714,-1.0831081021321447,-1.702147310538368,-2.5494451709255714,-4.564348191467836]
probability | [0.39062499999999994,0.33854166666666663,0.18229166666666666,0.07812500000000001,0.01041666666666667]
prediction | 0.0
-RECORD 1--------------------------------------------------------------------------------------------------------------
time | 1622592000
label | 0
features | [1.622592E9]
rawPrediction | [-0.9400072584914714,-1.0831081021321447,-1.702147310538368,-2.5494451709255714,-4.564348191467836]
probability | [0.39062499999999994,0.33854166666666663,0.18229166666666666,0.07812500000000001,0.01041666666666667]
prediction | 0.0
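To help read this output: as far as I understand, the probability column is the normalized exponential of rawPrediction, and prediction is the index of the largest entry. Below is a minimal plain-Java sketch (no Spark; the class and method names are made up for illustration) that checks this against the values of record 0. Since rawPrediction is identical in both records, the same index wins for every row.

```java
import java.util.Arrays;

final class RawPredictionCheck {
    // Normalized exponential of log-space scores; the max is subtracted for numerical stability.
    static double[] toProbabilities(double[] raw) {
        double max = Double.NEGATIVE_INFINITY;
        for (double r : raw) {
            max = Math.max(max, r);
        }
        double[] p = new double[raw.length];
        double sum = 0.0;
        for (int i = 0; i < raw.length; i++) {
            p[i] = Math.exp(raw[i] - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    // Index of the largest score, i.e. the predicted label index.
    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) {
            if (v[i] > v[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // rawPrediction copied from record 0 above
        double[] raw = {-0.9400072584914714, -1.0831081021321447, -1.702147310538368,
                        -2.5494451709255714, -4.564348191467836};
        System.out.println(Arrays.toString(toProbabilities(raw))); // matches the probability column up to rounding
        System.out.println(argmax(raw)); // prints 0
    }
}
```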
However, the prediction itself is always 0.0, both for the rows above and for tomorrow's timestamp.
What is the root cause of this?
Thank you