I am new to Spark and have just started doing some serious work with it.
We are building a platform that receives temperature readings from stations, each taken at a particular timestamp. The data is posted to RabbitMQ as CSV (station ID, temperature, timestamp), e.g.:
WD1,12.3,15-10-12T12:23:45
WD2,12.4,15-10-12T12:24:45
WD1,12.3,15-10-12T12:25:45
WD1,22.3,15-10-12T12:26:45
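Before any model can use these records, the text timestamp has to become a number. A minimal plain-Java sketch of that step follows. The `Reading` class and its `parse` method are hypothetical names, and the timestamp is assumed to be `yy-MM-dd'T'HH:mm:ss` in UTC (it could also be `dd-MM-yy`, which would not change the gaps between these sample readings):

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** One reading from the feed: station ID, temperature, timestamp. */
class Reading {
    // Assumption: "15-10-12T12:23:45" means yy-MM-dd'T'HH:mm:ss, interpreted as UTC.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yy-MM-dd'T'HH:mm:ss");

    final String station;
    final double temperature;
    final long epochSeconds; // numeric timestamp, usable as a double feature

    Reading(String station, double temperature, long epochSeconds) {
        this.station = station;
        this.temperature = temperature;
        this.epochSeconds = epochSeconds;
    }

    /** Parses a line such as "WD1,12.3,15-10-12T12:23:45". */
    static Reading parse(String line) {
        String[] parts = line.trim().split(",");
        LocalDateTime time = LocalDateTime.parse(parts[2], FORMAT);
        return new Reading(parts[0],
                Double.parseDouble(parts[1]),
                time.toEpochSecond(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        Reading r = Reading.parse("WD1,12.3,15-10-12T12:23:45");
        System.out.println(r.station + " " + r.temperature + " " + r.epochSeconds);
    }
}
```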
We store the data in Cassandra and want to use Spark to build a model from it. What we want from the model is to detect sharp temperature rises that happen within a short time window. For example, in the sample data above, station WD1's temperature rises by 10 degrees (from 12.3 to 22.3) within 1 minute. I was thinking of using linear regression to build the model. However, Spark's linear regression model seems to accept only double values, and after reading the documentation I understand that the equation for finding the weights is closer to the form
y = a1x1+a2x2+a3x3
than
y = mx+c
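For reference, the two forms describe the same kind of model: the first has one weight per feature plus an intercept b (Spark fits one by default, controlled by its fitIntercept parameter), and with a single feature it reduces to the second:

```latex
y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b,
\qquad n = 1,\; a_1 = m,\; b = c \;\Longrightarrow\; y = m x + c
```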
So Spark can give me the weights and the intercept, but I am not sure I can use this model for my purpose. Just to satisfy my curiosity, I did try to build a model from this data, but all of the predictions were horrendous, and I suspect the data is part of the problem. I built a matrix of temperature vs. timestamp, and the predictions were quite inaccurate.
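To make the target concrete, the "sharp rise" condition from the example can be written directly as a rule, without any model. The sketch below is plain Java (not Spark), assumes the readings belong to one station and are sorted by time, and uses hypothetical names (`SharpRiseDetector`, `findRises`) with the example's thresholds of 10 degrees within 60 seconds:

```java
import java.util.ArrayList;
import java.util.List;

class SharpRiseDetector {
    /**
     * Returns the indices i where temps[i] is at least minRise above some
     * earlier reading taken no more than windowSeconds before it.
     * Assumes times[] is sorted ascending and belongs to a single station.
     */
    static List<Integer> findRises(long[] times, double[] temps,
                                   double minRise, long windowSeconds) {
        List<Integer> hits = new ArrayList<>();
        int start = 0; // first index still inside the window ending at i
        for (int i = 0; i < times.length; i++) {
            // Drop readings older than the window relative to reading i.
            while (times[i] - times[start] > windowSeconds) {
                start++;
            }
            // Lowest earlier temperature within the window.
            double min = Double.POSITIVE_INFINITY;
            for (int j = start; j < i; j++) {
                min = Math.min(min, temps[j]);
            }
            if (temps[i] - min >= minRise) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // WD1 readings from the sample feed, as seconds since 12:23:45.
        long[] times = {0, 120, 180};
        double[] temps = {12.3, 12.3, 22.3};
        System.out.println(findRises(times, temps, 10.0, 60)); // prints [2]
    }
}
```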
My questions are the following:
- Is the way I am building the model completely wrong? If so, how do I rectify it?
- If not linear regression, is there any other model or mechanism that can indicate this sharp rise?
My sample code:
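import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.ml.regression.LinearRegression;
import org.apache.spark.ml.regression.LinearRegressionModel;
import org.apache.spark.mllib.feature.StandardScaler;
import org.apache.spark.mllib.linalg.Vectors;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.sql.DataFrame;

// cassandraRowsRDD (a JavaRDD<String>) and sqlContext are created elsewhere.
// Each line is "temperature,timestamp", with the timestamp already numeric.
JavaRDD<LabeledPoint> parsedData = cassandraRowsRDD.map(new Function<String, LabeledPoint>() {
    public LabeledPoint call(String line) {
        String[] parts = line.split(",");
        double label = Double.parseDouble(parts[0]);     // temperature: the value to predict
        double timestamp = Double.parseDouble(parts[1]); // the single feature
        System.out.println("Y = " + label + " :: TIMESTAMP = " + timestamp);
        return new LabeledPoint(label, Vectors.dense(timestamp));
    }
});
parsedData.cache();
// Note: this scaler is never fitted or applied, so the features stay unscaled.
StandardScaler scaler = new StandardScaler();
// Convert to a DataFrame with "label" and "features" columns for the spark.ml API.
DataFrame dataFrame = sqlContext.createDataFrame(parsedData, LabeledPoint.class);
System.out.println(dataFrame.count());
dataFrame.printSchema();
LinearRegression lr = new LinearRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8);
// Fit the model
LinearRegressionModel lrModel = lr.fit(dataFrame);
System.out.println("Weights: " + lrModel.weights() + " Intercept: " + lrModel.intercept());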
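If a regression is still wanted, one way to reuse the y = mx + c form is to fit it separately on each short window of one station's readings and look at the slope m (degrees per second), instead of fitting one line through all the data. The plain-Java sketch below computes the ordinary least-squares slope in closed form; `WindowSlope` and `slope` are hypothetical names, not a Spark API:

```java
class WindowSlope {
    /**
     * Ordinary least-squares slope m of temps vs. times (y = m*x + c),
     * computed as cov(x, y) / var(x). Needs at least two distinct times.
     */
    static double slope(long[] times, double[] temps) {
        int n = times.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += times[i];
            meanY += temps[i];
        }
        meanX /= n;
        meanY /= n;
        double cov = 0, varX = 0;
        for (int i = 0; i < n; i++) {
            double dx = times[i] - meanX;
            cov += dx * (temps[i] - meanY);
            varX += dx * dx;
        }
        if (varX == 0) {
            throw new IllegalArgumentException("need at least two distinct timestamps");
        }
        return cov / varX;
    }

    public static void main(String[] args) {
        // Last two WD1 readings: 12.3 at t = 0 s, 22.3 at t = 60 s.
        double m = slope(new long[]{0, 60}, new double[]{12.3, 22.3});
        // Rise implied over a 60-second window; compare it with a threshold such as 10 degrees.
        System.out.println("slope = " + m + " deg/s, rise over 60 s = " + m * 60);
    }
}
```

A single line fitted through all readings cannot represent a short spike, so a per-window slope may be a closer match to the stated goal.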