
Starting from an example, I was trying to do linear regression with Spark MLlib. The problem is that I get the wrong result: the intercept should be 2.2.

I tried adding .optimizer.setStepSize(0.1), which I found in another post, but I still get a weird result. Any suggestions?
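For reference, this is how I wired in the step size through the optimizer, following that post (0.1 and 1000 are just the values I tried, not tuned):

// Configure SGD by hand instead of using the static train() helper
val algorithm = new LinearRegressionWithSGD()
algorithm.optimizer
  .setStepSize(0.1)
  .setNumIterations(1000)
val model = algorithm.run(parsedData)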

This is the set of data

1,2
2,4
3,5
4,4
5,5
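
For reference, treating the first column as x and the second as y (which is how I get the expected 2.2), the closed-form least-squares fit is slope 0.6, intercept 2.2. A quick plain-Scala check:

// Closed-form simple linear regression on the five points above,
// assuming column 1 is x and column 2 is y
val x = Array(1.0, 2.0, 3.0, 4.0, 5.0)
val y = Array(2.0, 4.0, 5.0, 4.0, 5.0)
val mx = x.sum / x.length                                          // 3.0
val my = y.sum / y.length                                          // 4.0
val cov  = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum // 6.0
val varX = x.map(a => (a - mx) * (a - mx)).sum                     // 10.0
val slope = cov / varX          // 0.6
val intercept = my - slope * mx // 4.0 - 0.6 * 3.0 = 2.2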

Code:

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.regression.LinearRegressionModel
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors

object linearReg {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("linearReg").setMaster("local")
    val sc=new SparkContext(sparkConf)
    // Load and parse the data
    val data = sc.textFile("/home/daniele/dati.data")
    val parsedData = data.map { line =>
      val parts = line.split(',')
      // Prepend 1.0 so the first weight serves as the intercept term
      LabeledPoint(parts(0).toDouble, Vectors.dense(Array(1.0) ++ parts(1).split(' ').map(_.toDouble)))
    }.cache()
    parsedData.collect().foreach(println)
    // Building the model
    val numIterations = 1000
    val model = LinearRegressionWithSGD.train(parsedData, numIterations)
    println("Interceptor:"+model.intercept)
    // Evaluate model on training examples and compute training error
    val valuesAndPreds = parsedData.map { point =>
      val prediction = model.predict(point.features)
      (point.label, prediction)
    }
    valuesAndPreds.collect().foreach(println)
    val MSE = valuesAndPreds.map { case (v, p) => math.pow((v - p), 2) }.mean()
    println("training Mean Squared Error = " + MSE)

    // Save and load model
    model.save(sc, "myModelPath")
    val sameModel = LinearRegressionModel.load(sc, "myModelPath")
  }
}

Result:

weights: [-4.062601003207354E25], intercept: -9.484399253945647E24

Update: I switched to the .train method and prepended 1.0 to the features for the intercept. The data now appear with the 1.0 prepended.
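
With the parsing above, the printed LabeledPoints should come out like this (reconstructed from the code, since each LabeledPoint prints as (label,[features])):

(1.0,[1.0,2.0])
(2.0,[1.0,4.0])
(3.0,[1.0,5.0])
(4.0,[1.0,4.0])
(5.0,[1.0,5.0])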


1 Answer


You are using run, which means the data you are passing in is being interpreted as configuration parameters rather than as features to be regressed.

The docs contain good examples of training then running your model:

//note the "train" instead of "run"
val numIterations = 1000
val model = LinearRegressionWithSGD.train(parsedData, numIterations)

The result is a more accurate weight:

scala> model.weights
res4: org.apache.spark.mllib.linalg.Vector = [0.7674418604651163]

If you want to add an intercept, just place a 1.0 value as a feature in your dense Vector. Modify your example code:

...
LabeledPoint(parts(0).toDouble, Vectors.dense(Array(1.0) ++ parts(1).split(' ').map(_.toDouble)))
...

The first feature is then your intercept.
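
Alternatively, LinearRegressionWithSGD inherits setIntercept from GeneralizedLinearAlgorithm, so you can let MLlib fit the intercept itself instead of prepending the 1.0. A sketch (the step size 0.1 is the value from the question; you may still need to tune it):

val algorithm = new LinearRegressionWithSGD()
algorithm.setIntercept(true) // let MLlib fit the intercept directly
algorithm.optimizer
  .setStepSize(0.1)
  .setNumIterations(1000)
val model = algorithm.run(parsedData) // features without the extra 1.0 column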
