
I have the following two functions that do not compile due to 3 errors:

  1. RegressionMetrics: Cannot resolve constructor
  2. _.nonEmpty: Type mismatch, expected ((Double,Double))=>Boolean, actual ((Double,Double))=>Any
  3. reduce(_+_): Cannot resolve symbol +.

Code:

import org.apache.spark.mllib.evaluation.RegressionMetrics

//..

def getRMSE (rdd: RDD): Double = {
    val metrics = new RegressionMetrics(rdd)
    metrics.rootMeanSquaredError
}

def calculateRMSE(output: DStream[(Double, Double)]): Double = {
    output.filter(_.nonEmpty).map(getRMSE).reduce(_+_)
}

test("Test1") {
// do some data preprocessing
// call the function calculateRMSE
}

Any idea how to fix these errors?

P.S.: The strange thing is that when I put `val metrics = new RegressionMetrics(rdd)` inside the test, it compiles without any problem.

UPDATE:

I was able to solve issue #1 by adding the type parameter (Double, Double) to RDD:

  def getRMSE(rdd : RDD[(Double, Double)]) : Double = {
    val metrics = new RegressionMetrics(rdd)
    metrics.rootMeanSquaredError
  }
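To check the value that calculateRMSE produces in the test, it can help to compute the expected RMSE independently: the square root of the mean squared difference between prediction and observation. Below is a minimal plain-Scala sketch, assuming the pairs are (prediction, observation) as RegressionMetrics expects; the object `RmseReference` is a hypothetical name and uses only the standard library. Note that `nonEmpty` works here because `pairs` is a `Seq`; a single `(Double, Double)` tuple has no such method.

```scala
// Hypothetical reference helper for tests: RMSE over (prediction, observation) pairs,
// computed with plain Scala collections instead of Spark.
object RmseReference {
  def rmse(pairs: Seq[(Double, Double)]): Double = {
    // RMSE is undefined for no data, so fail loudly instead of returning NaN.
    require(pairs.nonEmpty, "RMSE is undefined for an empty sequence")
    val sumOfSquares = pairs.map { case (prediction, observation) =>
      val err = observation - prediction
      err * err
    }.sum
    math.sqrt(sumOfSquares / pairs.size)
  }
}
```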
Klue
  • Split this line `output.filter(_.nonEmpty).map(getRMSE).reduce(_+_)` into a sequence of `val` assignments (`val filtered = output.filter(...); val mapped = filtered.map(...); val reduced = mapped.reduce(...)`) and look at the types. I think you'll find they're not what you are expecting. – The Archetypal Paul May 03 '16 at 10:17

1 Answer


From the Spark Streaming programming guide:

reduce(func): Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative so that it can be computed in parallel.

So `reduce` on a DStream yields another DStream, not a single Double, and the right signature for calculateRMSE should be:

def calculateRMSE(output: DStream[(Double, Double)]): DStream[Double]
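A minimal sketch of a calculateRMSE with that signature, assuming `DStream.transform` is acceptable here (it is not part of the original answer). `transform` passes each batch to the function as a whole `RDD[(Double, Double)]`, which is what `RegressionMetrics` needs, whereas `filter` and `map` on a DStream only ever see individual `(Double, Double)` tuples. The object `StreamingRmse` and the helper `batchRmse` are hypothetical names; empty batches produce no output instead of being passed to `RegressionMetrics`.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helpers, not the asker's code.
object StreamingRmse {
  // RMSE of one batch of (prediction, observation) pairs, or None for an empty batch.
  def batchRmse(rdd: RDD[(Double, Double)]): Option[Double] =
    if (rdd.isEmpty()) None
    else Some(new RegressionMetrics(rdd).rootMeanSquaredError)

  // One RMSE value per non-empty batch. transform works on whole RDDs,
  // so this is where empty batches can be skipped.
  def calculateRMSE(output: DStream[(Double, Double)]): DStream[Double] =
    output.transform { rdd =>
      val sc = rdd.sparkContext
      batchRmse(rdd) match {
        case Some(value) => sc.parallelize(Seq(value), numSlices = 1)
        case None        => sc.emptyRDD[Double]
      }
    }
}
```

Calling `isEmpty()` and `RegressionMetrics` inside `transform` runs Spark jobs on the driver for every batch, which is expected when computing a metric per batch.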
Vitalii Kotliarenko
  • I tried what you suggested, but the problem is the same. See my update; I posted the screenshot. – Klue May 03 '16 at 12:29
  • You can't call nonEmpty on a tuple of Doubles; that is what your screenshot is saying. What do you want to achieve with this filtering? – Vitalii Kotliarenko May 03 '16 at 12:35
  • My question refers to the answer in this thread; as I understood it, I can filter out empty RDDs by using `filter`: http://stackoverflow.com/questions/36984923/how-to-solve-type-mismatch-issue-expected-double-actual-unit – Klue May 03 '16 at 13:10
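If the goal is a plain `Double` per batch (for example, to assert on it in the test), one option, not taken from the linked answer, is `foreachRDD`. Like `transform`, it hands over each batch as a whole RDD, so empty batches can be skipped with `rdd.isEmpty()` rather than with `DStream.filter`, which only sees tuples. A minimal sketch; the object `RmsePerBatch` and the `onRmse` callback are hypothetical names. A stream has no single final value, so the results arrive through the callback instead of as a `Double` return value.

```scala
import org.apache.spark.mllib.evaluation.RegressionMetrics
import org.apache.spark.streaming.dstream.DStream

// Hypothetical helper: invokes onRmse on the driver with the RMSE of every non-empty batch.
object RmsePerBatch {
  def reportRmse(output: DStream[(Double, Double)])(onRmse: Double => Unit): Unit =
    output.foreachRDD { rdd =>
      // Skip empty batches at the RDD level.
      if (!rdd.isEmpty()) {
        onRmse(new RegressionMetrics(rdd).rootMeanSquaredError)
      }
    }
}
```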