0

I want to initialize a matrix using data inside flatMap. This is my data:

-4,0,1.0 ### horrible . not-work install dozen scanner umax ofcourse . tech-support everytime call . fresh install work error . crummy product crummy tech-support crummy experience .
2,1,1.0 ### scanner run . grant product run windows . live fact driver windows lose performance . setup program alert support promptly quits . amazon . website product package requirement listing compatible windows .
1,2,1.0 ### conversion kit spare battery total better stick versionand radio blow nimh charger battery . combination operation size nimh battery . motorola kit . rechargable battery available flashlight camera game toy .
-4,3,1.0 ### recieive part autowinder catch keep place sudden break . hold listen music winder wind . extremely frustrated fix pull little hard snap half . flush drain .

And this is my code:

val spark_context = new SparkContext(conf)
 val data = spark_context.textFile(Input)
 val Gama=DenseMatrix.zeros[Double](4,2)
 var gmmainit = data.flatMap(line => {
   val tuple = line.split("###")
   val ss = tuple(0)
   val re = """^(-?\d+)\s*,\s*(\d+)\s*,\s*(\d+).*$""".r
   val re(n1, n2, n3) = ss // pattern match and extract values

   if (n1.toInt >= 0) {
     Gama(n2.toInt, 0) += 1
   }
   if (n1.toInt < 0) {
     Gama(n2.toInt, 1) += 1
   }
 })

 println(Gama)

But it doesn't change the Gama matrix.

How can I modify my code to solve this problem?

hadiye

2 Answers

1

You can't modify variables in your distributed functions. Well, you can, but the variable is only modified in THAT process. Remember that Spark is distributed. So you need to return a value that can be flattened (I don't know DenseMatrix well enough to say exactly what is needed here). You might also be able to create a custom accumulator to accomplish this, provided the update operation is associative and commutative.
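As a rough sketch of the "return a value" idea, the per-line logic can be written as a function that emits a (row, column) cell instead of touching a shared matrix. The names `CountCells`, `cellOf` and `countCells` are hypothetical, and the example uses plain Scala collections and a nested array rather than an RDD and DenseMatrix, so it only illustrates the shape of the approach:

```scala
object CountCells {
  // Same pattern as in the question: signed first field, row index second.
  private val Header = """^(-?\d+)\s*,\s*(\d+)\s*,\s*(\d+).*$""".r

  // Hypothetical helper: maps one input line to an optional (row, column) cell.
  // Column 0 counts non-negative first fields, column 1 counts negative ones.
  def cellOf(line: String): Option[(Int, Int)] = line match {
    case Header(n1, n2, _) => Some((n2.toInt, if (n1.toInt >= 0) 0 else 1))
    case _                 => None // lines that do not match are dropped
  }

  // Counts the emitted cells and lays them out as a rows x cols table.
  def countCells(lines: Seq[String], rows: Int, cols: Int): Array[Array[Double]] = {
    val counts = lines
      .flatMap(line => cellOf(line))
      .groupBy(identity)
      .map { case (cell, hits) => cell -> hits.size }
    Array.tabulate(rows, cols)((r, c) => counts.getOrElse((r, c), 0).toDouble)
  }
}
```

On an RDD, the same `cellOf` could presumably be passed to `flatMap`, with something like `countByValue()` bringing the counts back to the driver; that part is an assumption and is not shown here.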

Justin Pihony
0

First of all, your code won't even compile. If you take a look at the flatMap signature:

flatMap[U](f: T => TraversableOnce[U])

you'll see that it maps from T to TraversableOnce[U]. Since the update method of DenseMatrix returns Unit, the function you pass is of type String => Unit, and Unit is not a TraversableOnce.
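The mismatch can be seen in isolation with plain Scala collections, whose flatMap has the same general shape. This is only a small sketch; `FlatMapShape` and the sample strings are made up for illustration:

```scala
object FlatMapShape {
  val lines: Seq[String] = Seq("a b", "c")

  // String => Seq[String]: every element expands to a collection,
  // so flatMap accepts the function and flattens the results.
  val words: Seq[String] = lines.flatMap(line => line.split(" ").toSeq)

  // String => Unit: a side effect only. Uncommenting the next line fails to
  // compile, because Unit cannot be treated as a collection of results.
  // val broken = lines.flatMap(line => println(line))

  // When only side effects are wanted, foreach is the matching operation.
  def printAll(): Unit = lines.foreach(println)
}
```

Here `words` contains "a", "b" and "c", while the commented-out call is rejected by the compiler for the same reason as the code in the question.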

Moreover, as already explained by Justin, each partition gets its own local copy of the variables referenced in a closure and only that copy is modified.

One way you can solve this problem is something like this:

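import breeze.linalg.DenseMatrix

val gmmainit = data.mapPartitions(iter => {
  // Same pattern as in the question; n3 is captured but not used
  val re = """^(-?\d+)\s*,\s*(\d+)\s*,\s*(\d+).*$""".r
  // One local matrix per partition, so no shared state is mutated
  val gama = DenseMatrix.zeros[Double](4,2)
  iter.foreach{
    // Row n2; column 0 when n1 >= 0, column 1 when n1 < 0
    case re(n1, n2, n3) =>  gama(n2.toInt, if(n1.toInt >= 0) 0 else 1) += 1
    case _ => // skip lines that do not match the pattern
  }
  Iterator(gama)
}).reduce(_ + _) // sum the per-partition matrices into one result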
zero323