
I am creating a Spark job server that connects to Cassandra. After getting the records I want to perform a simple group by and sum on them. I am able to retrieve the data, but I could not print the output. I have searched Google for hours and have posted in the Cassandra Google group as well. My current code is below, and I am getting an error at collect.

 override def runJob(sc: SparkContext, config: Config): Any = {
   // sc.cassandraTable("store", "transaction").select("terminalid", "transdate", "storeid", "amountpaid").toArray().foreach(println)
   // Printing of each record is successful
   val rdd = sc.cassandraTable("POSDATA", "transaction").select("terminalid", "transdate", "storeid", "amountpaid")
   val map1 = rdd.map(x => (x.getInt(0), x.getInt(1), x.getDate(2)) -> x.getDouble(3)).reduceByKey((x, y) => x + y)
   println(map1)
   // Output is ShuffledRDD[3] at reduceByKey at Daily.scala:34
   map1.collect
   // map1.collectAsMap().map(println(_))
   // Throwing error: java.lang.ClassNotFoundException: transaction.Daily$$anonfun$2
 }

Nideesh
  • Do you have spark cassandra connector runtime libraries on worker nodes? – noorul May 06 '16 at 12:18
  • It's useful to keep in mind that Spark is lazy: transformations are not applied until you call a final action (like collect, take, foreach, etc.). So println does not force any computation; it just calls toString on the RDD. Therefore you cannot be sure that data was retrieved. – Vitalii Kotliarenko May 06 '16 at 17:53
  • @noorul I have the Cassandra connector driver. The line below prints records: sc.cassandraTable("store", "transaction").select("terminalid","transdate","storeid","amountpaid").toArray().foreach(println) – Nideesh May 07 '16 at 11:47
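As an aside, the (terminalid, transdate, storeid) -> sum(amountpaid) aggregation the question is after can be sanity-checked locally without Spark or Cassandra. Below is a minimal sketch of the same grouping over plain Scala collections; the sample rows are made up for illustration:

```scala
// Local model of the (terminalid, transdate, storeid) -> sum(amountpaid)
// aggregation; the rows below are hypothetical sample data.
val rows = Seq(
  ((1, "2016-05-01", 10), 5.0),
  ((1, "2016-05-01", 10), 7.5),
  ((2, "2016-05-02", 11), 3.0)
)

// groupBy + per-group sum mirrors what reduceByKey((x, y) => x + y) does on an RDD
val totals: Map[(Int, String, Int), Double] =
  rows.groupBy(_._1).map { case (key, vs) => key -> vs.map(_._2).sum }

totals.foreach(println)
```

If the logic is right here, the remaining problem on the cluster is only about forcing the computation and where the output goes.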

2 Answers


Your map1 is an RDD. You can try the following:

map1.foreach(r => println(r))
Cecil Pang

Spark does lazy evaluation on RDDs, so try an action:

   map1.take(10).foreach(println)
Knight71
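The laziness both answers point at can also be seen with plain Scala views, as a rough local analogy (no Spark involved): a transformation alone is only recorded, and nothing runs until something forces it, just as an RDD map does nothing until collect or take:

```scala
// Rough local analogy for RDD laziness using a Scala view:
// the map below is only recorded, not executed, until forced.
var evaluated = 0
val lazyMapped = (1 to 3).view.map { n => evaluated += 1; n * 2 }

println(evaluated)              // still 0 - nothing has run yet
val forced = lazyMapped.toList  // forcing, like collect/take on an RDD
println(evaluated)              // now 3
```

This is only an analogy: on a real cluster the closures also run on executor JVMs, which is why driver-side println inside foreach may print on the workers, not in the driver console.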