I am using the Datastax Cassandra java driver to write to Cassandra from spark workers. Code snippet
rdd.foreachPartition(record => {
val cluster = SimpleApp.connect_cluster(Spark.cassandraip)
val session = cluster.connect()
record.foreach { case (bin_key: (Int, Int), kpi_map_seq: Iterable[Map[String, String]]) => {
kpi_map_seq.foreach { kpi_map: Map[String, String] => {
update_tables(session, bin_key, kpi_map)
}
}
}
} //record.foreach
session.close()
cluster.close()
}
While reading I am using the spark cassandra connector (which uses the same driver internally I assume)
val bin_table = javaFunctions(Spark.sc).cassandraTable("keyspace", "bin_1")
.select("bin").where("cell = ?", cellname) // assuming this will run on worker nodes
println(s"get_bins_for_cell Count of Bins for Cell $cellname is ", cell_bin_table.count())
return bin_table
Doing this each at a time does not cause any problem. Doing it together is throwing this stack trace.
My main goal is not to do the write or read directly from the Spark driver program. Still it seems that it has to do something with the context; two context getting used ?
16/07/06 06:21:29 WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 22, euca-10-254-179-202.eucalyptus.internal): java.io.IOException: org.apache.spark.SparkException: Failed to get broadcast_5_piece0 of broadcast_5
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1222)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:165)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:88)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)