I'm writing a streaming application that needs to add an incoming value to, or subtract it from, an existing value in my Cassandra table. I have read the CQL batch documentation but have not found a solution to my problem yet. Here is a small example for more insight:

Pre-existing table:

table:{
word:{'hello',
neg:{-0.5},
neu:{0.3},
pos:{0.2},
comp:{0.7}
}
}

Incoming value:

word:{'hello',
neu:{0.4}
}

Here I need to add the incoming 0.4 to the stored 0.3 and write the result back to the table.
Alex Ott
Kush Singh
  • Is the table updated only from the Spark job, or could it also be updated from outside? What are you using: Spark Streaming or Spark Structured Streaming? What version of Spark Cassandra Connector are you using? – Alex Ott Jun 23 '20 at 10:19
  • The table can only be updated from the Spark job. I'm using Spark Cassandra Connector 2.5.0, and yes, I'm using Spark Streaming. – Kush Singh Jun 23 '20 at 12:03

1 Answer

You can implement the addition by joining your input data with the data in Cassandra (on the primary key of the rows) using leftJoinWithCassandraTable, then adding the input values to the fetched values, and writing the result back to Cassandra.

Something like this (adapted from my own code):

import com.datastax.spark.connector._

// this case class matches the Cassandra table & the input data...
case class Data(....)
val data = ...data_that_you_received_and_casted_to_Data_case_class...
// left join on the primary key: yields (input row, Option[row found in Cassandra])
val joined = data.leftJoinWithCassandraTable[Data]("ks", "table")
// update existing values, and prepare new data
val summed = joined.map { case (n: Data, c: Option[Data]) =>
  c match {
    // if there is no data in Cassandra, just return the input data as is
    case None => n
    // there is data in Cassandra, do the sum
    case Some(s) =>
      Data(..., n.neu + s.neu)
  }
}
// and write the updated/new values
summed.saveToCassandra("ks", "table")
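
The merge step itself can be tried out without Spark or Cassandra. The following is only a minimal sketch under assumptions: the `Sentiment` case class, the `MergeSketch` object and the in-memory `Map` standing in for the table are hypothetical names, the columns are taken from the question's example, and an incoming row is assumed to carry 0.0 for the columns it does not update. It also assumes that only `neu` is summed while the other stored columns are kept as they are.

```scala
// Hypothetical row shape, based on the columns in the question's example.
final case class Sentiment(word: String, neg: Double, neu: Double, pos: Double, comp: Double)

object MergeSketch {
  // Mirrors the pattern match applied to each (input, Option[existing]) pair
  // produced by a left join on the primary key.
  def merge(incoming: Sentiment, existing: Option[Sentiment]): Sentiment =
    existing match {
      // no stored row: the incoming row becomes the new row
      case None => incoming
      // stored row found: keep it and add the incoming neu value (assumption: only neu changes)
      case Some(s) => s.copy(neu = s.neu + incoming.neu)
    }

  def main(args: Array[String]): Unit = {
    // In-memory stand-in for the Cassandra table, keyed by the primary key.
    val table = Map("hello" -> Sentiment("hello", -0.5, 0.3, 0.2, 0.7))
    val incoming = Seq(
      Sentiment("hello", 0.0, 0.4, 0.0, 0.0),
      Sentiment("world", 0.0, 0.1, 0.0, 0.0)
    )
    // table.get plays the role of the left join: Some(row) or None.
    val updated = incoming.map(n => merge(n, table.get(n.word)))
    updated.foreach(println)
  }
}
```

Keeping the merge as a plain function makes it easy to unit test, and the same function could then be passed to the `map` over the joined data.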

P.S. I would recommend using Spark Structured Streaming, which is directly supported by Spark Cassandra Connector 2.5.0, together with the so-called direct join for DataFrames. Your code would be simpler and potentially better optimized when using DataFrames.

Alex Ott