4

Is there a way to join 2 tables adding a condition on columns between the 2 tables ?

Example :

case class TableA(pkA: Int, valueA: Int)
case class TableB(pkB: Int, valueB: Int)


val rddA = sc.cassandraTable[TableA]("ks", "tableA")
rddA.joinWithCassandraTable[TableB]("ks", "tableB").where("tableB.valueB > tableA.valueA")

Is there a way to send the where("tableB.valueB > tableA.valueA") instruction ? ("tableB.value" being a clustering column)

Gridou
  • 109
  • 7

1 Answers1

0

RDD.where() call just pass predicate to CQL. CQL is limited for fast and simple OLTP queries. More complicated queries could be done only with SparkSQL. For your case it could be something like this:

sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map( "table" -> "tableA", "keyspace"->"ks"))
    .load().registerTempTable("tableA")
sqlContext.read.format("org.apache.spark.sql.cassandra")
    .options(Map( "table" -> "tableB", "keyspace"->"ks"))
    .load().registerTempTable("tableB")
sqlContext.sql("select * from tableA join tableB on tableB.valueB > tableA.valueA").show
Artem Aliev
  • 1,362
  • 7
  • 12