
Data source org.apache.spark.sql.cassandra does not support streamed reading

    val spark = SparkSession
      .builder()
      .appName("SparkCassandraApp")
      .config("spark.cassandra.connection.host", "localhost")
      .config("spark.cassandra.connection.port", "9042")
      .config("spark.cassandra.auth.username", "xxxxx")
      .config("spark.cassandra.auth.password", "yyyyy")
      .master("local[*]")
      .getOrCreate()

    val tableDf3 = spark.readStream
      .format("org.apache.spark.sql.cassandra")
      .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb"))
      .load()
      .filter("deviceid='XYZ'")

    tableDf3.show(10)
1 Answer


That's correct - the Spark Cassandra Connector can be used only as a streaming sink, not as a streaming source.
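As a sink it works fine: read from a genuine streaming source and write each micro-batch to Cassandra with `foreachBatch`. A rough sketch, reusing the placeholder keyspace/table names from the question and the built-in `rate` source as a stand-in for a real stream (checkpoint path is illustrative):

    // Sketch: Cassandra as a streaming *sink* via foreachBatch.
    // Assumes spark-cassandra-connector is on the classpath; the
    // "aaaaa"/"bbbbb" names are the placeholders from the question.
    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder()
      .appName("SparkCassandraApp")
      .config("spark.cassandra.connection.host", "localhost")
      .master("local[*]")
      .getOrCreate()

    val stream = spark.readStream
      .format("rate")   // a real streaming source, unlike the connector
      .load()

    val query = stream.writeStream
      .foreachBatch { (batch: DataFrame, _: Long) =>
        batch.write
          .format("org.apache.spark.sql.cassandra")
          .options(Map("table" -> "aaaaa", "keyspace" -> "bbbbb"))
          .mode("append")
          .save()
      }
      .option("checkpointLocation", "/tmp/checkpoint")
      .start()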

If you want to get changes out of Cassandra, that's quite a complex task; it depends on the Cassandra version (whether it supports CDC or not) and other factors.

For Spark, you can implement a kind of streaming by periodically re-reading the data, using a timestamp column to filter out rows you have already read. You can find more information about that approach in the following answer.
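The periodic re-read approach boils down to remembering the highest timestamp seen so far and fetching only newer rows on each pass. A minimal, Cassandra-free sketch of that bookkeeping in plain Scala (the `Row` case class and in-memory `table` are stand-ins for the real keyspace; with Spark you would re-run a filtered `read` instead of filtering a `Seq`):

```scala
// Timestamp-based incremental polling: each poll() returns only rows
// strictly newer than the newest row returned by any previous poll().
case class Row(deviceid: String, ts: Long, value: Double)

class IncrementalReader(source: () => Seq[Row]) {
  private var lastSeenTs: Long = Long.MinValue

  def poll(): Seq[Row] = {
    val fresh = source().filter(_.ts > lastSeenTs)
    if (fresh.nonEmpty) lastSeenTs = fresh.map(_.ts).max
    fresh
  }
}

// Demo: the "table" grows between polls, as a Cassandra table would.
var table = Seq(Row("XYZ", 1L, 0.1), Row("XYZ", 2L, 0.2))
val reader = new IncrementalReader(() => table)

val first = reader.poll()               // both initial rows
table = table :+ Row("XYZ", 3L, 0.3)    // new data arrives
val second = reader.poll()              // only the appended row
println(s"first=${first.size}, second=${second.size}")
```

Note that this only sees inserts with monotonically increasing timestamps; updates and deletes are invisible, which is exactly why real change capture needs CDC.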

Alex Ott