
I am receiving data from Kafka in the form:

{"email":"test@example","firstname":"Example","lastname":"User"}

I want to access the email address and first name and compare them with data coming from Cassandra in the form:

CassandraRow{email: abc@xyz.com}
mplungjan
Anonymous
  • Can you expand your question - what should be the result of this comparison? How do you fetch data from Cassandra? What is the schema for Cassandra table? – Alex Ott Jul 06 '18 at 11:58
  • I want to compare the credentials (email address and name), and if they are the same I want to send a message to a Kafka topic saying that they are equal. I am fetching data from the Cassandra table through the streaming context (ssc) with:
      val data1 = ssc.cassandraTable("test","login").select("email","name","lastname").where("email=?","abc@xyz.com")
      val rddQueue = new Queue[RDD[com.datastax.spark.connector.CassandraRow]]()
      val dstream = ssc.queueStream(rddQueue)
    My Cassandra table has columns for email, name and last name. – Anonymous Jul 07 '18 at 11:05
  • What is the primary key of the table? – Alex Ott Jul 07 '18 at 11:22
  • email is the primary key – Anonymous Jul 07 '18 at 11:29
  • Please add the extra information you posted in your comment to the actual question – mplungjan Jul 18 '18 at 07:57

1 Answer


You need to join the Kafka data with Cassandra using the joinWithCassandraTable function...

To make the join more efficient, you may want to repartition the RDD that you get from Kafka so that its partitions match the partitioning of the Cassandra table. The code could look like this:

import com.datastax.spark.connector._  // adds the two Cassandra methods below to RDDs

val resultRdd = kafkaRDD.repartitionByCassandraReplica("ks","emails")
   .joinWithCassandraTable("ks","emails")

After that, you can check whether the names match, and so on. The join returns only the records whose email exists in Cassandra...
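
For a DStream rather than a single RDD, one possible approach is to run the same join inside transform, which exposes each micro-batch as an RDD. The following is only a sketch under several assumptions: the KafkaUser case class, the CompareWithCassandra object and the sameName helper are hypothetical names introduced for illustration, the keyspace "test" and table "login" with a "name" column are taken from the question's comments, and the JSON is parsed with json4s (which ships with Spark). It has not been run against a real cluster.

```scala
import com.datastax.spark.connector._
import org.apache.spark.streaming.dstream.DStream
import org.json4s._
import org.json4s.jackson.JsonMethods.parse

// Hypothetical case class mirroring the JSON keys shown in the question.
// Its "email" field is assumed to map to the table's partition key column.
case class KafkaUser(email: String, firstname: String, lastname: String)

object CompareWithCassandra {

  // True only when Cassandra holds a name and it equals the Kafka first name.
  def sameName(user: KafkaUser, nameInCassandra: Option[String]): Boolean =
    nameInCassandra == Some(user.firstname)

  // lines: the DStream[String] of JSON values read from Kafka.
  // Returns (email, namesMatch) for every Kafka record whose email exists in Cassandra.
  def matches(lines: DStream[String]): DStream[(String, Boolean)] =
    lines.transform { rdd =>
      // Parse each JSON string; a malformed record would throw here.
      val users = rdd.mapPartitions { it =>
        implicit val formats: Formats = DefaultFormats
        it.map(json => parse(json).extract[KafkaUser])
      }
      users
        .repartitionByCassandraReplica("test", "login")
        .joinWithCassandraTable[CassandraRow]("test", "login")
        .map { case (user, row) =>
          // The join already guarantees equal emails, so only the name is compared.
          (user.email, sameName(user, row.getStringOption("name")))
        }
    }
}
```

With this shape, resultRdd in the snippet above would be an RDD of pairs (record from Kafka, CassandraRow), and the resulting DStream could then be filtered on the Boolean and written to a Kafka topic with whichever producer you already use.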

Alex Ott
  • I am getting a direct stream from Kafka with val lines=KafkaUtils.createDirectStream[String,String,StringDecoder,StringDecoder](ssc,kafkaParams,topics).map(_._2). How should I repartition this stream so that I can apply joinWithCassandraTable, and what is the type of the value in resultRdd? – Anonymous Jul 07 '18 at 13:22