
I am trying to read data from Kafka and store it in Cassandra tables through Spark RDDs.

I get the following error when compiling the code:

/root/cassandra-count/src/main/scala/KafkaSparkCassandra.scala:69: value split is not a member of (String, String)

[error]     val lines = messages.flatMap(line => line.split(',')).map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))
[error]                                               ^
[error] one error found

[error] (compile:compileIncremental) Compilation failed

The code is below. When I run it manually in the interactive `spark-shell` it works fine, but the error appears when I compile it for `spark-submit`.

// Create direct kafka stream with brokers and topics
val topicsSet = Set[String] (kafka_topic)
val kafkaParams = Map[String, String]("metadata.broker.list" -> kafka_broker)
val messages = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder]( ssc, kafkaParams, topicsSet)

// Create the processing logic
// Get the lines, split
val lines = messages.map(line => line.split(',')).map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))
lines.saveToCassandra("stream_poc", "US_city", SomeColumns("city_name", "jan_temp", "lat", "long")) 
halfer
Mitra
  • @RameshMaharjan: please don't format proper nouns as code. Kafka and Cassandra merely need an initial capital, and that's it - they are not themselves code. However, things like `spark-shell` are OK, since the code formatting is appropriate for console I/O (presuming `spark-shell` is a command that is typed). – halfer Jun 27 '17 at 14:28

2 Answers


All messages in Kafka are keyed. The original Kafka stream, in this case `messages`, is a stream of (key, value) tuples.

And as the compile error points out, there's no split method on tuples.

What we want to do here is:

messages.map { case (key, value) => value.split(',') } ...
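
The same shape can be reproduced without Spark or Kafka. The following is only a rough sketch: a plain `Seq` of pairs stands in for the stream, and the object name, keys, and CSV payloads are made-up sample data, not anything from the question.

```scala
object TupleSplitDemo {
  // Stand-in for the Kafka stream: a plain collection of (key, value) pairs.
  // The keys and CSV payloads are invented sample data.
  val messages: Seq[(String, String)] = Seq(
    ("k1", "Austin,50.2,30.3,-97.7"),
    ("k2", "Boston,29.3,42.4,-71.1")
  )

  // messages.map(line => line.split(',')) would not compile here either:
  // each element is a (String, String), and Tuple2 has no split method.

  // Destructure the pair and split only the value.
  val fields: Seq[Array[String]] =
    messages.map { case (_, value) => value.split(',') }

  def main(args: Array[String]): Unit =
    fields.foreach(f => println(f.mkString(" | ")))
}
```

Because `map` on a collection and `map` on a DStream both apply the function to each element, the pattern match on the pair carries over unchanged.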
maasg

`KafkaUtils.createDirectStream` returns a stream of (key, value) tuples (since messages in Kafka are optionally keyed). In your case each element is of type `(String, String)`. If you want to split the value, you have to take it out of the tuple first:

val lines = 
  messages
   .map(line => line._2.split(','))
   .map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))

Or using partial function syntax:

val lines = 
  messages
   .map { case (_, value) => value.split(',') }
   .map(s => (s(0).toString, s(1).toDouble,s(2).toDouble,s(3).toDouble))  
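
If the two `map` steps feel hard to read, one option is to pull the parsing into a small named function. This is a sketch only: `CityRecord` and `parse` are hypothetical names, and the assumption that every value holds exactly four comma-separated fields comes from the question's code, not from any checked schema.

```scala
object CityRecord {
  // Hypothetical helper: turns one CSV payload into the
  // (city_name, jan_temp, lat, long) tuple used in the question.
  // Throws if a field is missing or a number cannot be parsed.
  def parse(value: String): (String, Double, Double, Double) = {
    val s = value.split(',')
    // s(0) is already a String, so no .toString is needed.
    (s(0), s(1).toDouble, s(2).toDouble, s(3).toDouble)
  }
}
```

With that in place the stream transformation would read `messages.map { case (_, value) => CityRecord.parse(value) }`, and the parsing can be tried on a plain string without starting a streaming context.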
Yuval Itzchakov