
I'm using Spark Streaming with Scala and I'm getting JSON records from Kafka. I would like to parse them so I can get the values (datetime and quality) and process them.

Here is my code:

stream.foreachRDD { rdd =>
  // collect() brings every record back to the driver; fine for debugging,
  // but it defeats distributed processing on real workloads
  rdd.collect().foreach(record =>
    println(msgParse(record.value()).quality)
  )
}

And I have this case class and parse function:

case class diskQuality(datetime: String, quality: Double) extends Serializable

def msgParse(value: String): diskQuality = {
  import org.json4s._
  import org.json4s.native.JsonMethods._

  // DefaultFormats provides the standard extractors for case classes
  implicit val formats = DefaultFormats

  parse(value).extract[diskQuality]
}

I've added this dependency:

libraryDependencies += "org.json4s" % "json4s-native_2.10" % "3.2.4"

The records I'm receiving have this format:

"{\"datetime\":\"14-05-2017 14:18:30\",\"quality\":92.6}"

However, I get this error:

Exception in thread "main" org.json4s.ParserUtil$ParseException: expected field or array Near: ,\"quality\":100.0}"

EDIT:

When I try to parse the following string using the same function, it works. But even though the Kafka messages come in the same format, I still get the same error:

val test = "{\"datetime\":\"14-05-2017 14:18:30\",\"quality\":92.6}"

I'm using scalaVersion := "2.10.6" and json4s-native_2.10.

Any help would be really appreciated. Thank you for your time.

AsmaaM
  • The first format is the correct one - "{\"datetime\":\"14-05-2017 14:18:30\",\"quality\":92.6}" - and your code works with it. Can you check what the Scala version in build.sbt is? Is it 2.10, matching your org.json4s dependency? Also, you can log the value parameter of the msgParse function to check its actual value. – Mykhailo Hodovaniuk Jun 15 '17 at 12:06
  • Thank you for your response, I edited my question and here is the value inside msgParse when I print it: "{\"datetime\":\"24-04-2017 07:53:30\",\"quality\":100.0}" – AsmaaM Jun 15 '17 at 12:12
  • @AsmaaM if that is your console output, you have a problem with quote escaping. Can you check what your producer sends to Kafka? – ledniov Jun 15 '17 at 12:37
  • @MonteCristo when I check the Kafka topic, I have something like this (each line is a message, I guess; it's actually loaded from a file containing those records): "[{\"datetime\":\"24-04-2017 07:53:30\",\"quality\":100.0}," "{\"datetime\":\"24-04-2017 08:14:30\",\"quality\":100.0}," "{\"datetime\":\"24-04-2017 08:21:30\",\"quality\":100.0}]" And in my code I make some changes so each line is in the right format, with something like this for now: record = i.value().replace("[", "").replace("]", "").dropRight(2) + '"' When I print "record" it gives the output I wrote. – AsmaaM Jun 15 '17 at 12:44
  • @AsmaaM your goal is to have `{"datetime":"14-05-2017 14:18:30","quality":92.6}` format when you print it. The one that you posted is a representation of a string with escaped quotes. – ledniov Jun 15 '17 at 12:54
  • @ledniov You are right! It was considering the escaped quotes as part of the string. Thank you, sir! Can you please write it as a solution so I can accept it? – AsmaaM Jun 15 '17 at 13:14

1 Answer


It looks like the problem is on your Kafka producer side. Replace the escaped quotes so you end up with the following format:

{"datetime":"14-05-2017 14:18:30","quality":92.6}

That will give you a correctly formatted JSON string.
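If the producer can't be changed right away, here is a minimal consumer-side sketch of the same cleanup (the cleanRecord helper and the exact escaping are assumptions based on the comments above, not part of the original code):

// Hypothetical helper: pull the {...} object out of a raw message such as
// "{\"datetime\":\"14-05-2017 14:18:30\",\"quality\":92.6}," and turn the
// escaped quotes (\") back into plain ones so json4s can parse it.
val jsonObject = """\{.*\}""".r

def cleanRecord(raw: String): Option[String] =
  jsonObject.findFirstIn(raw).map(_.replace("\\\"", "\""))

// Reusing msgParse from the question:
stream.foreachRDD { rdd =>
  rdd.collect().foreach(record =>
    cleanRecord(record.value()).foreach(json => println(msgParse(json).quality))
  )
}

Still, the cleaner long-term fix is on the producer side: send the plain JSON objects themselves, one per message, so no consumer-side cleanup is needed.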

ledniov