
I have a SnappyData streaming table that reads JSON from a Kafka topic. After some work I've got this running, but I ran into an issue when trying to map java.sql.Timestamp values from my SensorData object to the streaming table.

The error was happening in org.apache.spark.sql.catalyst.CatalystTypeConverters at line 318 in this method:

  private object StringConverter extends CatalystTypeConverter[Any, String, UTF8String] {
    override def toCatalystImpl(scalaValue: Any): UTF8String = scalaValue match {
      case str: String => UTF8String.fromString(str)
      case utf8: UTF8String => utf8
    }
    override def toScala(catalystValue: UTF8String): String =
      if (catalystValue == null) null else catalystValue.toString
    override def toScalaImpl(row: InternalRow, column: Int): String =
      row.getUTF8String(column).toString
  }

I stepped through with the debugger and the code was clearly expecting a string value here, but the sensor and collection times on my SensorData object (and in my streaming table) are Timestamps, so it complained about not being able to convert the value.
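From what I can tell, the match in the quoted StringConverter only covers String and UTF8String, so any other type falls through the match. A minimal standalone sketch of that Scala behavior (illustration only, not Spark code):

    // A non-exhaustive match like the one in StringConverter throws
    // scala.MatchError when handed a type it has no case for.
    def toCatalystLike(value: Any): String = value match {
      case s: String => s
    }

    toCatalystLike("ok")                        // returns "ok"
    toCatalystLike(new java.sql.Timestamp(0L))  // throws scala.MatchError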

Below is the SensorData class that I use to map the values from the incoming JSON messages from Kafka. In my custom converter, I then map these values to a Seq[Row] in my toRows(...) method.

class SensorData {
    var sensor_id: String = _
    var metric: String = _
    var collection_time: java.sql.Timestamp = _
    var sensor_time: java.sql.Timestamp = _
//    var collection_time: String = _
//    var sensor_time: String = _
    var value: String = _
    var year_num: Int = _
    var month_num: Int = _
    var day_num: Int = _
    var hour_num: Int = _

}

Here is my streaming table:

snsc.sql(s"CREATE STREAM TABLE sensor_data_stream if not exists " +
        "(sensor_id string, " +
        "metric string, " +
        "collection_time TIMESTAMP, " +
        "value VARCHAR(128), " +
        "sensor_time TIMESTAMP, " +
        "year_num integer, " +
        "month_num integer, " +
        "day_num integer, " +
        "hour_num integer  " +
        ") " +
        "using kafka_stream " +
        "options (storagelevel 'MEMORY_AND_DISK_SER_2', " +
        "rowConverter 'org.me.streaming.sensor.test.converter.SensorConverter', " +
        "zkQuorum 'localhost:2181', " +
        " groupId 'sensorConsumer',  topics 'sensorTest:01')")

To get around this issue, I changed the datatypes in my SensorData object to String, along with the corresponding column datatypes in my streaming table, i.e.:

    "collection_time string, " +
    "sensor_time string, " +

As a result, I was able to successfully stream the data from Kafka to my target column table after making this datatype change.

My question... I'm fairly new to the SnappyData/streaming world and want to know: is this a bug (known or unknown), or is there a more elegant way to bind Timestamp datatypes to a streaming table?

****** UPDATE PER RESPONSE ******

Here is my Row converter:

import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule
import org.apache.spark.sql.Row
import org.apache.spark.sql.streaming.StreamToRowsConverter

class SensorConverter extends StreamToRowsConverter with Serializable {

  override def toRows(message: Any): Seq[Row] = {

    // Deserialize the incoming JSON message into a SensorData instance.
    val mapper = new ObjectMapper()
    mapper.registerModule(DefaultScalaModule)
    val sensor = mapper.readValue(message.toString, classOf[SensorData])

    // Map the SensorData fields onto a Row matching the stream table schema.
    Seq(Row.fromSeq(Seq(
      sensor.sensor_id,
      sensor.metric,
      sensor.collection_time,
      sensor.value,
      sensor.sensor_time,
      sensor.year_num,
      sensor.month_num,
      sensor.day_num,
      sensor.hour_num)))
  }
}

I initially attempted passing a serialized Java object, but ran into issues decoding it (probably due to my current lack of knowledge of the APIs while I ramp up), so I ended up just passing a JSON string through Kafka.

I see in the example supplied at https://github.com/SnappyDataInc/snappy-poc/blob/master/src/main/scala/io/snappydata/adanalytics/Codec.scala that I did not properly wrap the incoming timestamp value (which comes in as a long) in java.sql.Timestamp when building my Seq[Row]. I will give that a shot to see if that resolves my issue.
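For reference, the change I plan to try looks roughly like this; a sketch that assumes the two timestamps arrive in the JSON as epoch-millisecond longs and that the SensorData fields are changed to Long to match:

    // Sketch only: assumes collection_time/sensor_time are declared as Long
    // on SensorData and carry epoch milliseconds, as in the snappy-poc example.
    Seq(Row.fromSeq(Seq(
      sensor.sensor_id,
      sensor.metric,
      new java.sql.Timestamp(sensor.collection_time),  // Long -> TIMESTAMP column
      sensor.value,
      new java.sql.Timestamp(sensor.sensor_time),      // Long -> TIMESTAMP column
      sensor.year_num,
      sensor.month_num,
      sensor.day_num,
      sensor.hour_num)))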

1 Answer


Here is an example that you can reference for using timestamps with stream tables: https://github.com/SnappyDataInc/snappy-poc/blob/master/src/main/scala/io/snappydata/adanalytics/Codec.scala

Please check the AdImpressionToRowsConverter#toRows implementation. In this case we are receiving long values (System.currentTimeMillis) from Kafka and converting them into java.sql.Timestamp.
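The essence of that pattern, paraphrased rather than copied verbatim from the linked code, is just wrapping the long in a Timestamp while building the Row:

    // Paraphrased pattern: wrap the epoch-millis long received from Kafka
    // in java.sql.Timestamp so it binds to the table's TIMESTAMP column.
    import org.apache.spark.sql.Row

    def toRow(timestampMillis: Long, otherCols: Seq[Any]): Row =
      Row.fromSeq(new java.sql.Timestamp(timestampMillis) +: otherCols)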

Here is the stream table definition with the timestamp type:
https://github.com/SnappyDataInc/snappy-poc/blob/master/src/main/scala/io/snappydata/adanalytics/SnappySQLLogAggregatorJob.scala

Can you please provide your SensorConverter#toRows implementation? Are you using a corresponding decoder for your SensorData object?

Yogesh Mahajan
  • thanks Yogesh. I've updated my question above. I see a difference in my implementation vs. the example you provided. I will try that. – mike w Aug 12 '16 at 18:02