Hoping someone can help.

I'm trying to stream data and maintain the current state of my IoT devices in Kudu.

I'm currently using a ForeachWriter for the sink. Sadly, it only works when there is a single row; if there is more than one row, the query hangs and doesn't write any data to the Kudu table.

Has anyone seen this before?

Code:

    df.select("...DATA.....")
      .as[IoTState]
      .groupByKey(_.assetId)
      .mapGroupsWithState(GroupStateTimeout.NoTimeout)(updateIoTState)
      .writeStream
      .foreach(new ForeachWriter[IoTState] {
        override def open(partitionId: Long, version: Long): Boolean = {
          true
        }

        override def process(value: IoTState): Unit = {
          val valueDF: DataFrame = Seq(value).toDF(
            "assetId",
            "eventDateTimeInUTC",
            "gpsLatitudeInDegrees",
            "gpsLongitudeInDegrees"
          )
          kuduContext1.upsertRows(valueDF, conf.kuduTable)
        }

        override def close(errorOrNull: Throwable): Unit = {}
      })
      .outputMode("update")
      .trigger(Trigger.ProcessingTime("2 seconds"))
      .start()
      .awaitTermination()
  • @Irianna Can you mention where you are creating the data frame from? – sai pradeep kumar kotha Apr 05 '18 at 21:45
  • Hi @saipradeepkumarkotha I've tried this using both JSON files and a Kafka topic as the source. I've noticed that if I set the number of shuffle partitions to be less than the number of cores, it works (see the sketch below)... but that seems a bit of a bad answer. – Irianna Apr 06 '18 at 15:13
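
For reference, a minimal sketch of the shuffle-partition workaround described in the comment above, assuming a standard SparkSession setup. The app name is illustrative and not from the original post; the only substantive piece is the `spark.sql.shuffle.partitions` setting:

    import org.apache.spark.sql.SparkSession

    // Hypothetical session setup, for illustration only: pin
    // spark.sql.shuffle.partitions below the executor core count,
    // as described in the comment above.
    val spark = SparkSession.builder()
      .appName("iot-state-to-kudu") // illustrative name
      .config("spark.sql.shuffle.partitions", "2") // fewer partitions than cores
      .getOrCreate()

    // The same setting can also be changed on an existing session:
    spark.conf.set("spark.sql.shuffle.partitions", "2")

This only works around the hang rather than explaining it, which is presumably why the commenter calls it "a bit of a bad answer".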
