Hoping someone can help.
I'm trying to stream some data and keep the current state of IoT devices in Kudu.
I'm currently using a ForeachWriter as the sink. Sadly, it only works when there is a single row; when there is more than one row, the query hangs and no data is written to the Kudu table.
Has anyone seen this before?
Code:
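```scala
import org.apache.spark.sql.{DataFrame, ForeachWriter}
import org.apache.spark.sql.streaming.{GroupStateTimeout, Trigger}
import spark.implicits._ // assuming the SparkSession is named `spark`

df.select("...DATA.....")
  .as[IoTState]
  .groupByKey(_.assetId)
  // Keep the latest state per asset; mapGroupsWithState requires "update" output mode.
  .mapGroupsWithState(GroupStateTimeout.NoTimeout)(updateIoTState)
  .writeStream
  .foreach(new ForeachWriter[IoTState] {
    override def open(partitionId: Long, version: Long): Boolean = {
      true
    }

    // Called once per row, on the executor.
    override def process(value: IoTState): Unit = {
      // Builds a one-row DataFrame per record and upserts it via KuduContext.
      val valueDF: DataFrame = Seq(value).toDF(
        "assetId",
        "eventDateTimeInUTC",
        "gpsLatitudeInDegrees",
        "gpsLongitudeInDegrees"
      )
      kuduContext1.upsertRows(valueDF, conf.kuduTable)
    }

    override def close(errorOrNull: Throwable): Unit = {
    }
  })
  .outputMode("update")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
  .awaitTermination()
```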
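A likely cause, offered as a guess rather than a confirmed diagnosis: `process()` runs on the executors, but `Seq(value).toDF(...)` and `kuduContext1.upsertRows(...)` rely on the SparkSession, which only exists on the driver, and Spark does not support creating DataFrames from inside executor-side code. The sketch below avoids that by writing each row with the Kudu Java client directly inside the ForeachWriter. It assumes a hypothetical `IoTState` with the four column names from the question and guessed types (`String`, `Long`, `Double`, `Double`), a Kudu table whose columns match those names and types, and a hypothetical `kuduMasters` string such as `"host:7051"`.

```scala
import org.apache.kudu.client.{KuduClient, KuduSession, KuduTable}
import org.apache.spark.sql.ForeachWriter

// Hypothetical shape of IoTState: the field names come from the question,
// the field types are assumptions. Replace with your own case class.
case class IoTState(
  assetId: String,
  eventDateTimeInUTC: Long,
  gpsLatitudeInDegrees: Double,
  gpsLongitudeInDegrees: Double)

// Upserts each row with the Kudu Java client on the executor instead of
// building a DataFrame inside process(). Only two plain strings are
// captured, so the writer serializes cleanly to the executors.
class KuduUpsertWriter(kuduMasters: String, tableName: String)
    extends ForeachWriter[IoTState] {

  // Created in open(), i.e. on the executor; never shipped from the driver.
  @transient private var client: KuduClient = _
  @transient private var session: KuduSession = _
  @transient private var table: KuduTable = _

  override def open(partitionId: Long, version: Long): Boolean = {
    client = new KuduClient.KuduClientBuilder(kuduMasters).build()
    table = client.openTable(tableName)
    session = client.newSession() // default flush mode is AUTO_FLUSH_SYNC
    true
  }

  override def process(value: IoTState): Unit = {
    val upsert = table.newUpsert()
    val row = upsert.getRow
    row.addString("assetId", value.assetId)
    row.addLong("eventDateTimeInUTC", value.eventDateTimeInUTC)
    row.addDouble("gpsLatitudeInDegrees", value.gpsLatitudeInDegrees)
    row.addDouble("gpsLongitudeInDegrees", value.gpsLongitudeInDegrees)
    // In AUTO_FLUSH_SYNC mode, apply() blocks and returns the server response.
    val response = session.apply(upsert)
    if (response.hasRowError)
      throw new RuntimeException(s"Kudu upsert failed: ${response.getRowError}")
  }

  override def close(errorOrNull: Throwable): Unit = {
    // Close the session (flushing anything pending), then always the client.
    try { if (session != null) session.close() }
    finally { if (client != null) client.close() }
  }
}
```

It would replace the anonymous writer with `.foreach(new KuduUpsertWriter(kuduMasters, conf.kuduTable))`. Creating the client in `open()` rather than in the constructor means nothing connection-related has to be serialized from the driver. On Spark 2.4 or later, `foreachBatch` is an alternative: it hands each micro-batch to the driver as a DataFrame, so `kuduContext1.upsertRows(batchDF, conf.kuduTable)` can be called there unchanged.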