
I'm using Spark Structured Streaming and I'm trying to send my result to a Kafka topic named "results".

I get the following error:

'Append output mode not supported when there are streaming aggregations on streaming DataFrames/DataSets without watermark;;

Can anyone help?

query1 = prediction.writeStream.format("kafka")\
  .option("topic", "results")\
  .option("kafka.bootstrap.servers", "localhost:9092")\
  .option("checkpointLocation", "checkpoint")\
  .start()
query1.awaitTermination()

The schema of prediction is:

root
 |-- prediction: double (nullable = false)
 |-- count: long (nullable = false)

Am I missing something?

SaSJo
    Does this answer your question? [spark structured streaming exception : Append output mode not supported without watermark](https://stackoverflow.com/questions/54117961/spark-structured-streaming-exception-append-output-mode-not-supported-without) – Alex Ott Mar 28 '20 at 16:31

1 Answer


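The error message hints at what is missing: a watermark.

A watermark tells Spark how long to wait for late data when you aggregate a stream. In append mode an aggregated row is written only once it is final, and without a watermark Spark cannot know when that is. Details can be found in the Spark documentation for Structured Streaming.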
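To make the idea concrete, here is a toy model in plain Python of how a watermark lets append mode decide that a window is final. It is only a sketch, not how Spark implements it: the function name, the integer timestamps, and the per-event watermark update are all assumptions for illustration (Spark advances the watermark between micro-batches).

```python
def run_append_mode(event_times, window_size, delay):
    """Toy model of append-mode output with a watermark (not Spark's code).

    event_times: event timestamps as integers, in arrival order.
    Returns (emitted, still_open, dropped).
    """
    counts = {}                      # window start -> running count
    emitted = []                     # (window start, final count)
    dropped = 0                      # events that arrived too late
    max_event_time = float("-inf")
    watermark = float("-inf")

    for ts in event_times:
        start = ts - ts % window_size
        # An event is too late if its window already ended at or before
        # the current watermark.
        if start + window_size <= watermark:
            dropped += 1
            continue
        counts[start] = counts.get(start, 0) + 1

        # The watermark trails the largest event time seen so far.
        max_event_time = max(max_event_time, ts)
        watermark = max_event_time - delay

        # Windows that end at or before the watermark are final, so
        # append mode can emit them exactly once.
        for s in sorted(counts):
            if s + window_size <= watermark:
                emitted.append((s, counts.pop(s)))

    return emitted, counts, dropped


emitted, still_open, dropped = run_append_mode(
    [1, 4, 12, 3, 16, 2, 27], window_size=10, delay=5
)
print(emitted, still_open, dropped)
```

Without a watermark no window would ever be known to be final, which is why Spark rejects append mode for an unwatermarked streaming aggregation.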

withWatermark must be called on the same timestamp column that the aggregation uses, and it must be called before the aggregation.

An example of how to use withWatermark is given in the Spark documentation:

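from pyspark.sql.functions import window

words = ...  # streaming DataFrame of schema { timestamp: Timestamp, word: String }

# Group the data by window (10 minutes long, sliding every 5 minutes) and word,
# and compute the count of each group. The watermark lets Spark wait up to
# 10 minutes for late data before a window is treated as final.
windowedCounts = words \
    .withWatermark("timestamp", "10 minutes") \
    .groupBy(
        window(words.timestamp, "10 minutes", "5 minutes"),
        words.word) \
    .count()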
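The prediction DataFrame in the question shows no timestamp column, so a watermark may not be applicable there. If that is the case, one alternative is to write in update mode, which Spark allows for aggregations without a watermark. The sketch below assumes prediction is a streaming aggregate such as a groupBy("prediction").count(); the helper name start_results_query and its parameters are made up for illustration. It also serializes each row to JSON, because the Kafka sink expects a string or binary value column.

```python
def start_results_query(prediction, servers="localhost:9092", topic="results"):
    """Hypothetical helper: write a streaming aggregate to Kafka in update mode."""
    # The Kafka sink needs a string or binary `value` column, so each row
    # is turned into a JSON string first.
    as_kafka_rows = prediction.selectExpr("to_json(struct(*)) AS value")
    return (
        as_kafka_rows.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", servers)
        .option("topic", topic)
        .option("checkpointLocation", "checkpoint")
        # Update mode re-emits a group whenever its count changes, so no
        # watermark is required.
        .outputMode("update")
        .start()
    )
```

Usage would be query1 = start_results_query(prediction) followed by query1.awaitTermination(). Note that in update mode the topic receives a new message each time a count changes, so consumers need to treat later messages for the same prediction as replacements.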
Michael Heil