
I'm trying to use the new streamed writing feature with Spark 2.0.1-SNAPSHOT. Which output data sources are actually supported for persisting the results? I was able to display the output on the console with something like this:

Dataset<Event> testData = sqlContext
    .readStream()
    .schema(schema)
    .format("json")
    .load("s3://......")
    .as(encoder);

Dataset<Row> result = testData.filter("eventType = 'playerLoad'")
    .groupBy(col("country"), window(col("timestamp"), "10 minutes"))
    .agg(sum("startupTime").alias("tot"));

result.writeStream()
    .outputMode(OutputMode.Complete())
    .format("console")
    .start()
    .awaitTermination();

But if I change .format("console") to "json" or "jdbc", I get the message: Data source xxx does not support streamed writing.

Jacek Laskowski
Paolo

1 Answer


Currently (as of Jul 9, 2016) there are four available streaming sinks:

  • ConsoleSink for console format.
  • FileStreamSink for parquet format.
  • ForeachSink used for foreach operator.
  • MemorySink for memory format.

You can create your own streaming format by implementing StreamSinkProvider.
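
To make the mapping concrete, here is a plain-Java illustration of how a format name relates to the sinks listed above. It is only a sketch and not Spark's actual resolution code: the class StreamingSinkLookup and the method sinkFor are hypothetical names made up for this example. The error text mirrors the message quoted in the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, NOT part of Spark: it only restates the list above
// as a lookup from format name to sink name.
final class StreamingSinkLookup {
    private static final Map<String, String> SINKS = new LinkedHashMap<>();

    static {
        SINKS.put("console", "ConsoleSink");
        SINKS.put("parquet", "FileStreamSink");
        SINKS.put("memory", "MemorySink");
        // ForeachSink is left out on purpose: per the list above it is used
        // through the foreach operator, not selected by a format name.
    }

    // Returns the sink name for a format, or fails with the same wording
    // as the message quoted in the question.
    static String sinkFor(String format) {
        String sink = SINKS.get(format);
        if (sink == null) {
            throw new UnsupportedOperationException(
                "Data source " + format + " does not support streamed writing");
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(sinkFor("memory"));
        try {
            sinkFor("json");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this reading, "json" and "jdbc" fail because no streaming sink is registered for them as of the date above, so persisting results means either writing with the parquet format or providing a custom StreamSinkProvider.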

Jacek Laskowski