What is a supported streaming datasource to persist result?

Question

I'm trying to use the new streamed writing feature with Spark 2.0.1-SNAPSHOT. Which output data sources are actually supported for persisting the results? I was able to display the output on the console with something like this:

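// Imports used by this snippet; Event, schema, encoder and sqlContext are defined elsewhere.
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.streaming.OutputMode;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.sum;
import static org.apache.spark.sql.functions.window;

// Read a stream of JSON events and map each record to an Event.
Dataset<Event> testData = sqlContext
    .readStream()
    .schema(schema)
    .format("json")
    .load("s3://......")
    .as(encoder);

// Sum startupTime per country over 10-minute windows for playerLoad events.
Dataset<Row> result = testData.filter("eventType = 'playerLoad'")
    .groupBy(col("country"), window(col("timestamp"), "10 minutes"))
    .agg(sum("startupTime").alias("tot"));

// Print the complete aggregated result to the console on every trigger.
result.writeStream()
    .outputMode(OutputMode.Complete())
    .format("console")
    .start()
    .awaitTermination();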

But if I change .format("console") to "json" or "jdbc", I get the message: Data source xxx does not support streamed writing.

score 0 · Answer 1 · answered Jul 10 '16 at 12:30

Currently (as of Jul 9, 2016) there are four available streaming sinks:

ConsoleSink for the console format.
FileStreamSink for the parquet format.
ForeachSink for the foreach operator.
MemorySink for the memory format.
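
To make the list concrete, here is a minimal sketch of a lookup that rejects an unsupported format name before the query is started. It is plain Java and does not call Spark. The class StreamingSinkFormats and its methods are hypothetical names of mine, and the format-to-sink mapping is simply copied from the list above, so it only reflects the state as of Jul 9, 2016.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark: maps a format name to the sink
// named in the list above (state as of Jul 9, 2016).
final class StreamingSinkFormats {
    private static final Map<String, String> SINKS;

    static {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("console", "ConsoleSink");
        m.put("parquet", "FileStreamSink");
        m.put("memory", "MemorySink");
        // ForeachSink is left out on purpose: per the list it backs the
        // foreach operator rather than a format name.
        SINKS = Collections.unmodifiableMap(m);
    }

    private StreamingSinkFormats() {}

    // True if the exact format name appears in the list above.
    static boolean isSupported(String format) {
        return format != null && SINKS.containsKey(format);
    }

    // Returns the sink name for a listed format, or throws with a message
    // modelled on the error quoted in the question.
    static String sinkFor(String format) {
        if (!isSupported(format)) {
            throw new IllegalArgumentException(
                "Data source " + format + " does not support streamed writing; "
                    + "listed formats: " + SINKS.keySet());
        }
        return SINKS.get(format);
    }
}
```

Calling StreamingSinkFormats.sinkFor("parquet") returns "FileStreamSink", while "json" or "jdbc" throws before any streaming query is created.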

You can create your own streaming format by implementing StreamSinkProvider.
