I already have a Deep Learning model and I am trying to run scoring on streaming data. For this I am reading data from Kafka using the Spark Structured Streaming API. When I try to convert the received Dataset to an H2OFrame, I get the error below:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Queries with streaming sources must be executed with writeStream.start();

Code Sample

Dataset<Row> testData=sparkSession.readStream().schema(testSchema).format("kafka").option("kafka.bootstrap.servers", "localhost:9042").option("subscribe", "topicName").load();
H2OFrame h2oTestFrame = h2oContext.asH2OFrame(testData.toDF(), "test_frame");

Is there any example that shows how to use Sparkling Water with Spark Structured Streaming and a streaming source?

1 Answer

Is there any example that shows how to use Sparkling Water with Spark Structured Streaming and a streaming source?

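There isn't. General-purpose transformations, including conversion to RDDs and external formats, are not supported in Structured Streaming. That is why the conversion to an H2OFrame fails with the AnalysisException quoted in the question: a streaming Dataset can only be executed through writeStream().start().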

  • I would suggest looking at the demo from the last H2O World conference, available at: https://github.com/h2oai/h2o-tutorials/blob/master/h2o-world-2017/pysparkling/AmazonFineFoodPipeline.ipynb It shows how a model can be trained offline while the predictions are made online in the streaming application. – Jakub Háva Apr 06 '18 at 10:37
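
To make the pattern in the comment above concrete (train offline, score online), here is a minimal library-free Java sketch. It is not Sparkling Water or Spark code and is not taken from the linked notebook. `OfflineModel`, `scoreBatch`, the weights and the sample rows are all hypothetical names and values made up for illustration. The only point it shows is that scoring is applied to each finite micro-batch as it arrives, rather than to the unbounded stream as a whole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class MicroBatchScoring {

    /** Hypothetical stand-in for a model trained offline: fixed weights, no training here. */
    public static final class OfflineModel {
        private final double[] weights;
        private final double bias;

        public OfflineModel(double[] weights, double bias) {
            this.weights = weights.clone();
            this.bias = bias;
        }

        /** Linear score of a single feature row. */
        public double score(double[] features) {
            if (features.length != weights.length) {
                throw new IllegalArgumentException("expected " + weights.length + " features");
            }
            double sum = bias;
            for (int i = 0; i < weights.length; i++) {
                sum += weights[i] * features[i];
            }
            return sum;
        }
    }

    /** Scores one finite micro-batch; the unbounded stream is never materialized as a whole. */
    public static List<Double> scoreBatch(OfflineModel model, List<double[]> batch) {
        List<Double> predictions = new ArrayList<>(batch.size());
        for (double[] row : batch) {
            predictions.add(model.score(row));
        }
        return predictions;
    }

    public static void main(String[] args) {
        // Assumed weights; in practice they would come from the offline training step.
        OfflineModel model = new OfflineModel(new double[] {0.5, -1.0}, 0.25);

        // Simulated stream: each inner list plays the role of one micro-batch.
        List<List<double[]>> microBatches = Arrays.asList(
            Arrays.asList(new double[] {2.0, 1.0}, new double[] {0.0, 0.0}),
            Arrays.asList(new double[] {4.0, 0.5}));

        long batchId = 0;
        for (List<double[]> batch : microBatches) {
            System.out.println("batch " + batchId++ + " -> " + scoreBatch(model, batch));
        }
    }
}
```

In a real streaming job the loop in `main` would be replaced by whatever the started streaming query hands to its sink; whether a given Sparkling Water version can convert such a batch to an H2OFrame is something to check against its documentation.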