
I'm trying to create a SnappyData streaming table using Zeppelin, but I have an issue with the stream table definition, specifically the 'rowConverter' argument.

The Zeppelin notebook is divided into a few paragraphs:

Paragraph 1:

import org.apache.spark.sql.Row
import org.apache.spark.sql.streaming.{SchemaDStream, StreamToRowsConverter}

class RowsConverter extends StreamToRowsConverter with Serializable {

  override def toRows(message: Any): Seq[Row] = {
    val log = message.asInstanceOf[String]
    val fields = log.split(",")
    val rows = Seq(Row.fromSeq(Seq(
      new java.sql.Timestamp(fields(0).toLong),
      fields(1),
      fields(2),
      fields(3),
      fields(4),
      fields(5).toDouble,
      fields(6))))
    rows
  }
}

Paragraph 2:

snsc.sql(
  "CREATE STREAM TABLE adImpressionStream if not exists (sensor_id string, metric string) " +
  "using kafka_stream options (" +
  "storagelevel 'MEMORY_AND_DISK_SER_2', " +
  "rowConverter 'RowsConverter', " +
  "zkQuorum 'localhost:2181', " +
  "groupId 'streamConsumer', topics 'test')")

The first paragraph returns this error:

error: not found: type StreamToRowsConverter
class RowsConverter extends StreamToRowsConverter with Serializable {
                               ^
<console>:13: error: not found: type Row
     override def toRows(message: Any): Seq[Row] = {
                                            ^
<console>:16: error: not found: value Row
       val rows = Seq(Row.fromSeq(Seq(new java.sql.Timestamp(fields(0).toLong),

The second paragraph returns:

java.lang.RuntimeException: Failed to load class : java.lang.ClassNotFoundException: RowsConverter

I have also tried the default code from the git repo:

 snsc.sql("create stream table streamTable (userId string, clickStreamLog string) " +
 "using kafka_stream options (" +
 "storagelevel 'MEMORY_AND_DISK_SER_2', " +
" rowConverter 'io.snappydata.app.streaming.KafkaStreamToRowsConverter' ," +   
 "kafkaParams 'zookeeper.connect->localhost:2181;auto.offset.reset->smallest;group.id->myGroupId', " +
 "topics 'test')")

but I get a similar error:

java.lang.RuntimeException: Failed to load class : java.lang.ClassNotFoundException: io.snappydata.app.streaming.KafkaStreamToRowsConverter

Could you help me with this issue? Thanks a lot.

Tomtom

2 Answers


You need to provide your application-specific classes on the classpath. Please refer to the classpath setup step here: https://github.com/SnappyDataInc/snappy-poc#lets-get-this-going. Zeppelin will pick up the classpath set in your spark-env.sh.
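
For example, once the converter class is on the cluster's classpath, the table definition should reference it by its fully qualified class name. This is only a sketch: com.example.streaming is a placeholder package, and it assumes the compiled jar is visible to the SnappyData lead and server JVMs.

// Sketch only: com.example.streaming is a hypothetical package; the jar containing
// RowsConverter must be on the classpath of the lead/server processes (e.g. via the
// spark-env.sh step above) so the class can be loaded at stream creation time.
snsc.sql(
  "CREATE STREAM TABLE adImpressionStream if not exists (sensor_id string, metric string) " +
  "using kafka_stream options (" +
  "storagelevel 'MEMORY_AND_DISK_SER_2', " +
  "rowConverter 'com.example.streaming.RowsConverter', " +
  "zkQuorum 'localhost:2181', " +
  "groupId 'streamConsumer', topics 'test')")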

Yogesh Mahajan
  • Something works :D I followed the steps from https://github.com/SnappyDataInc/snappy-poc#lets-get-this-going, for now without Zeppelin, just to check that it works. Streaming works - I can select data from the tables, but I don't see any application in Spark, and the web UI on port 4040 is unavailable for me. – Tomtom Oct 17 '17 at 07:05
  • Can you try port 5050? – Yogesh Mahajan Oct 17 '17 at 18:20

Add the SnappyData interpreter to Apache Zeppelin as described here: https://snappydatainc.github.io/snappydata/howto/use_apache_zeppelin_with_snappydata/

This enables running Zeppelin in the lead node so that the code runs in embedded mode. In particular, you need to add the required jars using the "-classpath" option in the cluster configuration.
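
As a rough sketch (the jar path is a placeholder, and the exact property syntax should be checked against the linked guide), the lead entry in conf/leads could look something like:

localhost -zeppelin.interpreter.enable=true -classpath=/path/to/rows-converter.jar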

Sumedh