I'm trying to write an Apache Spark job in Scala. I'm a novice in Scala and have previously used PySpark. I get an error when the job starts. Code:

import org.apache.spark.sql.SparkSession

object SparkRMSP_full {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("parse_full_rmsp_job")
      .getOrCreate()

    // Read the Kafka topic as a streaming DataFrame
    val raw_data_df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "10.1.24.111:9092")
      .option("subscribe", "dev.etl.fns.rmsp.raw-data")
      .load()

    println(raw_data_df.isStreaming) // should print "true"
    raw_data_df.printSchema()
  }
}
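
As an aside, a streaming DataFrame does no work until a query is actually started; printSchema only inspects the plan. A minimal continuation of the code above (a sketch, assuming a console sink is acceptable for debugging):

// Hypothetical continuation: start the stream with a console sink so the
// job keeps running and prints incoming Kafka records.
val query = raw_data_df
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .start()

query.awaitTermination() // block until the streaming query is stopped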

spark-submit command:

spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10-assembly_2.11:2.1.0 --master local --num-executors 2 --executor-memory 2g --driver-memory 1g --executor-cores 2 "C:\tools\jar\streaming_spark.jar"

And I get this error:

20/07/15 15:05:32 WARN SparkSubmit$$anon$2: Failed to load SparkRMSP_full.
java.lang.ClassNotFoundException: SparkRMSP_full

How should I declare the class correctly?

UPD:

build.sbt:

name := "streaming_spark"

version := "0.1"

scalaVersion := "2.11.12"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.1"
libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10-assembly" % "2.3.1"

Project structure is on pastebin.

1 Answer


Change your spark-submit command as shown below and try again. The ClassNotFoundException happens because the command never passes --class, so spark-submit does not know which main class in the jar to run.

spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-10-assembly_2.11:2.1.0 \
  --master local \
  --num-executors 2 \
  --executor-memory 2g \
  --driver-memory 1g \
  --executor-cores 2 \
  --class SparkRMSP_full \
  "C:\tools\jar\streaming_spark.jar"

If SparkRMSP_full is declared inside a package, you need to pass the fully qualified name (package name plus class name) to --class.
Srinivas