I am trying to run my first Spark program in Scala. It should read a CSV file and display the first few rows.
Code:
import org.apache.spark.sql.SparkSession
import org.apache.spark._
import java.io._
import org.apache.spark.SparkContext._
import org.apache.log4j._
object df extends App{
val spark=SparkSession.builder().getOrCreate()
val drf=spark.read.csv("C:/Users/admin/Desktop/scala-datasets/Scala-and-Spark-Bootcamp-master/Spark DataFrames/CitiGroup2006_2008")
drf.head(5)
}
Running it produces the following error:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/04/29 23:10:53 INFO SparkContext: Running Spark version 2.1.0
17/04/29 23:10:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/04/29 23:10:57 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
at org.apache.spark.SparkContext.<init>(SparkContext.scala:379)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2313)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:868)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$6.apply(SparkSession.scala:860)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:860)
at df$.delayedEndpoint$df$1(df.scala:11)
at df$delayedInit$body.apply(df.scala:9)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.App$$anonfun$main$1.apply(App.scala:76)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
at scala.App$class.main(App.scala:76)
at df$.main(df.scala:9)
at df.main(df.scala)
Any suggestions would be helpful.
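The exception in the log says that no master URL was configured. When a program is started directly (for example from an IDE) rather than through spark-submit, nothing supplies one, so one possible fix is to set it on the builder. Below is a minimal sketch, assuming a purely local run. The object name CsvPreview, the helper buildSession, and the application name are made up for illustration; the path is the one from the code above.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical object name, chosen only for this sketch.
object CsvPreview {
  // Builds a session with an explicit master so it can start outside spark-submit.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("CsvPreview")  // illustrative application name
      .master("local[*]")     // assumption: local run using all available cores
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()
    // Path taken from the question.
    val drf = spark.read.csv("C:/Users/admin/Desktop/scala-datasets/Scala-and-Spark-Bootcamp-master/Spark DataFrames/CitiGroup2006_2008")
    drf.show(5)  // prints up to five rows; head(5) only returns them without printing
    spark.stop()
  }
}
```

If the job is later submitted to a cluster, the master is usually passed with spark-submit --master rather than hard-coded, since a value set on the builder takes precedence over the command-line flag.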