
From a fresh download of Apache Spark 2.2.0 (and before that with 2.1.1 on my other machine), I have a problem running the spark-shell: after starting, I don't have the spark, sc or sqlContext variables set. The winutils.exe exception goes away if I download the Windows Hadoop binaries and set HADOOP_HOME, but the Spark context and SQLContext are still not created as expected.
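For reference, setting HADOOP_HOME has an in-process equivalent: the hadoop.home.dir system property, which Hadoop's Shell class reads in its static initializer (the <clinit> frame in the trace below). Here is a minimal sketch for a standalone Scala app, assuming a hypothetical C:\hadoop directory that contains bin\winutils.exe; inside spark-shell itself this comes too late, since the shell touches Hadoop before the prompt appears:

 import org.apache.spark.sql.SparkSession

 object ManualSession {
   def main(args: Array[String]): Unit = {
     // Must run before any org.apache.hadoop class loads; C:\hadoop is an
     // assumed, machine-specific path containing bin\winutils.exe.
     System.setProperty("hadoop.home.dir", "C:\\hadoop")
     val spark = SparkSession.builder()
       .appName("winutils-check")
       .master("local[*]")
       .getOrCreate()
     println(s"Spark ${spark.version} is up")
     spark.stop()
   }
 }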

System: Windows desktop. Environment settings: nothing changed (no HADOOP_HOME, no PATH entries for the Hadoop winutils, no SPARK_HOME etc., since setting them does not change the essential problem).

Here is the output of the spark-shell command:

C:\Users\<snip>\Programme\spark-2.2.0-bin-hadoop2.7\bin>spark-shell.cmd
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/07/21 15:38:01 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
    at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
    at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.findHadoopBinary(HiveConf.java:2327)
    at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:365)
.
.
.
17/07/21 15:38:02 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/07/21 15:38:13 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-core-3.2.10.jar."
17/07/21 15:38:13 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-api-jdo-3.2.6.jar."
17/07/21 15:38:13 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/bin/../jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Users/sta130/Programme/spark-2.2.0-bin-hadoop2.7/jars/datanucleus-rdbms-3.2.9.jar."
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1053)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:130)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:129)
  at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:126)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:938)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
  at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
  at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
  at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
  at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:938)
  at org.apache.spark.repl.Main$.createSparkSession(Main.scala:97)
  ... 47 elided
Caused by: org.apache.spark.sql.AnalysisException: java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/hive does not exist;
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:106)
  at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:193)
  at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:105)
  at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:93)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:39)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog$lzycompute(HiveSessionStateBuilder.scala:54)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:52)
  at org.apache.spark.sql.hive.HiveSessionStateBuilder.catalog(HiveSessionStateBuilder.scala:35)
  at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:289)
  at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1050)
  ... 61 more
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: File /tmp/hive does not exist
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
  at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:191)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:362)
  at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:266)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client$lzycompute(HiveExternalCatalog.scala:66)
  at org.apache.spark.sql.hive.HiveExternalCatalog.client(HiveExternalCatalog.scala:65)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply$mcZ$sp(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$databaseExists$1.apply(HiveExternalCatalog.scala:194)
  at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
  ... 70 more
Caused by: java.io.FileNotFoundException: File /tmp/hive does not exist
  at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
  at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
  at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
  at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:599)
  at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
  at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
  ... 84 more
<console>:14: error: not found: value spark
       import spark.implicits._
              ^
<console>:14: error: not found: value spark
       import spark.sql
              ^
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_111)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

The problem here is the "error: not found: value spark". This means the Spark context is not up, and consequently the SQLContext does not initialize either; the root cause visible in the stack trace is the java.io.FileNotFoundException: File /tmp/hive does not exist, which makes the HiveSessionStateBuilder fail. I can work around this manually, though, by importing the classes and using

 import org.apache.spark.sql.SparkSession
 val spark = SparkSession.builder().getOrCreate()

but I was used to a bit more comfort, and I actually spent several hours searching for a solution before trying this manual step.
So, for those having the same trouble: use the SparkSession builder directly (a fuller sketch follows below). And for the community here: can anyone point me to a solution?
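For completeness, here is a sketch of recreating everything spark-shell would normally predefine, typed at the scala> prompt of the broken shell; spark.sparkContext and spark.sqlContext are the standard Spark 2.x accessors:

 import org.apache.spark.sql.SparkSession

 val spark = SparkSession.builder().getOrCreate()
 val sc = spark.sparkContext        // the SparkContext the shell failed to bind
 val sqlContext = spark.sqlContext  // the SQLContext, kept around for legacy code
 import spark.implicits._           // restores toDF/toDS and the $"col" syntax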

Thanks in advance!

Frischling
  • Did you install winutils.exe? – koiralo Jul 21 '17 at 14:04
  • I tried, but it didn't change the non-existence of the Spark context in the end. It just made the first error message disappear... – Frischling Jul 21 '17 at 14:13
  • Unrelated, since I can make this exception go away and the error of the missing sparkContext and sqlContext is still there. – Frischling Jul 24 '17 at 07:30
  • I found this one here, which points in the right direction but does not provide a solution: https://stackoverflow.com/questions/39968707/spark-2-0-missing-spark-implicits – Frischling Jul 24 '17 at 14:02
  • https://stackoverflow.com/questions/36720067/spark-fail-in-windows-console16-error-not-found-value-sqlcontext – did you check this? I had the same problem, and following the answer in that post helped me solve it. – Gurupraveen Aug 10 '17 at 09:19
  • This is really weird, I set the HADOOP_HOME, and now it works, thanks! – Frischling Aug 14 '17 at 10:30
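Following up on the HADOOP_HOME fix from the comments: a quick way to confirm from the scala> prompt that Hadoop actually resolved winutils. Both calls are standard APIs (scala.sys.env and the Hadoop 2.x Shell class); the printed values depend on your own setup:

 println(sys.env.get("HADOOP_HOME"))            // e.g. Some(C:\hadoop) once set
 println(org.apache.hadoop.util.Shell.WINUTILS) // resolved path to winutils.exe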

0 Answers