
I am trying to read an XML file using Spark, but I am facing an issue when I compile the project with SBT.

build.sbt

name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" % "spark-avro_2.10" % "2.0.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.2"
resolvers += Resolver.mavenLocal

SparkMeApp.scala

package in.goai.spark

import scala.xml._
import com.databricks.spark.xml
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val fileName = args(0)
    val df = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "book").load(fileName)
    val selectedData = df.select("title", "price")
    selectedData.show()

  }
}

When I compile it with `sbt package`, it shows the error below:

[error] /home/hadoop/dev/first/src/main/scala/SparkMeApp.scala:4: object xml is not a member of package com.databricks.spark
[error] import com.databricks.spark.xml
[error]        ^
[error] one error found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 9 s, completed Sep 22, 2017 4:11:19 PM

Do I need to add any other JAR files related to XML? Please suggest, and please point me to any link with information about the JAR files needed for different file formats.


1 Answer

Because you're using Scala 2.11 and Spark 2.0, change the dependencies in your build.sbt to the following:

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.6"
  1. Change spark-avro to version 3.2.0, and use %% so the Scala 2.11 artifact is selected instead of the hard-coded _2.10 one: https://github.com/databricks/spark-avro#requirements
  2. Add "com.databricks" %% "spark-xml" % "0.4.1": https://github.com/databricks/spark-xml#scala-211
  3. Change the scala-xml version to 1.0.6, the current version for Scala 2.11: http://mvnrepository.com/artifact/org.scala-lang.modules/scala-xml_2.11
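Putting these together, the full build.sbt should look something like this (keeping the name, version, organization, and resolver from your question):

name := "First Spark"
version := "1.0"
organization := "in.goai"
scalaVersion := "2.11.8"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.0.0"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.4.1"
libraryDependencies += "org.scala-lang.modules" %% "scala-xml" % "1.0.6"

resolvers += Resolver.mavenLocal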

In your code, delete the following import statement:

import com.databricks.spark.xml

Note that your code doesn't actually use the spark-avro or scala-xml libraries. Remove those dependencies from your build.sbt (and the import scala.xml._ statement from your code) if you're not going to use them.
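With that import removed, a cleaned-up version of the program would look roughly like this (a sketch; the books.xml layout shown in the comment is a guess, since your input file isn't shown):

package in.goai.spark

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkContext, SparkConf}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("First Spark")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val fileName = args(0)

    // The spark-xml data source is resolved by name at runtime,
    // so no import from com.databricks is needed.
    // rowTag = "book" assumes the input contains repeated elements like:
    //   <books>
    //     <book><title>...</title><price>...</price></book>
    //   </books>
    val df = sqlContext.read
      .format("com.databricks.spark.xml")
      .option("rowTag", "book")
      .load(fileName)

    // show() prints the selected columns to stdout itself; it returns Unit,
    // so there is nothing useful to capture in a val and println.
    df.select("title", "price").show()
  }
}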

Jeffrey Chung
  • I am able to compile now :) but when I execute it, it shows the error below. Any idea? Exception in thread "main" java.lang.NoSuchMethodError: org.apache.spark.sql.types.DecimalType$.Unlimited()Lorg/apache/spark/sql/types/DecimalType; –  Sep 22 '17 at 17:42
  • @ashoksrinivas: In your `sbt` console, run `reload` and `update`. – Jeffrey Chung Sep 22 '17 at 18:04
  • Sorry chunjef, I am new to sbt and learning on my own. I don't know how to open the `sbt` console or run `reload`... I request you to guide me. Thanks –  Sep 22 '17 at 18:21
  • I run `sbt package` followed by this command: `spark-submit --master "local[*]" --class in.goai.spark.SparkMeApp /home/hadoop/devo/first/target/scala-2.11/first-spark_2.11-1.0.jar scala.xml` –  Sep 22 '17 at 18:26
  • @ashoksrinivas: Try running `sbt clean reload update`. – Jeffrey Chung Sep 22 '17 at 19:19
  • Tried but no result :( –  Sep 22 '17 at 19:28
  • @ashoksrinivas: You should probably accept the answer and start a new question for the `NoSuchMethodError: org.apache.spark.sql.types.DecimalType` issue. – ashawley Sep 23 '17 at 13:25