
I have an Excel file as a source, and I want to read its data into a DataFrame using Databricks. I am new to Scala.

val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("location", "/FileStore/tables/Airline.xlsx")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load("/FileStore/tables/Airline.xlsx")
baitmbarek

1 Answer


You can use the available Excel plugin:

libraryDependencies += "com.crealytics" %% "spark-excel" % "0.8.2"

Follow the samples from https://github.com/crealytics/spark-excel to build your DataFrame.
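A minimal sketch of what such a read looks like, assuming the spark-excel package is attached to the cluster and the workbook has been uploaded to DBFS (the path and option values here mirror the question; option names follow the plugin's README):

```scala
// Read an Excel workbook into a DataFrame with the spark-excel plugin.
// Assumes a Databricks notebook where `spark` (SparkSession) is in scope.
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("useHeader", "true")     // treat the first row as column names
  .option("inferSchema", "true")   // let the plugin guess column types
  .load("dbfs:/FileStore/tables/Airline.xlsx")

df.printSchema()
df.show(5)
```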

Emiliano Martinez
  • I have added the Maven library in my workspace and loaded my Excel file into Databricks DBFS, and now I am trying to read the Excel file, but it is showing an error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory). – Praveen Saini May 07 '19 at 10:52
  • Update your question and add your code. You can include the stacktrace to see what could be the problem. – Emiliano Martinez May 07 '19 at 11:13
  • Code: val df = spark.read.format("com.crealytics.spark.excel") .option("location", "/FileStore/tables/Airline.xlsx") .option("useHeader", "true") .option("treatEmptyValuesAsNulls", "false") .option("inferSchema", "false") .option("addColorColumns", "false") .load("/FileStore/tables/Airline.xlsx") Error: java.io.FileNotFoundException: /FileStore/tables/Airline.xlsx (No such file or directory), but the file is present. – Praveen Saini May 07 '19 at 11:43
  • Your spark executor is trying to load this file from a local storage... where is exactly your file stored? – Emiliano Martinez May 08 '19 at 08:04
  • my file is present at this location(/FileStore/tables/Airline.xlsx) but still getting error. File path is correct. – Praveen Saini May 08 '19 at 09:38
  • You should have a stack trace showing which file system provider your executor is using. I mean, if you are executing Spark on your computer with a master pointing to your localhost, it would use the local "POSIX" file system. But if you spark-submit to a cloud provider, it will use another, for example an HDFS or S3 connector. If you are using Databricks, you may need to use dbfs:// as the protocol. – Emiliano Martinez May 08 '19 at 10:00
  • I found the solution to this issue: we should use the latest Maven library for it. libraryDependencies += "com.crealytics" %% "spark-excel" % "0.11.1" – Praveen Saini May 09 '19 at 07:55
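Putting the thread's two fixes together, a corrected read might look like the sketch below. This assumes spark-excel 0.11.1 is attached to the cluster and the workbook is in DBFS; the explicit dbfs:/ scheme makes the executor resolve the path against Databricks' file system rather than local storage, which was the cause of the FileNotFoundException:

```scala
// Hypothetical corrected version of the question's code:
// spark-excel 0.11.1 on the cluster, file uploaded to DBFS.
val df = spark.read
  .format("com.crealytics.spark.excel")
  .option("useHeader", "true")
  .option("treatEmptyValuesAsNulls", "false")
  .option("inferSchema", "false")
  .option("addColorColumns", "false")
  .load("dbfs:/FileStore/tables/Airline.xlsx")  // explicit DBFS scheme
```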