
I have a set of Excel files that need to be read from Spark (2.0.0) as and when each file is loaded into a local directory. The Scala version used here is 2.11.8.

I've tried using the readStream method of SparkSession, but I'm not able to read the files in a streaming way. I'm able to read Excel files statically as:

val df = spark.read.format("com.crealytics.spark.excel").option("sheetName", "Data").option("useHeader", "true").load("Sample.xlsx")

Is there any other way to read Excel files in a streaming way from a local directory?

Any answers would be helpful.

Thanks


Changes done:

val spark = SparkSession.builder().master("local[*]").config("spark.sql.warehouse.dir","file:///D:/pooja").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._  
val dataFrame = spark.readStream.format("csv").option("inferSchema",true).option("header", true).load("file:///D:/pooja/sample.csv")
dataFrame.writeStream.format("console").start()
dataFrame.show()

Updated code:

val spark = SparkSession.builder().master("local[*]").appName("Spark SQL Example").getOrCreate()
spark.conf.set("spark.sql.streaming.schemaInference", true)
import spark.implicits._  
val df = spark.readStream.format("com.crealytics.spark.excel").option("header", true).load("file:///filepath/*.xlsx")
df.writeStream.format("memory").queryName("tab").start().awaitTermination()
val res = spark.sql("select * from tab")
res.show()

Error:

Exception in thread "main" java.lang.UnsupportedOperationException: Data source com.crealytics.spark.excel does not support streamed reading

Can anyone help me resolve this issue?

baitmbarek
Pooja Nayak
  • still a valid question; couldn't figure out how to readStream Excel files. Only works for CSVs for me – Chris Apr 05 '19 at 07:51

1 Answer


For a streaming DataFrame you have to provide a schema; currently, DataStreamReader does not support option("inferSchema", true|false). Alternatively, you can enable the SQLConf setting spark.sql.streaming.schemaInference, which needs to be set at the session level.

You can refer here
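
As a rough illustration of the first option (supplying the schema yourself), here is a minimal sketch for the CSV file source. The column names and types (id, name) and the directory path are placeholders I made up for the example, not something from the question; replace them with the real layout of your files. It only covers CSV, since the error in the question says the Excel data source does not support streamed reading.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("Streaming CSV with explicit schema")
  .getOrCreate()

// Hypothetical columns: adjust names and types to match your files.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true)
))

// Placeholder directory: the file source picks up files as they appear in it.
val dataFrame = spark.readStream
  .format("csv")
  .schema(schema)            // explicit schema instead of inferSchema
  .option("header", "true")
  .load("/path/to/your/folder")

// Print each micro-batch to the console.
val query = dataFrame.writeStream
  .format("console")
  .start()

// In a standalone application, call query.awaitTermination() here
// so the JVM stays alive while the stream is running.
```

Note that show() cannot be called on a streaming DataFrame; output has to go through writeStream and start(), as above.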

baitmbarek
Naman Agarwal
  • Thanks for your suggestion. – Pooja Nayak Sep 12 '17 at 05:33
  • But I need to read Excel files in a directory as and when they appear in the directory. Is there any way to do this using textFileStream or some other method? – Pooja Nayak Sep 12 '17 at 05:35
  • After setting the above SQLConf property, you can do something like this: val dataFrame = spark.readStream.format("csv").option("inferSchema","true").option("header","true").load("/path/to/your/folder/*.csv"). You can replace the csv extension with that of your Excel file. – Naman Agarwal Sep 12 '17 at 05:49
  • You can set the SQLConf property like this: spark.sqlContext.setConf("spark.sql.streaming.schemaInference","true"), where spark is your Spark session in the REPL. – Naman Agarwal Sep 12 '17 at 06:13
  • Thanks for your answer. I've tried the above changes, but I'm now facing a new issue: Exception in thread "main" java.lang.NullPointerException at java.lang.ProcessBuilder.start(Unknown Source) – Pooja Nayak Sep 12 '17 at 07:01
  • I'm trying to read files from the D drive on my system. Is this error due to that? – Pooja Nayak Sep 12 '17 at 07:01
  • Can you please post what you tried with the above changes? – Naman Agarwal Sep 12 '17 at 07:08