Questions tagged [spark-excel]

The spark-excel tag covers reading Excel files (xlsx) through Apache Spark.

Several libraries exist to help developers read Excel files through Apache Spark.

45 questions
0
votes
1 answer

HDFS Excel row count decreases when running the Spark job on YARN

When running the job locally (IntelliJ IDEA), the output counts are fine (e.g. 55 rows). But when the same job is submitted to YARN using spark-submit, only a few rows come out (12 rows). spark2-submit --master yarn --deploy-mode client…
Zdev
  • 36
  • 6
0
votes
0 answers

How to pass a dataframe read from excel to another variable in spark-scala?

I have a DataFrame declared as var cache: DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics.spark.excel. But on the subsequent run, existingDF will get another updated Excel file, it…
ss301
  • 514
  • 9
  • 22
0
votes
2 answers

Read Excel files with Apache Spark

(New to Apache Spark.) I tried to create a small Scala Spark app that reads Excel files and inserts the data into a database, but I get some errors which (I think) are caused by mismatched library versions. Scala v2.12 Spark v3.0 Spark-Excel…
AlleXyS
  • 2,476
  • 2
  • 17
  • 37
0
votes
1 answer

How to load Excel data into an already created Hive table in ORC format

I need to load data into an already created Hive table in ORC format, i.e. I need to read data from an Excel sheet, create a DataFrame, and then load it into the Hive table in ORC format.
0
votes
1 answer

Read Excel in Spark error: InputStream of class ZipArchiveInputStream is not implementing InputStreamStatistics

I am trying to read Excel files from COS via Spark, like this: def readExcelData(filePath: String, spark: SparkSession): DataFrame = spark.read.format("com.crealytics.spark.excel").option("path", filePath) …
Ayan Biswas
  • 1,641
  • 9
  • 39
  • 66
0
votes
1 answer

scala.MatchError while trying to read excel file via com.crealytics.spark.excel

I am trying to read an Excel file via com.crealytics.spark.excel, but I get the following error when I run my code: scala.MatchError: Map(treatemptyvaluesasnulls -> true, location -> a.xlsx, useheader -> true, inferschema -> False,…
Ayan Biswas
  • 1,641
  • 9
  • 39
  • 66
0
votes
1 answer

How to create a DataFrame when an Excel file is my source file in Databricks

I have an Excel file as my source file, and I want to read its data and convert it into a DataFrame using Databricks. I am new to Scala. val df = spark.read.format("com.crealytics.spark.excel") .option("location",…
0
votes
0 answers

Not able to read .xlsx files using spark-excel library

I'm trying to read a .xlsx file and convert it to a Dataframe using spark-excel. But when I try to read the file it's throwing a java.lang.IllegalArgumentException: InputStream of class class …
Avinash
  • 13
  • 1
  • 5
0
votes
1 answer

Construct a dataframe from excel using scala

I am looking for a way to construct a DataFrame from an Excel file in Spark using Scala. I referred to the SO post below and tried the operation on the attached Excel sheet. How to construct Dataframe from a Excel (xls,xlsx) file in Scala…
Dwarrior
  • 687
  • 2
  • 10
  • 26
0
votes
1 answer

Spark Excel: reading an Excel file with a multi-line header throws an exception: Method threw 'scala.MatchError' exception

I'm using spark-excel to read Excel files. The problem is that whenever I use a file with a multi-line header, the QueryExecution of the dataset throws an exception: Method threw 'scala.MatchError' exception. Cannot evaluate…
Abdennacer Lachiheb
  • 4,388
  • 7
  • 30
  • 61
0
votes
1 answer

Getting an exception while reading an Excel file with duplicate column names using the spark-excel library. How to overcome this issue?

I am using the spark-excel (com.crealytics.spark.excel) library to read an Excel file. If there are no duplicate columns in the Excel file, the library works fine. If any duplicate column name occurs in the Excel file, it throws the exception below. How to overcome this…
Nanaji
  • 31
  • 6
0
votes
2 answers

Loading data from an Excel file using Spark Java Excel

I want to load data from an Excel file in HDFS using Spark Session 2.2. Below are my Java code and the exception I got. Dataset df = session.read().format("com.crealytics.spark.excel"). …
OOvic
  • 47
  • 1
  • 6
0
votes
2 answers

S3 path printed incorrectly by Spark Excel reader

I am trying to read an Excel sheet from Amazon S3, and here is the code snippet. But it fails saying the file doesn't exist even though it's there. I checked, and there is a slash (/) missing from the path. println(path) val data = sqlContext.read. …
Garipaso
  • 391
  • 2
  • 8
  • 22
0
votes
2 answers

How to save a Spark DataFrame to Excel format?

I would like to save a Spark DataFrame to Excel. I have done it for CSV by saving a CSV file on each node and appending them on the server using the Databricks spark-csv library. I don't know how to do it for Excel. Could somebody please suggest an idea?
Manu Jose
  • 131
  • 2
  • 18
-1
votes
1 answer

How to read a whole directory of XLSX files with Apache Spark in Scala?

I have to read a whole directory of xlsx files, and I need to load the entire directory with Apache Spark using Scala. Currently I'm using this dependency: "com.crealytics" %% "spark-excel" % "0.12.3", and I don't know how to load them all.
javier_orta
  • 457
  • 4
  • 15