Questions tagged [spark-excel]

The spark-excel tag covers reading Excel files (xlsx) through Apache Spark.

Several libraries help developers read Excel files through Apache Spark. The most common ones are:

Crealytics Spark-Excel
- GitHub: https://github.com/crealytics/spark-excel
- Maven: https://mvnrepository.com/artifact/com.crealytics/spark-excel
Spark HadoopOffice
- GitHub: https://github.com/ZuInnoTe/spark-hadoopoffice-ds/
- Package: https://spark-packages.org/package/ZuInnoTe/spark-hadoopoffice-ds
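As a rough illustration of what reading an Excel file with the Crealytics library can look like, here is a minimal Scala sketch. It assumes the spark-excel artifact matching your Spark and Scala versions is on the classpath. The file path, the object name, and the app name are hypothetical. Option names have changed between releases (older versions expect "useHeader" where newer ones use "header"), so check the README for the version you depend on.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object ReadExcelSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimentation; on a cluster the master is
    // normally supplied by spark-submit instead.
    val spark = SparkSession.builder()
      .appName("read-excel-sketch") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input path; replace with your own file.
    val path = "data/example.xlsx"

    val df: DataFrame = spark.read
      .format("com.crealytics.spark.excel")
      // Assumption: a recent spark-excel release. Older releases
      // use "useHeader" instead of "header".
      .option("header", "true")
      // Keep every column as a string rather than guessing types.
      .option("inferSchema", "false")
      .load(path)

    df.printSchema()
    df.show(5)

    spark.stop()
  }
}
```

Several of the questions listed under this tag come down to a mismatch between the library version and its option names or dependencies, so it can help to confirm the exact release in use before debugging further.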

45 questions

1 answer

HDFS Excel Rows got decreased when running the spark job on Yarn

When I run the same job locally (IntelliJ IDEA), the output counts are fine (e.g. 55 rows). But when I submit it on YARN using spark-submit, I get only a few rows (12 rows). spark2-submit --master yarn --deploy-mode client…

asked Jun 23 '21 at 09:19

Zdev

0 answers

How to pass a dataframe read from excel to another variable in spark-scala?

I have a DataFrame, var cache: DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics.spark.excel. But on the subsequent run, existingDF will get another, updated Excel file, it…

scala dataframe apache-spark spark-excel

asked Sep 14 '20 at 14:16

ss301

2 answers

Read excel files with apache spark

(New to Apache Spark) I tried to create a small Scala Spark app that reads Excel files and inserts the data into a database, but I get some errors that occur because of different library versions (I think). Scala v2.12 Spark v3.0 Spark-Excel…

scala apache-spark apache-spark-sql spark-excel

asked Jul 08 '20 at 09:16

AlleXyS

1 answer

how to load excel data in already created hive table in orc format

I need to load data into an already created Hive table in ORC format, i.e. I need to read data from an Excel sheet, create a DataFrame, and then load it into Hive tables in ORC format.

scala apache-spark hadoop hive spark-excel

asked Jan 17 '20 at 15:02

Chetan Mane

1 answer

Read Excel in Spark Error :InputStream of class ZipArchiveInputStream is not implementing InputStreamStatistics

I am trying to read Excel files from COS via Spark, like this: def readExcelData(filePath: String, spark: SparkSession): DataFrame = spark.read .format("com.crealytics.spark.excel") .option("path", filePath) …

excel apache-spark spark-excel

asked Oct 03 '19 at 11:05

Ayan Biswas

1 answer

scala.MatchError while trying to read excel file via com.crealytics.spark.excel

I am trying to read an Excel file via com.crealytics.spark.excel, but I am facing the following error while trying to run my code: scala.MatchError: Map(treatemptyvaluesasnulls -> true, location -> a.xlsx, useheader -> true, inferschema -> False,…

excel apache-spark spark-excel

asked Sep 16 '19 at 19:05

Ayan Biswas

1 answer

How to create data frame if Excel file is my source file in databricks

I have an Excel file as my source file, and I want to read data from it and convert it into a DataFrame using Databricks. I am new to Scala. val df = spark.read.format("com.crealytics.spark.excel") .option("location",…

scala apache-spark spark-excel

asked May 07 '19 at 09:29

Praveen Saini

0 answers

Not able to read .xlsx files using spark-excel library

I'm trying to read a .xlsx file and convert it to a Dataframe using spark-excel. But when I try to read the file it's throwing a java.lang.IllegalArgumentException: InputStream of class class …

excel scala apache-spark apache-poi spark-excel

asked Jan 25 '19 at 18:14

Avinash

1 answer

Construct a dataframe from excel using scala

I am looking for a way to construct a DataFrame from an Excel file in Spark using Scala. I referred to the SO post below and tried the operation on the attached Excel sheet. How to construct Dataframe from a Excel (xls,xlsx) file in Scala…

excel scala apache-spark apache-spark-sql spark-excel

asked Jun 11 '18 at 01:04

Dwarrior

1 answer

Spark excel: reading excel file with multi line header throw an exception: Method threw 'scala.MatchError' exception

I'm using spark-excel to read Excel files. The problem is that whenever I use a file with a multi-line header, the QueryExecution of the dataset throws an exception: Method threw 'scala.MatchError' exception. Cannot evaluate…

java apache-spark apache-spark-dataset spark-excel

asked May 28 '18 at 09:06

Abdennacer Lachiheb

1 answer

Getting exception while reading duplicate column name excel file using sparkexcel library. How to overcome this issue

I am using the spark-excel (com.crealytics.spark.excel) library to read an Excel file. If the file has no duplicate columns, the library works fine. If any duplicate column name occurs in the file, it throws the exception below. How to overcome this…

java apache-spark spark-excel

asked May 19 '18 at 06:53

Nanaji

2 answers

Loading Data from an excel File using Spark Java Excel

I want to load data from an Excel file in HDFS using Spark Session 2.2. Below is my Java code and the exception I got. Dataset df = session.read(). format("com.crealytics.spark.excel"). …

java excel apache-spark hdfs spark-excel

asked May 02 '18 at 17:40

OOvic

2 answers

s3 path printed incorrectly by spark excel reader

I am trying to read an Excel sheet from Amazon S3, and here is the code snippet. But it fails saying the file doesn't exist, though it's there. I checked, and there is a slash (/) missing from the path. println(path) val data = sqlContext.read. …

excel scala apache-spark spark-excel

asked Jun 08 '17 at 06:57

Garipaso

2 answers

how to save spark data frame to excel format?

I would like to save a Spark DataFrame into Excel. I have done it for CSV by saving a CSV file on each node and appending them on the server using the Databricks spark-csv library. I don't know how to do it for Excel. Could somebody please suggest an idea?

apache-spark spark-excel

asked Apr 11 '17 at 07:41

Manu Jose

-1

votes

1 answer

how to read a whole directory of XLSX with apache spark scala?

I have to read a whole directory of xlsx files, and I need to load the entire directory with Apache Spark using Scala. Currently I'm using this dependency: "com.crealytics" %% "spark-excel" % "0.12.3", and I don't know how to load them all.

apache-spark apache-spark-sql spark-excel

asked Oct 18 '19 at 22:42

javier_orta
