Questions tagged [spark-excel]

The spark-excel tag covers reading Excel files (xlsx) through Apache Spark.

Multiple libraries help developers read Excel files through Apache Spark. The most common ones are:

Crealytics Spark-Excel
- GitHub: https://github.com/crealytics/spark-excel
- Maven: https://mvnrepository.com/artifact/com.crealytics/spark-excel
Spark HadoopOffice
- GitHub: https://github.com/ZuInnoTe/spark-hadoopoffice-ds/
- Package: https://spark-packages.org/package/ZuInnoTe/spark-hadoopoffice-ds
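
For orientation, here is a minimal PySpark sketch of a read with the Crealytics library, assuming the spark-excel jar is already on the cluster classpath. The helper names excel_read_options and read_excel are hypothetical and exist only for this illustration. The header, inferSchema, and dataAddress options are the ones described in the spark-excel README as far as I can tell; older releases used useHeader instead of header, so check the version you run.

```python
def excel_read_options(sheet=None, first_cell="A1", header=True, infer_schema=False):
    """Build the option map for the Crealytics spark-excel reader (hypothetical helper)."""
    options = {
        # Spark reader options are passed as strings: "true" / "false".
        "header": str(header).lower(),
        "inferSchema": str(infer_schema).lower(),
    }
    # dataAddress selects where the data starts, e.g. "A4" or "'Sheet1'!A4".
    # Starting below the first row is one way to skip unwanted top rows.
    options["dataAddress"] = first_cell if sheet is None else f"'{sheet}'!{first_cell}"
    return options


def read_excel(spark, path, **kwargs):
    """Read an Excel file into a DataFrame; `spark` is an existing SparkSession."""
    reader = spark.read.format("com.crealytics.spark.excel")
    for key, value in excel_read_options(**kwargs).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

With a running session, a call such as read_excel(spark, "survey.xlsx", sheet="Sheet1", first_cell="A4") would start reading at row 4 of Sheet1; the file name here is only a placeholder.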

45 questions

1 answer

Reading an Excel file in Spark with an integer column

I have a group of Excel sheets that I am trying to read via Spark through the com.crealytics.spark.excel package. In my Excel sheet I have a column Survey ID that contains integer IDs. When I read the data through Spark I see the values are converted…

asked Aug 05 '19 at 16:18

Ayan Biswas

0 answers

Writing dataset to Excel gives ChangeFileModeByMask error (5): Access is denied

I have created a dataset of type Row as below: Dataset<Row> databaseDs = sparkSession.createDataFrame(dbStatus, Status.class); I want to convert this to an Excel sheet, so I used the code below for the Excel conversion: databaseDs.write() …

java apache-spark spark-excel

asked Jul 18 '18 at 13:33

theAverageCoder

4 answers

Spark: skip top rows with spark-excel

I have an Excel file with damaged rows at the top (the first 3 rows) which need to be skipped. I'm using the spark-excel library to read the Excel file; on their GitHub there is no such functionality, so is there a way to achieve this? This my…

excel apache-spark apache-spark-sql spark-excel

asked May 05 '18 at 16:02

Abdennacer Lachiheb

1 answer

Write dataframe to multiple tabs in an Excel sheet using Spark

I have been using Spark-excel (https://github.com/crealytics/spark-excel) to write the output to a single sheet of an Excel file. However, I am unable to write the output to different sheets (tabs). Can anyone suggest any alternative? Thanks, Sai

scala apache-spark apache-spark-sql spark-excel

asked Feb 23 '18 at 19:58

Bharath

2 answers

Convert Excel file to CSV in Spark 1.X

Is there a tool to convert Excel files into CSV using Spark 1.X? I got this issue when executing this tutorial: https://github.com/ZuInnoTe/hadoopoffice/wiki/Read-Excel-document-using-Spark-1.x Exception in thread "main" java.lang.NoClassDefFoundError:…

excel scala apache-spark apache-spark-1.6 spark-excel

asked Dec 13 '17 at 15:41

Ronald Segan

1 answer

Reading Excel files in a streaming fashion in Spark 2.0.0

I have a set of Excel format files which need to be read from Spark (2.0.0) as and when an Excel file is loaded into a local directory. The Scala version used here is 2.11.8. I've tried using the readStream method of SparkSession, but I'm not able to read…

excel apache-spark spark-streaming spark-excel

asked Sep 12 '17 at 05:16

Pooja Nayak

0 answers

Crealytics Spark Excel Package import issues

I am trying to process Excel files using Spark. I have created a session and added the dependent jar and package in the configuration. Spark version = 3.1.1, Scala version = 2.12. I have added this jar = "spark-excel_2.12-3.1.1_0.18.7". But still I am…

python apache-spark pyspark apache-poi spark-excel

asked Jul 10 '23 at 12:52

user3104078

1 answer

java.lang.NullPointerException while reading specific sheet from xlsx using org.zuinnote.spark.office.excel

We are trying to read one specific sheet from Excel (.xlsx with 3 sheets) using org.zuinnote.spark.office.excel into a Spark dataframe. We are using the MSExcelLowFootprintParser parser. The code used is: val hadoopConf = new Configuration() val spark =…

apache-poi spark3 spark-excel

asked Jun 15 '23 at 12:42

Ashish Mishra

0 answers

How to define spark-excel schema without specific order

I'm reading in an XLS using spark-excel. My program has a base format (same columns and headers), but between different users they may have additional columns that are not required for my program. Is there a way to define the Schema, or at least,…

scala apache-spark spark-excel

asked May 21 '23 at 19:22

l Steveo l

0 answers

Getting MatchError while constructing dataframe from an Excel xlsx file in scala-spark

I used the following function but am still getting a MatchError: def readExcel(file: String): DataFrame = sqlContext.read .format("com.crealytics.spark.excel") .option("location", file) .option("useHeader", "true") …

dataframe apache-spark apache-spark-sql spark-excel

asked Mar 20 '23 at 06:15

Deepak Kumar

0 answers

PySpark driver error while reading Excel files

I am reading Excel files using PySpark. All the dataframes are stored inside a list. While merging all the dataframes I am getting an out-of-memory error. The code looks like this: def union_spark_dfs(*dfs): return reduce(lambda df1, df2:…

pyspark apache-spark-sql openpyxl spark-excel

asked Mar 01 '23 at 13:28

code_bug

0 answers

spark-excel_2.10 package creating conflict with Maven on Mac

This Java package works fine on my Windows machine, but when I run it on my Mac it creates a conflict with Maven; both machines have the same version of Maven. Can you tell me why this is happening, and can you give me the solution to this…

java maven apache-poi spark-excel

asked Jan 28 '23 at 13:13

Ramzan Niazi

0 answers

Unable to write dataframe in xlsx format in Spark Scala

df .coalesce(1) .write .format("com.crealytics.spark.excel") .option("useHeader", "true") .option("header", "true") .mode(SaveMode.Append) .save(s"s3://$bucket/$etlFolderPrefix/a.xlsx") ERROR [main] glue.ProcessLauncher…

scala apache-spark noclassdeffounderror nosuchmethoderror spark-excel

asked Jan 04 '23 at 10:22

sangeet singh

1 answer

I'm unable to use the skipFirstRows parameter while reading Excel in PySpark (Python)

Note: in my case we must not use pandas.read_excel() to read Excel; we can only use the spark-excel jar installed in our cluster. My main point is that we have to skip a few lines in the Excel sheet while reading the file by using any logic or any…

python apache-spark pyspark spark-excel

asked Sep 11 '22 at 14:56

Goutham Boine

1 answer

Read percentage values in Spark

I have an xlsx file which has a single column, percentage: 30% 40% 50% -10% 0.00% 0% 0.10% 110% 99.99% 99.98% -99.99% -99.98% When I read this using Apache Spark, the output I get is: |percentage| +----------+ | 0.3| | 0.4| | 0.5| | …

apache-spark apache-spark-sql spark-excel

asked Dec 01 '21 at 13:08

Sarvesh Singh
