Questions tagged [spark-excel]

The spark-excel tag relates to reading Excel files (xlsx) through Apache Spark.

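Multiple libraries help developers read Excel files through Apache Spark. Those referenced in the questions below are com.crealytics.spark.excel (https://github.com/crealytics/spark-excel) and org.zuinnote.spark.office.excel (https://github.com/ZuInnoTe/hadoopoffice).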

45 questions
1
vote
1 answer

Reading an Excel file in Spark with an integer column

I have a group of Excel sheets that I am trying to read via Spark through the com.crealytics.spark.excel package. In my Excel sheet I have a column Survey ID that contains integer IDs. When I read the data through Spark I see the values are converted…
Ayan Biswas
1
vote
0 answers

Writing a dataset to Excel gives ChangeFileModeByMask error (5): Access is denied

I have created a dataset of type Row as below: Dataset databaseDs = sparkSession.createDataFrame(dbStatus, Status.class); I want to convert this to an Excel sheet, so I used the code below for the Excel conversion: databaseDs.write() …
1
vote
4 answers

Spark: skip top rows with spark-excel

I have an Excel file with damaged rows at the top (the first 3 rows) which need to be skipped. I'm using the spark-excel library to read the Excel file; on their GitHub there is no such functionality, so is there a way to achieve this? This my…
Abdennacer Lachiheb
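
One possible approach to the question above, sketched under assumptions: recent versions of crealytics spark-excel document a `dataAddress` option that accepts a start cell such as `'Sheet1'!A4`, so reading can begin below the damaged rows. The helper names `data_address_skipping_rows` and `read_excel_skipping_rows` are hypothetical, the sheet name is a placeholder, and the second function assumes a running SparkSession with the spark-excel jar on the classpath.

```python
def data_address_skipping_rows(sheet_name: str, rows_to_skip: int, first_column: str = "A") -> str:
    """Build a dataAddress string whose start cell sits just below the skipped rows."""
    if rows_to_skip < 0:
        raise ValueError("rows_to_skip must be non-negative")
    # Excel rows are 1-based, so skipping 3 rows means reading starts at row 4.
    return f"'{sheet_name}'!{first_column}{rows_to_skip + 1}"


def read_excel_skipping_rows(spark, path: str, sheet_name: str, rows_to_skip: int):
    """Read one sheet with spark-excel, starting below the damaged top rows.

    Assumption: `spark` is a SparkSession and the spark-excel jar is available.
    """
    return (
        spark.read.format("com.crealytics.spark.excel")
        .option("dataAddress", data_address_skipping_rows(sheet_name, rows_to_skip))
        .option("header", "true")  # the first row at the start cell holds the column names
        .load(path)
    )
```

Building the address in a separate function keeps the off-by-one logic checkable without a Spark cluster. Older spark-excel releases may not support `dataAddress`, so the option should be checked against the installed version.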
1
vote
1 answer

Write dataframe to multiple tabs in an Excel sheet using Spark

I have been using Spark-excel (https://github.com/crealytics/spark-excel) to write the output to a single sheet of an Excel file. However, I am unable to write the output to different sheets (tabs). Can anyone suggest any alternative? Thanks, Sai
Bharath
1
vote
2 answers

Convert Excel file to csv in Spark 1.X

Is there a tool to convert Excel files into CSV using Spark 1.X? I got this issue when executing this tutorial: https://github.com/ZuInnoTe/hadoopoffice/wiki/Read-Excel-document-using-Spark-1.x Exception in thread "main" java.lang.NoClassDefFoundError:…
1
vote
1 answer

Reading Excel files in a streaming fashion in Spark 2.0.0

I have a set of Excel files which need to be read from Spark (2.0.0) as and when an Excel file is loaded into a local directory. The Scala version used here is 2.11.8. I've tried using the readStream method of SparkSession, but I'm not able to read…
Pooja Nayak
0
votes
0 answers

Crealytics Spark Excel Package import issues

I am trying to process Excel files using Spark. I have created a session and added the dependent jar and package in the configuration. Spark version = 3.1.1, Scala version = 2.12. I have added this jar = "spark-excel_2.12-3.1.1_0.18.7". But still I am…
user3104078
0
votes
1 answer

java.lang.NullPointerException while reading specific sheet from xlsx using org.zuinnote.spark.office.excel

We are trying to read one specific sheet from Excel (.xlsx with 3 sheets) using org.zuinnote.spark.office.excel into a Spark dataframe. We are using the MSExcelLowFootprintParser parser. The code used is: val hadoopConf = new Configuration() val spark =…
Ashish Mishra
0
votes
0 answers

How to define spark-excel schema without specific order

I'm reading in an XLS using spark-excel. My program has a base format (same columns and headers), but different users may have additional columns that my program does not require. Is there a way to define the Schema, or at least,…
l Steveo l
0
votes
0 answers

Getting MatchError while constructing dataframe from an Excel xlsx file in scala-spark

I used the function below but am still getting a MatchError: def readExcel(file: String): DataFrame = sqlContext.read .format("com.crealytics.spark.excel") .option("location", file) .option("useHeader", "true") …
0
votes
0 answers

PySpark driver error while reading Excel files

I am reading Excel files using PySpark. All the dataframes are stored inside a list. While merging all the dataframes I am getting an out-of-memory error. The code looks like this: def union_spark_dfs(*dfs): return reduce(lambda df1, df2:…
code_bug
0
votes
0 answers

spark-excel_2.10 package creates a conflict with Maven on Mac

This Java package works fine on my Windows machine, but when I run it on my Mac it creates a conflict with Maven; both machines have the same version of Maven. Can you tell me why this is happening, and can you give me the solution to this…
0
votes
0 answers

Unable to write dataframe in xlsx format in Spark Scala

df .coalesce(1) .write .format("com.crealytics.spark.excel") .option("useHeader", "true") .option("header", "true") .mode(SaveMode.Append) .save(s"s3://$bucket/$etlFolderPrefix/a.xlsx") ERROR [main] glue.ProcessLauncher…
0
votes
1 answer

I'm unable to use the skipFirstRows parameter while reading Excel in PySpark (Python)

Note: in my case we should not use pandas.read_excel() to read Excel; we can only use the spark-excel jar installed in our cluster. My main point is that we have to skip a few lines in the Excel sheet while reading the file by using any logic or any…
Goutham Boine
0
votes
1 answer

Read percentage values in Spark

I have an xlsx file which has a single column: percentage 30% 40% 50% -10% 0.00% 0% 0.10% 110% 99.99% 99.98% -99.99% -99.98% When I read this using Apache Spark, the output I get is: |percentage| +----------+ | 0.3| | 0.4| | 0.5| | …
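
The output in the question above is consistent with how Excel stores percent-formatted cells: the cell holds a numeric fraction (30% is stored as 0.3) and the percent sign is only a display format. A minimal Python sketch of turning such fractions back into percent strings follows; the function name `fraction_to_percent` and the fixed two-decimal output are assumptions for illustration, and the per-cell display formats in the original sheet (for example `30%` versus `0.10%`) cannot be recovered from the numeric value alone.

```python
from decimal import Decimal, ROUND_HALF_UP


def fraction_to_percent(value, decimals: int = 2) -> str:
    """Format a numeric fraction such as 0.3 as a percent string such as '30.00%'."""
    # Go through str() so 0.3 becomes Decimal('0.3') rather than its binary expansion.
    scaled = Decimal(str(value)) * 100
    # Decimal(10) ** -2 is Decimal('0.01'), i.e. two digits after the decimal point.
    quantum = Decimal(10) ** -decimals
    return f"{scaled.quantize(quantum, rounding=ROUND_HALF_UP)}%"


if __name__ == "__main__":
    for fraction in (0.3, 0.4, 0.5, -0.1, 0.001, 1.1, -0.9999):
        print(fraction_to_percent(fraction))
```

Using `Decimal` rather than float multiplication avoids artifacts such as `0.9999 * 100` not being exactly `99.99`.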