When I run the same job locally (IntelliJ IDEA), the output count is fine (e.g., 55 rows).
But when I submit it to YARN using spark-submit, I get only a few rows (12 rows).
spark2-submit --master yarn --deploy-mode client…
I have a DataFrame declared as var cache :DataFrame = _. On the initial run I set cache = existingDF, where existingDF is read from an Excel file using crealytics.spark.excel.
But in subsequent runs, existingDF will get another, updated Excel file, it…
(New to Apache Spark.)
I tried to create a small Scala Spark app that reads Excel files and inserts the data into a database, but I get some errors that I think are caused by mismatched library versions.
Scala v2.12
Spark v3.0
Spark-Excel…
I need to load data into an already created Hive table in ORC format.
That is, I need to read data from an Excel sheet, create a DataFrame, and then load it into Hive tables in ORC format.
I am trying to read Excel files from COS via Spark, like this:
def readExcelData(filePath: String, spark: SparkSession): DataFrame =
spark.read
.format("com.crealytics.spark.excel")
.option("path", filePath)
…
I am trying to read an Excel file via com.crealytics.spark.excel, but I get the following error when I run my code:
scala.MatchError: Map(treatemptyvaluesasnulls -> true, location -> a.xlsx, useheader -> true, inferschema -> False,…
I have an Excel file as my source, and I want to read its data and convert it into a DataFrame using Databricks. I am new to Scala.
val df = spark.read.format("com.crealytics.spark.excel")
.option("location",…
I'm trying to read an .xlsx file and convert it to a DataFrame using spark-excel. But when I try to read the file, it throws a
java.lang.IllegalArgumentException: InputStream of class class
…
I am looking for a way to construct a DataFrame from an Excel file in Spark using Scala. I referred to the SO post below and tried the same operation on the attached Excel sheet.
How to construct Dataframe from a Excel (xls,xlsx) file in Scala…
I'm using spark-excel to read Excel files. The problem is that whenever I use a file with a multi-line header, the QueryExecution of the dataset throws an exception: Method threw 'scala.MatchError' exception. Cannot evaluate…
I am using the spark-excel (com.crealytics.spark.excel) library to read an Excel file. If the file has no duplicate column names, the library works fine. If any duplicate column name occurs in the file, it throws the exception below.
How to overcome this…
I want to load data from an Excel file in HDFS using Spark Session 2.2. Below are my Java code and the exception I got.
Dataset df =
session.read().
format("com.crealytics.spark.excel").
…
I am trying to read an Excel sheet from Amazon S3, and here is the code snippet. It fails, saying the file doesn't exist even though it is there. I checked, and there is a slash (/) missing from the path.
println(path)
val data = sqlContext.read.
…
I would like to save a Spark DataFrame to Excel.
I have done this for CSV by saving a CSV file on each node and appending them on the server, using the Databricks spark-csv library.
I don't know how to do it for Excel. Could somebody please suggest an idea?
I have to read a whole directory of xlsx files, and I need to load the entire directory with Apache Spark using Scala.
Currently I'm using this dependency: "com.crealytics" %% "spark-excel" % "0.12.3", and I don't know how to load them all.