Questions tagged [apache-spark-dataset]

Spark Dataset is a strongly typed collection of objects mapped to a relational schema. It supports optimizations similar to those of Spark DataFrames while providing a type-safe programming interface.


950 questions
-2 votes, 1 answer

How to transform JSON to relational database tables using Spark

I have JSON messages that I want to parse and store in relational DB tables. The JSON messages have multiple levels of arrays. For example: { "orderid": "123", "orderdate": "2021-12-23", "orderlines": [ { "orderlinenum":…
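A minimal sketch of one way to flatten this, assuming a SparkSession named spark, a hypothetical input path, and placeholder JDBC details: spark.read.json infers the nested schema, and explode turns the orderlines array into one row per line item for a child table.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder.appName("json-to-tables").getOrCreate()

// Read the nested JSON messages (path is hypothetical).
val orders = spark.read.json("/path/to/orders.json")

// Parent table: one row per order.
val orderTable = orders.select(col("orderid"), col("orderdate"))

// Child table: one row per element of the orderlines array.
val orderLines = orders
  .select(col("orderid"), explode(col("orderlines")).as("line"))
  .select(col("orderid"), col("line.orderlinenum"))

// Each level can then be written to its own table, e.g. via JDBC
// (URL, table name, and properties are placeholders).
orderLines.write.mode("append")
  .jdbc("jdbc:postgresql://host/db", "order_lines", new java.util.Properties)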
-2 votes, 2 answers

Process each row to get date

I have a file with a year column and month columns (MON01, MON02, …). The month is extracted from the last two characters of the column name (i.e. 01 from MON01), and the length of the text value in each month column equals the number of days in that month. How do I retrieve the date for…
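If the question means that each MONnn column holds one character per day, one hedged sketch (df, the YEAR column, and all formats are assumptions) is to take the month from the column name and pair day d with the d-th character:

import org.apache.spark.sql.functions._

// Hypothetical input: a YEAR column plus MON01, MON02, ... text columns.
val monthCols = df.columns.filter(_.startsWith("MON"))

val daily = monthCols.map { c =>
  val month = c.takeRight(2)                            // "01" from "MON01"
  val days = df.select(max(length(col(c)))).first.getInt(0)
  (1 to days).map { d =>
    df.select(
      // Build the date from YEAR, the month in the column name, and the day index.
      to_date(concat_ws("-", col("YEAR"), lit(month), lit(d)), "yyyy-MM-d").as("date"),
      // The d-th character of the month column is that day's value.
      substring(col(c), d, 1).as("value"))
  }.reduce(_ union _)
}.reduce(_ union _)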
-2 votes, 1 answer

How to handle this in Spark

I am using spark-sql 2.4.x and the datastax-spark-cassandra-connector for Cassandra 3.x, along with Kafka. I have a scenario where some finance data comes from a Kafka topic. The data (base dataset) contains companyId, year, and prev_year fields…
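As one hedged sketch of the ingestion side (broker, topic, and payload schema are all assumptions, since the question is cut off), Structured Streaming can parse the Kafka value into those three fields:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Hypothetical payload schema built from the fields named in the question.
val schema = new StructType()
  .add("companyId", StringType)
  .add("year", IntegerType)
  .add("prev_year", IntegerType)

val base = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder broker
  .option("subscribe", "finance-topic")             // placeholder topic
  .load()
  // The Kafka value is binary; cast it and parse the JSON payload.
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")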
-2 votes, 1 answer

What are the necessary conditions for taking a union of two Datasets in Spark Java?

What are the necessary conditions, such as the number of columns, or whether the columns must be identical or can differ?
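In short: union matches columns by position, so both Datasets need the same number of columns with compatible types in the same order; column names are ignored. A minimal sketch (assuming a SparkSession named spark):

import spark.implicits._

val a = Seq((1, "x")).toDF("id", "value")
val b = Seq((2, "y")).toDF("id", "value")

// union is positional: same column count, compatible types per position.
val byPosition = a.union(b)

// If only the column order differs, match by name instead (Spark 2.3+).
val c = Seq(("z", 3)).toDF("value", "id")
val byName = a.unionByName(c)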
-2 votes, 3 answers

Getting "org.apache.spark.sql.AnalysisException" when creating Dataset from RDD

I have recently started working with Spark's Dataset API and I am trying out a few examples. The following is one such example, which fails with AnalysisException:
case class Fruits(name: String, quantity: Int)
val source = Array(("mango", 1),…
Sivaprasanna Sethuraman
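The usual cause with this example is that a Dataset built from tuples gets columns _1 and _2, which do not match the case class fields, so .as[Fruits] cannot resolve them. A sketch of two common fixes (assuming spark.implicits are in scope):

import spark.implicits._

case class Fruits(name: String, quantity: Int)
val source = Array(("mango", 1), ("apple", 2))

// Rename the tuple columns so .as[Fruits] can bind them by name.
val ds = source.toSeq.toDS()
  .toDF("name", "quantity")
  .as[Fruits]

// Or map each tuple into the case class explicitly.
val ds2 = source.toSeq.toDS().map { case (n, q) => Fruits(n, q) }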
-2 votes, 1 answer

How to iterate through a DataFrame without converting it to a Dataset in Spark?

I have a DataFrame that I want to iterate over, but I don't want to convert it to a Dataset. We have to convert Spark Scala code to PySpark, and PySpark does not support Datasets. I have tried the following code by converting to…
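A DataFrame is already a Dataset[Row], so it can be iterated without any case class, which also ports directly to PySpark. A sketch (column name hypothetical):

import scala.collection.JavaConverters._

// Small data: bring it to the driver and loop over Rows.
df.collect().foreach { row =>
  println(row.getAs[String]("some_column"))   // hypothetical column
}

// Larger data: stream one partition at a time to the driver instead.
df.toLocalIterator().asScala.foreach(row => println(row))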
-2 votes, 1 answer

Combine Scala DataFrame columns into a single case class

I have a dataframe that looks like this:
+--------+-----+--------------------+
|     uid|  iid|               color|
+--------+-----+--------------------+
|41344966| 1305|                 red|
|41344966| 1305|               green|
I want to get to…
Ollie
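One reading of the cut-off goal is a single row per (uid, iid) carrying all of its colors; a sketch under that assumption (field types are guesses):

import org.apache.spark.sql.functions.collect_list
import spark.implicits._

// Hypothetical target shape: one case class instance per (uid, iid).
case class Record(uid: Long, iid: Long, colors: Seq[String])

val combined = df
  .groupBy("uid", "iid")
  .agg(collect_list("color").as("colors"))
  .as[Record]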
-2 votes, 1 answer

How to convert a SQL query result to a Spark Dataset?

I have val test = sql("Select * from table1"), which returns a DataFrame. I want to convert it to a Dataset, but it is not working: test.toDS is throwing an error.
M.S
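toDS comes from the implicits for local Scala collections (and RDDs), not DataFrames; a DataFrame becomes a typed Dataset with .as[T]. A sketch with a hypothetical case class matching table1:

import spark.implicits._

// Hypothetical case class mirroring table1's columns and types.
case class Table1Row(id: Long, name: String)

// .as[T] binds the DataFrame's columns to the case class fields by name.
val test = spark.sql("select * from table1").as[Table1Row]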
-2 votes, 1 answer

How can I use groupBy and then map over a Dataset?

I'm working with Datasets and trying to group by and then use map. I manage to do it with RDDs, but with a Dataset, after groupBy I don't have the option to use map. Is there a way I can do it?
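The untyped groupBy returns a RelationalGroupedDataset, which only offers aggregations; the typed groupByKey returns a KeyValueGroupedDataset, which supports mapGroups. A minimal sketch (element type is hypothetical):

import spark.implicits._

case class Event(key: String, value: Int)   // hypothetical element type
val ds = Seq(Event("a", 1), Event("a", 2), Event("b", 3)).toDS()

// groupByKey keeps the typed API; mapGroups sees each key with
// an iterator over that key's elements.
val summed = ds
  .groupByKey(_.key)
  .mapGroups { (key, events) => (key, events.map(_.value).sum) }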
-2 votes, 2 answers

How to split JSON into Dataset rows?

I have the following JSON input data:
{ "lib": [
  { "id": "a1", "type": "push", "icons": [ { "iId": "111" } ] },
  { "id": "a2", "type": "pull", "icons": [ …
ScalaBoy
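Assuming the file holds one top-level object whose lib array should become the rows (field list guessed from the excerpt), reading with multiLine and exploding the array gives one row per element:

import org.apache.spark.sql.functions.{col, explode}

// multiLine lets Spark parse a pretty-printed JSON document
// (input path is hypothetical).
val raw = spark.read.option("multiLine", true).json("/path/to/input.json")

// One row per element of the lib array, then flatten the struct fields.
val rows = raw
  .select(explode(col("lib")).as("entry"))
  .select(col("entry.id"), col("entry.type"), col("entry.icons"))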
-2 votes, 3 answers

Spark Dataset - How to create a new column by modifying an existing column value

I have a Dataset like below:
Dataset dataset = ...
dataset.show()
+------+----------+
| NAME | DOB      |
+------+----------+
| John | 19801012 |
| Mark | 19760502 |
| Mick | 19911208 |
I want to convert it to the below (formatted DOB):
| NAME | DOB …
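A sketch assuming DOB is stored as a yyyyMMdd string and the target is the dashed form (the excerpt is cut off before showing it):

import org.apache.spark.sql.functions.{col, date_format, to_date}

// Parse the 8-digit DOB into a date, then re-render it with dashes.
val formatted = dataset.withColumn(
  "DOB",
  date_format(to_date(col("DOB"), "yyyyMMdd"), "yyyy-MM-dd"))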
-2 votes, 1 answer

Using coalesce(1) is taking too much time for writing a dataset to S3

I'm using coalesce(1) to write the set of records to an S3 bucket as CSV, which is taking too much time for 505 records: dataset.coalesce(1).write().csv("s3a://bucketname/path"); And I want to mention that before this writing process, I'm…
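One thing worth knowing here: coalesce(1) can collapse the parallelism of everything upstream into a single task, while repartition(1) inserts a shuffle so the preceding stages stay parallel and only the final write is single-threaded. In Scala:

// repartition(1) shuffles, so upstream work keeps its parallelism and
// only the single-file write runs on one task.
dataset.repartition(1).write.csv("s3a://bucketname/path")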
-2 votes, 1 answer

How do I achieve this in Apache Spark Java or Scala?

A device on a car will NOT send a TRIP ID when the trip starts but will send one when the TRIP ends. How do I apply corresponding TRIP IDS to the corresponding…
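One common shape for this is a backward fill with a window function: partition by device, order by time, and take the first non-null trip id at or after each row. A sketch in Scala, with every column and variable name hypothetical:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

// events, deviceId, eventTime, and tripId are all assumed names;
// tripId is null on every row except the trip-ending one.
val w = Window
  .partitionBy("deviceId")
  .orderBy("eventTime")
  .rowsBetween(Window.currentRow, Window.unboundedFollowing)

// Each row picks up the next non-null tripId seen later in its trip.
val withTrips = events.withColumn(
  "tripId",
  first(col("tripId"), ignoreNulls = true).over(w))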
-2 votes, 1 answer

Translating a SQL query into a Spark transformation

I want to apply a transformation to my data in my Spark-Java program. This is my SQL query: SELECT ID AS Identifier, IFNULL(INTITULE,'') AS NAME_INTITULE, IFNULL(ID_CAT,'') AS CODE_CATEGORIE FROM db_1.evenement WHERE DATE_HIST > (select…
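MySQL's IFNULL maps to coalesce in Spark's functions API. A sketch of the outer query in Scala, assuming a SparkSession named spark and computing the scalar subquery's value up front (its source table here is a guess, since the query is cut off):

import org.apache.spark.sql.functions.{coalesce, col, lit, max}

// Value of the truncated scalar subquery, computed separately
// (the table it reads from is hypothetical).
val threshold = spark.table("db_1.evenement")
  .agg(max(col("DATE_HIST")))
  .first.getTimestamp(0)

val result = spark.table("db_1.evenement")
  .select(
    col("ID").as("Identifier"),
    coalesce(col("INTITULE"), lit("")).as("NAME_INTITULE"),
    coalesce(col("ID_CAT"), lit("")).as("CODE_CATEGORIE"))
  .where(col("DATE_HIST") > lit(threshold))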
-2 votes, 1 answer

Getting the Summary of Whole Dataset or Only Columns in Apache Spark Java

For the below Dataset, to get the total summary values of Col1, I did:
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
and then merged…
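As an alternative to aggregating and then merging a separate total row, rollup can produce the per-Col1 rows and a grand-total row (where Col1 is null) in one pass; a sketch against the same df:

import org.apache.spark.sql.functions.{coalesce, col, lit, sum}

// rollup groups by Col1 and also emits an all-rows grouping whose
// Col1 is null; relabel that null as "Total".
val totaldf = df
  .rollup("Col1")
  .agg(sum("price").as("price"), sum("displayPrice").as("displayPrice"))
  .withColumn("Col1", coalesce(col("Col1"), lit("Total")))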