Questions tagged [apache-spark-dataset]

Spark Dataset is a strongly typed collection of objects mapped to a relational schema. It supports optimizations similar to those of Spark DataFrames while providing a type-safe programming interface.


950 questions
-2 votes, 1 answer

How to transform JSON to relational database tables using Spark

I have JSON messages that I want to parse and store in relational DB tables. The JSON messages have multiple levels of arrays. For example: { "orderid": "123", "orderdate": "2021-12-23", "orderlines": [ { "orderlinenum":…
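A minimal sketch of one way to flatten this, assuming a SparkSession named spark, a hypothetical input path, and placeholder JDBC details: spark.read.json infers the nested schema, and explode turns the orderlines array into one row per line item for a child table.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}

val spark = SparkSession.builder.appName("json-to-tables").getOrCreate()

// Read the nested JSON messages (path is hypothetical).
val orders = spark.read.json("/path/to/orders.json")

// Parent table: one row per order.
val orderTable = orders.select(col("orderid"), col("orderdate"))

// Child table: one row per element of the orderlines array.
val orderLines = orders
  .select(col("orderid"), explode(col("orderlines")).as("line"))
  .select(col("orderid"), col("line.orderlinenum"))

// Each level can then be written to its own table, e.g. via JDBC
// (URL, table name, and properties are placeholders).
orderLines.write.mode("append")
  .jdbc("jdbc:postgresql://host/db", "order_lines", new java.util.Properties)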
-2 votes, 2 answers

Process each row to get date

I have a file with a year column and month columns (MON01, MON02, …). The month is extracted from the last two characters of the column name (i.e. 01 from MON01), and the length of the text value in each month column equals the number of days in that month. How do I retrieve the date for…
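If the question means that each MONnn column holds one character per day, one hedged sketch (df, the YEAR column, and all formats are assumptions) is to take the month from the column name and pair day d with the d-th character:

import org.apache.spark.sql.functions._

// Hypothetical input: a YEAR column plus MON01, MON02, ... text columns.
val monthCols = df.columns.filter(_.startsWith("MON"))

val daily = monthCols.map { c =>
  val month = c.takeRight(2)                            // "01" from "MON01"
  val days = df.select(max(length(col(c)))).first.getInt(0)
  (1 to days).map { d =>
    df.select(
      // Build the date from YEAR, the month in the column name, and the day index.
      to_date(concat_ws("-", col("YEAR"), lit(month), lit(d)), "yyyy-MM-d").as("date"),
      // The d-th character of the month column is that day's value.
      substring(col(c), d, 1).as("value"))
  }.reduce(_ union _)
}.reduce(_ union _)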
-2 votes, 1 answer

How to handle this in Spark

I am using spark-sql 2.4.x and the datastax-spark-cassandra-connector for Cassandra 3.x, along with Kafka. I have a scenario where some finance data comes from a Kafka topic. The data (base dataset) contains companyId, year, and prev_year fields…
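As one hedged sketch of the ingestion side (broker, topic, and payload schema are all assumptions, since the question is cut off), Structured Streaming can parse the Kafka value into those three fields:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{IntegerType, StringType, StructType}

// Hypothetical payload schema built from the fields named in the question.
val schema = new StructType()
  .add("companyId", StringType)
  .add("year", IntegerType)
  .add("prev_year", IntegerType)

val base = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host:9092")   // placeholder broker
  .option("subscribe", "finance-topic")             // placeholder topic
  .load()
  // The Kafka value is binary; cast it and parse the JSON payload.
  .select(from_json(col("value").cast("string"), schema).as("data"))
  .select("data.*")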
-2 votes, 1 answer

What are the necessary conditions for taking a union of two Datasets in Spark Java?

What are the necessary conditions, such as the number of columns, or whether the columns must be identical or can differ?
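In short: union matches columns by position, so both Datasets need the same number of columns with compatible types in the same order; column names are ignored. A minimal sketch (assuming a SparkSession named spark):

import spark.implicits._

val a = Seq((1, "x")).toDF("id", "value")
val b = Seq((2, "y")).toDF("id", "value")

// union is positional: same column count, compatible types per position.
val byPosition = a.union(b)

// If only the column order differs, match by name instead (Spark 2.3+).
val c = Seq(("z", 3)).toDF("value", "id")
val byName = a.unionByName(c)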
-2 votes, 3 answers

Getting "org.apache.spark.sql.AnalysisException" when creating Dataset from RDD

I have recently started working with Spark's Dataset API and I am trying out a few examples. The following is one such example, which fails with AnalysisException:
case class Fruits(name: String, quantity: Int)
val source = Array(("mango", 1),…
Sivaprasanna Sethuraman
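The usual cause with this example is that a Dataset built from tuples gets columns _1 and _2, which do not match the case class fields, so .as[Fruits] cannot resolve them. A sketch of two common fixes (assuming spark.implicits are in scope):

import spark.implicits._

case class Fruits(name: String, quantity: Int)
val source = Array(("mango", 1), ("apple", 2))

// Rename the tuple columns so .as[Fruits] can bind them by name.
val ds = source.toSeq.toDS()
  .toDF("name", "quantity")
  .as[Fruits]

// Or map each tuple into the case class explicitly.
val ds2 = source.toSeq.toDS().map { case (n, q) => Fruits(n, q) }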
-2 votes, 1 answer

How to iterate through a DataFrame without converting it to a Dataset in Spark?

I have a DataFrame that I want to iterate over, but I don't want to convert it to a Dataset. We have to convert Spark Scala code to PySpark, and PySpark does not support Datasets. I have tried the following code by converting to…
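A DataFrame is already a Dataset[Row], so it can be iterated without any case class, which also ports directly to PySpark. A sketch (column name hypothetical):

import scala.collection.JavaConverters._

// Small data: bring it to the driver and loop over Rows.
df.collect().foreach { row =>
  println(row.getAs[String]("some_column"))   // hypothetical column
}

// Larger data: stream one partition at a time to the driver instead.
df.toLocalIterator().asScala.foreach(row => println(row))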
-2 votes, 1 answer

Combine Scala DataFrame columns into a single case class

I have a dataframe that looks like this:
+--------+-----+--------------------+
|     uid|  iid|               color|
+--------+-----+--------------------+
|41344966| 1305|                 red|
|41344966| 1305|               green|
I want to get to…
Ollie
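One reading of the cut-off goal is a single row per (uid, iid) carrying all of its colors; a sketch under that assumption (field types are guesses):

import org.apache.spark.sql.functions.collect_list
import spark.implicits._

// Hypothetical target shape: one case class instance per (uid, iid).
case class Record(uid: Long, iid: Long, colors: Seq[String])

val combined = df
  .groupBy("uid", "iid")
  .agg(collect_list("color").as("colors"))
  .as[Record]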
-2 votes, 1 answer

How to convert a SQL query result to a Spark Dataset?

I have val test = sql("Select * from table1"), which returns a DataFrame. I want to convert it to a Dataset, but it is not working: test.toDS is throwing an error.
M.S
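toDS comes from the implicits for local Scala collections (and RDDs), not DataFrames; a DataFrame becomes a typed Dataset with .as[T]. A sketch with a hypothetical case class matching table1:

import spark.implicits._

// Hypothetical case class mirroring table1's columns and types.
case class Table1Row(id: Long, name: String)

// .as[T] binds the DataFrame's columns to the case class fields by name.
val test = spark.sql("select * from table1").as[Table1Row]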
-2 votes, 1 answer

How can I use groupBy and then map over a Dataset?

I'm working with Datasets and trying to group by and then use map. I manage to do it with RDDs, but with a Dataset, after groupBy I don't have the option to use map. Is there a way I can do it?
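The untyped groupBy returns a RelationalGroupedDataset, which only offers aggregations; the typed groupByKey returns a KeyValueGroupedDataset, which supports mapGroups. A minimal sketch (element type is hypothetical):

import spark.implicits._

case class Event(key: String, value: Int)   // hypothetical element type
val ds = Seq(Event("a", 1), Event("a", 2), Event("b", 3)).toDS()

// groupByKey keeps the typed API; mapGroups sees each key with
// an iterator over that key's elements.
val summed = ds
  .groupByKey(_.key)
  .mapGroups { (key, events) => (key, events.map(_.value).sum) }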
-2 votes, 2 answers

How to split JSON into Dataset rows?

I have the following JSON input data:
{ "lib": [
  { "id": "a1", "type": "push", "icons": [ { "iId": "111" } ] },
  { "id": "a2", "type": "pull", "icons": [ …
ScalaBoy
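Assuming the file holds one top-level object whose lib array should become the rows (field list guessed from the excerpt), reading with multiLine and exploding the array gives one row per element:

import org.apache.spark.sql.functions.{col, explode}

// multiLine lets Spark parse a pretty-printed JSON document
// (input path is hypothetical).
val raw = spark.read.option("multiLine", true).json("/path/to/input.json")

// One row per element of the lib array, then flatten the struct fields.
val rows = raw
  .select(explode(col("lib")).as("entry"))
  .select(col("entry.id"), col("entry.type"), col("entry.icons"))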
-2 votes, 3 answers

Spark Dataset - How to create a new column by modifying an existing column value

I have a Dataset like below:
Dataset dataset = ...
dataset.show()
+------+----------+
| NAME | DOB      |
+------+----------+
| John | 19801012 |
| Mark | 19760502 |
| Mick | 19911208 |
I want to convert it to the below (formatted DOB):
| NAME | DOB …
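A sketch assuming DOB is stored as a yyyyMMdd string and the target is the dashed form (the excerpt is cut off before showing it):

import org.apache.spark.sql.functions.{col, date_format, to_date}

// Parse the 8-digit DOB into a date, then re-render it with dashes.
val formatted = dataset.withColumn(
  "DOB",
  date_format(to_date(col("DOB"), "yyyyMMdd"), "yyyy-MM-dd"))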
-2 votes, 1 answer

Using coalesce(1) is taking too much time for writing a dataset to S3

I'm using coalesce(1) to write the set of records to an S3 bucket as CSV, which is taking too much time for 505 records: dataset.coalesce(1).write().csv("s3a://bucketname/path"); And I want to mention that before this writing process, I'm…
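One thing worth knowing here: coalesce(1) can collapse the parallelism of everything upstream into a single task, while repartition(1) inserts a shuffle so the preceding stages stay parallel and only the final write is single-threaded. In Scala:

// repartition(1) shuffles, so upstream work keeps its parallelism and
// only the single-file write runs on one task.
dataset.repartition(1).write.csv("s3a://bucketname/path")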
-2 votes, 1 answer

How do I achieve this in Apache Spark Java or Scala?

A device on a car will NOT send a TRIP ID when the trip starts but will send one when the TRIP ends. How do I apply corresponding TRIP IDS to the corresponding…
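One common shape for this is a backward fill with a window function: partition by device, order by time, and take the first non-null trip id at or after each row. A sketch in Scala, with every column and variable name hypothetical:

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, first}

// events, deviceId, eventTime, and tripId are all assumed names;
// tripId is null on every row except the trip-ending one.
val w = Window
  .partitionBy("deviceId")
  .orderBy("eventTime")
  .rowsBetween(Window.currentRow, Window.unboundedFollowing)

// Each row picks up the next non-null tripId seen later in its trip.
val withTrips = events.withColumn(
  "tripId",
  first(col("tripId"), ignoreNulls = true).over(w))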
-2 votes, 1 answer

Translating a SQL query into a Spark transformation

I want to apply a transformation to my data in my Spark-Java program. This is my SQL query: SELECT ID AS Identifier, IFNULL(INTITULE,'') AS NAME_INTITULE, IFNULL(ID_CAT,'') AS CODE_CATEGORIE FROM db_1.evenement WHERE DATE_HIST > (select…
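MySQL's IFNULL maps to coalesce in Spark's functions API. A sketch of the outer query in Scala, assuming a SparkSession named spark and computing the scalar subquery's value up front (its source table here is a guess, since the query is cut off):

import org.apache.spark.sql.functions.{coalesce, col, lit, max}

// Value of the truncated scalar subquery, computed separately
// (the table it reads from is hypothetical).
val threshold = spark.table("db_1.evenement")
  .agg(max(col("DATE_HIST")))
  .first.getTimestamp(0)

val result = spark.table("db_1.evenement")
  .select(
    col("ID").as("Identifier"),
    coalesce(col("INTITULE"), lit("")).as("NAME_INTITULE"),
    coalesce(col("ID_CAT"), lit("")).as("CODE_CATEGORIE"))
  .where(col("DATE_HIST") > lit(threshold))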
-2 votes, 1 answer

Getting the Summary of Whole Dataset or Only Columns in Apache Spark Java

For the below Dataset, to get the total summary values of Col1, I did:
import org.apache.spark.sql.functions._
val totaldf = df.groupBy("Col1").agg(lit("Total").as("Col2"), sum("price").as("price"), sum("displayPrice").as("displayPrice"))
and then merged…
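As an alternative to aggregating and then merging a separate total row, rollup can produce the per-Col1 rows and a grand-total row (where Col1 is null) in one pass; a sketch against the same df:

import org.apache.spark.sql.functions.{coalesce, col, lit, sum}

// rollup groups by Col1 and also emits an all-rows grouping whose
// Col1 is null; relabel that null as "Total".
val totaldf = df
  .rollup("Col1")
  .agg(sum("price").as("price"), sum("displayPrice").as("displayPrice"))
  .withColumn("Col1", coalesce(col("Col1"), lit("Total")))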