Questions tagged [apache-spark-dataset]

Spark Dataset is a strongly typed collection of objects mapped to a relational schema. It supports optimizations similar to those of Spark DataFrames while providing a type-safe programming interface.


950 questions
-2 votes, 1 answer

Spark: How do I explode data and also add the column name, in PySpark or Scala Spark?

Spark: I want to explode multiple columns and consolidate them into a single column, with the column name as a separate row. Input data:
+-----------+-----------+-----------+
|  ASMT_ID  |  WORKER   |  LABOR    |
+-----------+-----------+-----------+ …
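One plausible reading of this question is an unpivot: turn each of several columns into its own row, carrying the column name alongside the value. A minimal sketch using Spark SQL's `stack` table-generating function (the sample values `w1`/`l1` are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession

object UnpivotColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("unpivot").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "w1", "l1"), (2, "w2", "l2")).toDF("ASMT_ID", "WORKER", "LABOR")

    // stack(n, name1, col1, name2, col2, ...) emits one output row per listed
    // column, pairing the literal column name with that column's value
    val unpivoted = df.selectExpr(
      "ASMT_ID",
      "stack(2, 'WORKER', WORKER, 'LABOR', LABOR) as (col_name, value)"
    )
    unpivoted.show()  // two rows per input row, one for WORKER and one for LABOR
    spark.stop()
  }
}
```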
-3 votes, 1 answer

How to count the values that repeat in an array using an RDD, DataFrame, or Dataset

I have to count the repeating values in an array: val arr = Array(1,2,2,3,4,5,5,5). For example, how do I count the number of 5s in the array using an RDD, DataFrame, or Dataset?
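A minimal sketch of both approaches, assuming a local SparkSession (the array comes from the question):

```scala
import org.apache.spark.sql.SparkSession

object CountRepeats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("count-repeats").getOrCreate()
    import spark.implicits._

    val arr = Array(1, 2, 2, 3, 4, 5, 5, 5)

    // RDD: the classic word-count pattern
    val rddCounts = spark.sparkContext.parallelize(arr)
      .map(x => (x, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    println(rddCounts(5))  // 3

    // Dataset/DataFrame: groupBy + count gives one row per distinct value
    arr.toSeq.toDS()
      .groupBy("value")
      .count()
      .show()

    spark.stop()
  }
}
```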
-3 votes, 1 answer

Spark dataset related

My data set, produced by the following code, looks like this:
+------+---------------+----+
|  City|      Timestamp|Sale|
+------+---------------+----+
|City 3|6/30/2017 16:04|  28|
|City 4| 7/4/2017 16:04|  12|
|City 2|7/13/2017 16:04|   8|
|City 4|7/16/2017…
-4 votes, 1 answer

Change the value of a row using multiple columns in a Spark DataFrame

I have a DataFrame (df) in this format. df.show():
X1 | X2   | X3   | ..... | Xn   | id_1 | id_2 | .... | id_23
1  | ok   | good | john  | null | null |      | null
2  | rick | good |       | ryan | null | null |      | null
.... I got…
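One common way to collapse several sparsely populated columns into a single value per row is `coalesce`, which returns the first non-null argument. A hedged sketch (the column names `id_1`/`id_2` and the merged `name` column are illustrative, not confirmed by the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.coalesce

object CoalesceColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-cols").getOrCreate()
    import spark.implicits._

    // Option encodes nullable columns when building a DataFrame from a Seq
    val df = Seq(
      (1, Some("john"), None),
      (2, None, Some("ryan"))
    ).toDF("id", "id_1", "id_2")

    // Add one column holding the first non-null value across the sparse id_* columns
    val merged = df.withColumn("name", coalesce($"id_1", $"id_2"))
    merged.show()
    spark.stop()
  }
}
```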
-6 votes, 1 answer

How to process this in parallel on a cluster using the MapFunction and ReduceFunction of the Spark Java API?

I am using spark-sql-2.4.1v with Java 8. I have to do a calculation with group-by under various conditions using the Java API, i.e. using MapFunction and ReduceFunction. Scenario: a sample of the source data is given as…
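The Java-friendly `MapFunction`/`ReduceFunction` interfaces plug into `Dataset.groupByKey` and `KeyValueGroupedDataset.reduceGroups`. A sketch of that pattern (written in Scala against the same interfaces; the (city, sale) records are hypothetical stand-ins for the question's source data):

```scala
import org.apache.spark.api.java.function.{MapFunction, ReduceFunction}
import org.apache.spark.sql.{Encoders, SparkSession}

object GroupReduce {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("group-reduce").getOrCreate()
    import spark.implicits._

    val ds = Seq(("City 3", 28), ("City 4", 12), ("City 4", 7)).toDS()

    // groupByKey takes a MapFunction extracting the grouping key;
    // reduceGroups takes a ReduceFunction merging two records per group
    val totals = ds
      .groupByKey(new MapFunction[(String, Int), String] {
        override def call(row: (String, Int)): String = row._1
      }, Encoders.STRING)
      .reduceGroups(new ReduceFunction[(String, Int)] {
        override def call(a: (String, Int), b: (String, Int)): (String, Int) =
          (a._1, a._2 + b._2)
      })

    totals.show()  // one row per city with the summed sales
    spark.stop()
  }
}
```

Both operations run in parallel across the cluster's partitions; the reduce is applied pairwise within each group.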