Questions tagged [apache-spark-dataset]

Spark Dataset is a strongly typed collection of objects mapped to a relational schema. It supports optimizations similar to those of Spark DataFrames while providing a type-safe programming interface.


950 questions
-2 votes, 1 answer

Spark: How do I explode data and also add the column name, in PySpark or Scala Spark?

Spark: I want to explode multiple columns and consolidate them into a single column, with the column name as a separate row. Input data:
+-----------+-----------+-----------+
|  ASMT_ID  |  WORKER   |  LABOR    |
+-----------+-----------+-----------+ …
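One plausible reading of this question is an unpivot: turn each of several columns into its own row, carrying the column name alongside the value. A minimal sketch using Spark SQL's `stack` table-generating function (the sample values `w1`/`l1` are illustrative, not from the question):

```scala
import org.apache.spark.sql.SparkSession

object UnpivotColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("unpivot").getOrCreate()
    import spark.implicits._

    val df = Seq((1, "w1", "l1"), (2, "w2", "l2")).toDF("ASMT_ID", "WORKER", "LABOR")

    // stack(n, name1, col1, name2, col2, ...) emits one output row per listed
    // column, pairing the literal column name with that column's value
    val unpivoted = df.selectExpr(
      "ASMT_ID",
      "stack(2, 'WORKER', WORKER, 'LABOR', LABOR) as (col_name, value)"
    )
    unpivoted.show()  // two rows per input row, one for WORKER and one for LABOR
    spark.stop()
  }
}
```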
-3 votes, 1 answer

How to count the values that repeat in an array using an RDD, DataFrame, or Dataset

I have to count the repeating values in an array: val arr = Array(1,2,2,3,4,5,5,5). For example, how do I count the number of 5s in the array using an RDD, DataFrame, or Dataset?
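A minimal sketch of both approaches, assuming a local SparkSession (the array comes from the question):

```scala
import org.apache.spark.sql.SparkSession

object CountRepeats {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("count-repeats").getOrCreate()
    import spark.implicits._

    val arr = Array(1, 2, 2, 3, 4, 5, 5, 5)

    // RDD: the classic word-count pattern
    val rddCounts = spark.sparkContext.parallelize(arr)
      .map(x => (x, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    println(rddCounts(5))  // 3

    // Dataset/DataFrame: groupBy + count gives one row per distinct value
    arr.toSeq.toDS()
      .groupBy("value")
      .count()
      .show()

    spark.stop()
  }
}
```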
-3 votes, 1 answer

Spark dataset related

My data set, produced by the following code, looks like this:
+------+---------------+----+
|  City|      Timestamp|Sale|
+------+---------------+----+
|City 3|6/30/2017 16:04|  28|
|City 4| 7/4/2017 16:04|  12|
|City 2|7/13/2017 16:04|   8|
|City 4|7/16/2017…
-4 votes, 1 answer

Change the value of a row using multiple columns in a Spark DataFrame

I have a DataFrame (df) in this format. df.show():
X1 | X2   | X3   | ..... | Xn   | id_1 | id_2 | .... | id_23
1  | ok   | good | john  | null | null |      | null
2  | rick | good |       | ryan | null | null |      | null
.... I got…
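One common way to collapse several sparsely populated columns into a single value per row is `coalesce`, which returns the first non-null argument. A hedged sketch (the column names `id_1`/`id_2` and the merged `name` column are illustrative, not confirmed by the question):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.coalesce

object CoalesceColumns {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-cols").getOrCreate()
    import spark.implicits._

    // Option encodes nullable columns when building a DataFrame from a Seq
    val df = Seq(
      (1, Some("john"), None),
      (2, None, Some("ryan"))
    ).toDF("id", "id_1", "id_2")

    // Add one column holding the first non-null value across the sparse id_* columns
    val merged = df.withColumn("name", coalesce($"id_1", $"id_2"))
    merged.show()
    spark.stop()
  }
}
```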
-6 votes, 1 answer

How to process this in parallel on a cluster using the MapFunction and ReduceFunction of the Spark Java API?

I am using spark-sql-2.4.1v with Java 8. I have to do a calculation with group-by under various conditions using the Java API, i.e. using MapFunction and ReduceFunction. Scenario: a sample of the source data is given as…
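The Java-friendly `MapFunction`/`ReduceFunction` interfaces plug into `Dataset.groupByKey` and `KeyValueGroupedDataset.reduceGroups`. A sketch of that pattern (written in Scala against the same interfaces; the (city, sale) records are hypothetical stand-ins for the question's source data):

```scala
import org.apache.spark.api.java.function.{MapFunction, ReduceFunction}
import org.apache.spark.sql.{Encoders, SparkSession}

object GroupReduce {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("group-reduce").getOrCreate()
    import spark.implicits._

    val ds = Seq(("City 3", 28), ("City 4", 12), ("City 4", 7)).toDS()

    // groupByKey takes a MapFunction extracting the grouping key;
    // reduceGroups takes a ReduceFunction merging two records per group
    val totals = ds
      .groupByKey(new MapFunction[(String, Int), String] {
        override def call(row: (String, Int)): String = row._1
      }, Encoders.STRING)
      .reduceGroups(new ReduceFunction[(String, Int)] {
        override def call(a: (String, Int), b: (String, Int)): (String, Int) =
          (a._1, a._2 + b._2)
      })

    totals.show()  // one row per city with the summed sales
    spark.stop()
  }
}
```

Both operations run in parallel across the cluster's partitions; the reduce is applied pairwise within each group.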