
I have a DataFrame with the following columns:

groupid,unit,height
----------------------
1,in,55
2,in,54

I want to create another DataFrame that contains the original rows plus additional rows where unit=cm and height=height*2.54.

Resulting dataframe:

groupid,unit,height
----------------------
1,in,55
2,in,54
1,cm,139.7
2,cm,137.16

I am not sure how to use a Spark UDF and explode here. Any help is appreciated. Thanks in advance.

zero323
dreddy

1 Answer


You can create a second DataFrame with the required changes using withColumn, and then union the two DataFrames:

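// In Spark 2.x and later, with a SparkSession named spark, use: import spark.implicits._
import sqlContext.implicits._
import org.apache.spark.sql.functions._

// Sample data: heights in inches
val df = Seq(
  (1, "in", 55),
  (2, "in", 54)
).toDF("groupid", "unit", "height")

// Same rows with the unit relabelled and the height converted to centimetres
val df2 = df.withColumn("unit", lit("cm")).withColumn("height", col("height") * 2.54)

// Append the converted rows to the original ones
df.union(df2).show(false)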

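You should get the following output (height is shown as a double in every row, because the converted values are doubles):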

+-------+----+------+
|groupid|unit|height|
+-------+----+------+
|1      |in  |55.0  |
|2      |in  |54.0  |
|1      |cm  |139.7 |
|2      |cm  |137.16|
+-------+----+------+
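
Since the question mentions explode, here is a minimal sketch of that alternative, which needs no UDF. It assumes Spark 2.x or later and a local SparkSession; the object name ExplodeUnits and the intermediate column name converted are made up for illustration. Each input row is turned into an array holding two (unit, height) structs, and explode emits one output row per struct.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

// Hypothetical wrapper object, only so the sketch is self-contained
object ExplodeUnits {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is acceptable for trying this out
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-units")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(
      (1, "in", 55),
      (2, "in", 54)
    ).toDF("groupid", "unit", "height")

    // Build an array with two structs per row: the original measurement
    // (height cast to double so both structs have the same type) and the
    // centimetre version. explode then yields one row per array element.
    val result = df
      .withColumn("converted", explode(array(
        struct(col("unit").as("unit"), col("height").cast("double").as("height")),
        struct(lit("cm").as("unit"), (col("height") * 2.54).as("height"))
      )))
      .select(
        col("groupid"),
        col("converted.unit").as("unit"),
        col("converted.height").as("height")
      )

    result.show(false)
    spark.stop()
  }
}
```

Unlike the union version, this keeps the "in" and "cm" rows for each groupid next to each other rather than appending all converted rows at the end, and it builds the result from a single projection of df instead of two.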
Ramesh Maharjan