
I have a spark dataframe like below

id|name|age|sub
1 |ravi|21 |[M,J,J,K]

I don't want to explode the column "sub", since that would create an extra set of rows. Instead, I want to generate the unique values from the "sub" column and assign them to a new column, sub_unique.

My output should be like

id|name|age|sub_unique
1 |ravi|21 |[M,J,K]
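
For reference, sample data of this shape can be built roughly like this (the appName and variable names are arbitrary, values taken from the table above):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("sub-unique-example").getOrCreate()
import spark.implicits._

// one row matching the layout above
val df = Seq((1, "ravi", 21, Seq("M", "J", "J", "K")))
  .toDF("id", "name", "age", "sub")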

1 Answer


You can use a UDF:

import org.apache.spark.sql.functions.udf

val distinct = udf((x: Seq[String]) => if (x != null) x.distinct else Seq[String]())

df.withColumn("sub_unique", distinct($"sub"))
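
For example, applied to the sample DataFrame from the question this would produce something like the following (column widths in the printed output are illustrative):

// keep the array in a single row, with duplicates removed
df.withColumn("sub_unique", distinct($"sub")).show()
// +---+----+---+------------+----------+
// | id|name|age|         sub|sub_unique|
// +---+----+---+------------+----------+
// |  1|ravi| 21|[M, J, J, K]| [M, J, K]|
// +---+----+---+------------+----------+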