I have a DataFrame with a string column that holds comma-separated values. It looks like this:
`+--------------------------------------------------------------------------------------------------------------------------------------+
|queue_sequence |
+--------------------------------------------------------------------------------------------------------------------------------------+
|In Queue,In-Progress,Internally,Development Done/ Eng testing,In-Progress,Development Done/ Eng testing,Complete |
|In Queue,In-Progress,Complete,In-Progress,Complete |
|In Queue,Development,Development Ready,In Queue,Development,In Queue,Complete |
|In Queue,Analyze,In-Progress,ISRM,Externally,ISRM,Complete |
|In Queue,Complete,In-Progress,Complete |
|In Queue,DSM/UCL,Complete |
|In Queue,In-Progress,Development Done/ Eng testing,Complete,In Queue,In-Progress,Development Done/ Eng testing,Complete |
|In Queue,In-Progress,Externally,Development Done/ Eng testing,Complete |
|In Queue,In-Progress,Development Done/ Eng testing,DSM/UCL,In-Progress,ISRM,In-Progress,Development Done/ Eng testing,Complete |
|In Queue,Development,Development Ready,In Queue,Development,Development Done/ Eng testing,Development,Complete |
|In Queue,In-Progress,In Queue,In-Progress,ISRM,Complete |
|In Queue,Development Ready,In-Progress,Done,Complete |`
I want to get the distinct comma-separated values in each row.
I have tried the following code:
`df.select("queue_sequence").collect().map(_.mkString)`
and stored the result in a variable, which is an Array[String]:
Array[String] = Array(In Queue,
In-Progress,
Internally,
Development Done/ Eng testing,
In-Progress,
Development Done/ Eng testing,
Complete,
In Queue,
In-Progress,
Complete,
In-Progress,
Complete,
In Queue,
Analyze,
In-Progress,
ISRM,
Externally,
ISRM,
Complete,
In Queue,
Development,
Development Ready,
In Queue,
Development,
In Queue,Complete
)
But the values in this array are not unique. How do I get only the distinct values?
I tried the following:
.toSet.toList
.toList.Distinct
Neither of these gave me the distinct words from that array.
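For reference, here is a minimal sketch of the result I am after, written against plain Scala collections rather than Spark. It assumes each element of the collected array is one whole row string (which may be why calling distinct on the array itself changes nothing), so each element is split on commas first. The names `rows`, `distinctPerRow`, and `distinctOverall` are made up for illustration, and the two sample rows are copied from the table above. It is meant to be pasted into spark-shell or the Scala REPL.

```scala
// Stand-in for the result of df.select("queue_sequence").collect().map(_.mkString);
// two rows copied from the table above.
val rows: Array[String] = Array(
  "In Queue,In-Progress,Complete,In-Progress,Complete",
  "In Queue,DSM/UCL,Complete"
)

// Distinct values within each row; distinct keeps the order of first appearance.
val distinctPerRow: Array[Array[String]] =
  rows.map(_.split(",").map(_.trim).distinct)

// Distinct values across all rows: flatten every row into single values first.
val distinctOverall: Array[String] =
  rows.flatMap(_.split(",")).map(_.trim).distinct

distinctPerRow.foreach(r => println(r.mkString(",")))
// In Queue,In-Progress,Complete
// In Queue,DSM/UCL,Complete
println(distinctOverall.mkString(","))
// In Queue,In-Progress,Complete,DSM/UCL
```

The per-row version is what I mean by "distinct in each row"; the flattened version would also be fine if it is easier to get.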