I have a dataframe which contains 4 columns.
Dataframe sample
id1   id2   id3   id4
----------------------
a1    a2    a3    a4
b1    b2    b3    b4
b1    b2    b3    b4
c1    c2    c3    c4
null  b2    null  null
c1    null  null  null
null  null  a3    null
null  null  null  a4
c1    null  null  null
null  null  null  d4
Each row is one of two types: either all four columns have data, or only one column does.
I want to apply a distinct operation across all the columns such that, when comparing rows, only the values that are present in a row are compared and null values are ignored.
The output dataframe should be:
id1   id2   id3   id4
a1    a2    a3    a4
b1    b2    b3    b4
c1    c2    c3    c4
null  null  null  d4
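
To make the rule concrete, here is a minimal plain-Python sketch of the comparison I am after. It is not Spark code and not a UDAF; the names `covers` and `null_aware_distinct` are made up for illustration. It assumes a row should be dropped whenever an already kept row agrees with it on every column where it is non-null, and that null is represented as `None`.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[str], ...]


def covers(kept: Row, row: Row) -> bool:
    """True if `kept` agrees with `row` on every column where `row` is non-null."""
    return all(r is None or r == k for k, r in zip(kept, row))


def null_aware_distinct(rows: List[Row]) -> List[Row]:
    # Visit the most complete rows first (fewest nulls), so that sparse rows
    # are compared against the full rows that might already contain them.
    ordered = sorted(rows, key=lambda r: sum(v is None for v in r))
    result: List[Row] = []
    for row in ordered:
        # Keep the row only if no kept row matches it on its non-null columns.
        if not any(covers(k, row) for k in result):
            result.append(row)
    return result


# The sample dataframe from above, one tuple per row.
sample: List[Row] = [
    ("a1", "a2", "a3", "a4"),
    ("b1", "b2", "b3", "b4"),
    ("b1", "b2", "b3", "b4"),
    ("c1", "c2", "c3", "c4"),
    (None, "b2", None, None),
    ("c1", None, None, None),
    (None, None, "a3", None),
    (None, None, None, "a4"),
    ("c1", None, None, None),
    (None, None, None, "d4"),
]

distinct_rows = null_aware_distinct(sample)
```

On the sample this keeps the three full rows plus the row that contains only d4. It compares every row against every kept row, so it only illustrates the semantics; what I need is an equivalent that works on a Spark dataframe.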
I have looked at multiple examples of UDAFs in Spark, but I have not been able to adapt them to this case.