I am using PySpark and I have a table like this:
id | ClientNum | Value | Date | Age | Country | Job
1 | 19 | A | 1483695000 | 21 | null | null
2 | 19 | A | 1483696500 | 21 | France | null
3 | 19 | A | 1483697800 | 21 | France | Engineer
4 | 19 | B | 1483699000 | 21 | null | null
5 | 19 | B | 1483699500 | 21 | France | null
6 | 19 | B | 1483699800 | 21 | France | Engineer
7 | 24 | C | 1483699200 | null | null | null
8 | 24 | D | 1483699560 | 28 | Spain | null
9 | 24 | D | 1483699840 | 28 | Spain | Student
Based on the Value column, for each ClientNum I want to keep one row per distinct Value: the row where the most information (Age, Country, Job) is specified.
The result should look something like this:
ClientNum | Value | Date | Age | Country | Job
19 | A | 1483697800 | 21 | France | Engineer
19 | B | 1483699800 | 21 | France | Engineer
24 | C | 1483699200 | null | null | null
24 | D | 1483699840 | 28 | Spain | Student
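To make the rule concrete, here is a minimal sketch of the selection logic in plain Python (not PySpark), run on the sample rows above. The helper names `filled` and `most_complete` are hypothetical, and breaking ties by the later Date is an assumption; the example data never needs a tie-break.

```python
# Sample rows from the table: (id, ClientNum, Value, Date, Age, Country, Job)
rows = [
    (1, 19, "A", 1483695000, 21, None, None),
    (2, 19, "A", 1483696500, 21, "France", None),
    (3, 19, "A", 1483697800, 21, "France", "Engineer"),
    (4, 19, "B", 1483699000, 21, None, None),
    (5, 19, "B", 1483699500, 21, "France", None),
    (6, 19, "B", 1483699800, 21, "France", "Engineer"),
    (7, 24, "C", 1483699200, None, None, None),
    (8, 24, "D", 1483699560, 28, "Spain", None),
    (9, 24, "D", 1483699840, 28, "Spain", "Student"),
]


def filled(row):
    """Count how many of Age, Country, Job are not null (hypothetical helper)."""
    return sum(value is not None for value in row[4:7])


def most_complete(rows):
    """Keep, per (ClientNum, Value), the row with the most non-null fields."""
    best = {}
    for row in rows:
        key = (row[1], row[2])  # (ClientNum, Value)
        # Assumption: ties on completeness are broken by the later Date.
        rank = (filled(row), row[3])
        if key not in best or rank > best[key][0]:
            best[key] = (rank, row)
    # Drop the id column and order by ClientNum, then Value.
    kept = [row[1:] for _, row in best.values()]
    return sorted(kept, key=lambda r: (r[0], r[1]))


result = most_complete(rows)
for r in result:
    print(" | ".join("null" if v is None else str(v) for v in r))
```

In other words: count the non-null fields among Age, Country and Job for each row, then keep the top-ranked row within each (ClientNum, Value) group. What I am looking for is the PySpark equivalent of this.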
Thanks!