I have this spark DataFrame:
+---+-----+------+----+------------+------------+
| ID| ID2|Number|Name|Opening_Hour|Closing_Hour|
+---+-----+------+----+------------+------------+
|ALT| QWA| 6|null| 08:59:00| 23:30:00|
|ALT|AUTRE| 2|null| 08:58:00| 23:29:00|
|TDR| QWA| 3|null| 08:57:00| 23:28:00|
|ALT| TEST| 4|null| 08:56:00| 23:27:00|
|ALT| QWA| 6|null| 08:55:00| 23:26:00|
|ALT| QWA| 2|null| 08:54:00| 23:25:00|
|ALT| QWA| 2|null| 08:53:00| 23:24:00|
+---+-----+------+----+------------+------------+
I want to get a new dataframe with only the lines that are not unique regarding the 3 fields "ID"
, "ID2"
and "Number"
.
It means that I want this DataFrame:
+---+-----+------+----+------------+------------+
| ID| ID2|Number|Name|Opening_Hour|Closing_Hour|
+---+-----+------+----+------------+------------+
|ALT| QWA| 6|null| 08:59:00| 23:30:00|
|ALT| QWA| 2|null| 08:53:00| 23:24:00|
+---+-----+------+----+------------+------------+
Or maybe a dataframe with all the duplicates:
+---+-----+------+----+------------+------------+
| ID| ID2|Number|Name|Opening_Hour|Closing_Hour|
+---+-----+------+----+------------+------------+
|ALT| QWA| 6|null| 08:59:00| 23:30:00|
|ALT| QWA| 6|null| 08:55:00| 23:26:00|
|ALT| QWA| 2|null| 08:54:00| 23:25:00|
|ALT| QWA| 2|null| 08:53:00| 23:24:00|
+---+-----+------+----+------------+------------+