
I'm trying to filter a Spark DataFrame that resembles this one:

+-----+---+-----+-----+-----+-----+-------+
| name|age|key_1|key_2|key_3|key_4|country|
+-----+---+-----+-----+-----+-----+-------+
|  abc| 20|    1|    1|    1|    1|    USA|
|  def| 12|    2|    2|    3|    2|  China|
|  ghi| 40|    3|    3|    3|    3|  India|
|  jkl| 39|    4|    1|    4|    4|     UK|
+-----+---+-----+-----+-----+-----+-------+

Essentially, I want to find the rows whose key columns are not all equal; in this case the result should be a new DataFrame containing the second and fourth rows.

I tried with

val unmatching = df.filter(df.col("key_1").notEqual(df.col("key_2")).notEqual(df.col("key_3")).notEqual(df.col("key_4")))

and what I get is a dataset shorter than the original, but it still contains rows where all the keys are equal.

Sagar Zala

1 Answer

  1. Find the matching rows.
  2. Use except():

    val matching = ...
    val unmatching = df.except(matching)
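A note on why the original attempt misbehaves: chaining `notEqual` like `col("key_1").notEqual(col("key_2")).notEqual(col("key_3"))` compares the *boolean result* of the first comparison against `key_3`, not `key_1` against `key_3`. The `except` recipe above sidesteps that. Below is a minimal runnable sketch of it, assuming a local SparkSession and the column names from the question (the sample data is reproduced for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object UnmatchingKeys {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("unmatching-keys")
      .getOrCreate()
    import spark.implicits._

    // Sample data from the question
    val df = Seq(
      ("abc", 20, 1, 1, 1, 1, "USA"),
      ("def", 12, 2, 2, 3, 2, "China"),
      ("ghi", 40, 3, 3, 3, 3, "India"),
      ("jkl", 39, 4, 1, 4, 4, "UK")
    ).toDF("name", "age", "key_1", "key_2", "key_3", "key_4", "country")

    // Step 1: rows where every key equals key_1 (i.e., all keys match)
    val matching = df.filter(
      col("key_1") === col("key_2") &&
      col("key_1") === col("key_3") &&
      col("key_1") === col("key_4")
    )

    // Step 2: everything else -- the rows with at least one mismatching key
    val unmatching = df.except(matching)

    unmatching.show() // expected to contain "def" and "jkl"
    spark.stop()
  }
}
```

Alternatively, a single filter with the negated conjunction, `df.filter(!(col("key_1") === col("key_2") && col("key_1") === col("key_3") && col("key_1") === col("key_4")))`, gives the same rows without the extra `except` pass; `except` is simply easier to read when the "matching" condition is already written.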
user3172755