
In Scala Spark we can filter rows where column A's value is not equal to column B's value in the same DataFrame with df.filter(col("A") =!= col("B")). How can we do the same in PySpark?

I have tried different options such as df.filter(~(df["A"] == df["B"])) and the != operator, but got errors.

1 Answer


Take a look at this snippet:

df = spark.createDataFrame([(1, 2), (1, 1)], "id: int, val: int")
df.show()
+---+---+
| id|val|
+---+---+
|  1|  2|
|  1|  1|
+---+---+

from pyspark.sql.functions import col

df.filter(col("id") != col("val")).show()
+---+---+
| id|val|
+---+---+
|  1|  2|
+---+---+
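
PySpark's Column class overloads Python's == and != operators, so col("id") != col("val") builds the same predicate as Scala's =!=. Your ~(df["A"] == df["B"]) form is equivalent and works as well; errors with these expressions usually come from using Python's not/and/or keywords instead of the ~/&/| operators on Column objects.

One caveat, which matches Scala's =!= behaviour: the comparison is not null-safe, so a row where either column is NULL evaluates to NULL and gets filtered out. If you need to keep such rows, here is a minimal sketch using Column.eqNullSafe (available since Spark 2.3), assuming the same spark session as above and a hypothetical df_nulls DataFrame:

from pyspark.sql.functions import col

df_nulls = spark.createDataFrame([(1, 2), (1, 1), (1, None)], "id: int, val: int")

# eqNullSafe treats NULL == NULL as equal, so negating it keeps rows
# where the values differ, including rows where one side is NULL
df_nulls.filter(~col("id").eqNullSafe(col("val"))).show()
# keeps (1, 2) and (1, None); drops (1, 1)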


Bartosz Gajda