Is this expected behaviour? I thought about raising an issue with Spark, but this seems like such basic functionality that it's hard to imagine there's a bug here. What am I missing?
Python
>>> import numpy as np
>>> np.nan < 0.0
False
>>> np.nan > 0.0
False
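For comparison, plain Python floats behave the same way, since under IEEE 754 any ordered comparison involving NaN evaluates to false:
>>> float("nan") < 0.0
False
>>> float("nan") > 0.0
False
>>> float("nan") == float("nan")
False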
PySpark
import numpy as np
from pyspark.sql.functions import col
df = spark.createDataFrame([(np.nan, 0.0), (0.0, np.nan)])
df.show()
#+---+---+
#| _1| _2|
#+---+---+
#|NaN|0.0|
#|0.0|NaN|
#+---+---+
df.printSchema()
#root
# |-- _1: double (nullable = true)
# |-- _2: double (nullable = true)
df.select(col("_1")> col("_2")).show()
#+---------+
#|(_1 > _2)|
#+---------+
#| true|
#| false|
#+---------+
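The reverse comparison and an explicit NaN check (a minimal sketch reusing the df above, output omitted) make me suspect Spark is ordering NaN above every other double, rather than returning false for both directions as Python does:
from pyspark.sql.functions import isnan
# reverse comparison on the same rows
df.select(col("_1") < col("_2")).show()
# confirm which cells actually hold NaN
df.select(isnan(col("_1")), isnan(col("_2"))).show()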