If you have a working SQL query, you can always register your DataFrame as a temporary view and use spark.sql():
df.createOrReplaceTempView("MYTABLE")
spark.sql("SELECT * FROM MYTABLE WHERE '2018-12-31' BETWEEN start_dt AND end_dt").show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
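For reference, the snippets here were run against a DataFrame like the one below (a minimal sketch; the schema is assumed from the output above, with the dates stored as ISO-formatted strings):

df = spark.createDataFrame(
    [(1, "2016-01-01", "2020-02-04")],
    ["ColumnA", "START_DT", "END_DT"]
)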
Another option is to pass a SQL expression string to where():
df.where("'2018-12-31' BETWEEN start_dt AND end_dt").show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
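Both of the string-based approaches above can take the test date from a Python variable by interpolating it into the expression string, e.g. with an f-string (a sketch):

test_date = "2018-12-31"
df.where(f"'{test_date}' BETWEEN start_dt AND end_dt").show()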
One more way is to use pyspark.sql.Column.between with pyspark.sql.functions.lit, but you'll have to use pyspark.sql.functions.expr to pass the column values in as the bounds:
from pyspark.sql.functions import lit, expr
test_date = "2018-12-31"
df.where(lit(test_date).between(expr('start_dt'), expr('end_dt'))).show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
Lastly, you can implement your own version of between:
from pyspark.sql.functions import col, lit
df.where((col("start_dt") <= lit(test_date)) & (col("end_dt") >= lit(test_date))).show()
#+-------+----------+----------+
#|ColumnA| START_DT| END_DT|
#+-------+----------+----------+
#| 1|2016-01-01|2020-02-04|
#+-------+----------+----------+
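All of the examples above compare the dates as strings, which works because ISO-formatted dates sort lexicographically. If you'd rather compare actual dates, one option (a sketch, assuming the columns hold ISO-formatted date strings as in the sample data) is to cast everything with to_date first:

from pyspark.sql.functions import col, lit, to_date
# Cast the string columns and the literal to DateType before comparing;
# this keeps the same row as the earlier examples.
df.where(
    (to_date(col("start_dt")) <= to_date(lit(test_date)))
    & (to_date(col("end_dt")) >= to_date(lit(test_date)))
).show()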