I am getting an error in my PySpark code when I run the query below through spark.sql, even though the same query works when I run it directly in BigQuery.
df = spark.sql('''
    SELECT h.src, AVG(h.norm_los) AS mbr_avg
    FROM UM h
    WHERE h.orig_dt < date_sub(current_date, INTERVAL 30 DAY)
      AND CAST((DATE_ADD(DATE(h.orig_dt), INTERVAL 18 MONTH)) AS DATE) >= date_sub(current_date, INTERVAL 30 DAY)
      AND h.src IS NOT NULL
    GROUP BY h.src_cumb_id
''')
When I run the same query with PySpark on a Dataproc cluster, Spark converts INTERVAL 30 DAY into interval 4 weeks 2 days and I get the error below:
cannot resolve 'date_sub(current_date(), interval 4 weeks 2 days)' due to data type mismatch: argument 2 requires int type, however, 'interval 4 weeks 2 days' is of calendarinterval type
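From the error it looks like Spark SQL's date_sub(start_date, num_days) expects an integer number of days rather than a calendar interval. Purely as a point of comparison (not a confirmed fix), a Spark-native sketch of the same filter might look like the following; using add_months for the 18-month shift and grouping by h.src so the projection is valid are my assumptions, not part of the original query:

df = spark.sql('''
    SELECT h.src, AVG(h.norm_los) AS mbr_avg
    FROM UM h
    WHERE h.orig_dt < date_sub(current_date(), 30)                          -- integer day count instead of INTERVAL 30 DAY
      AND add_months(DATE(h.orig_dt), 18) >= date_sub(current_date(), 30)   -- add_months instead of DATE_ADD ... INTERVAL 18 MONTH
      AND h.src IS NOT NULL
    GROUP BY h.src                                                          -- original groups by h.src_cumb_id
''')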
Note: I am using option("intermediateFormat","orc") as well.
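For context, that option belongs to the spark-bigquery connector's write path; a rough sketch of where it sits (the bucket and table names below are placeholders, not my actual values):

df.write.format("bigquery") \
    .option("intermediateFormat", "orc") \
    .option("temporaryGcsBucket", "some-temp-bucket") \
    .save("some_project.some_dataset.some_table")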
Any help on this would be appreciated.