I am trying to run a PySpark query by declaring my date variables and using those variables in the query itself. However, the output does not reflect the date filter. My existing code is below:
# date bounds, quoted so they drop into the SQL string as literals
strt_dt = "'2018-01-01'"
end_dt = "'2019-12-31'"
df = sqlc.sql('Select * from tbl where dt > {0} and dt < {1}'.format(strt_dt, end_dt))
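I check the resulting date range roughly like this (a minimal sketch, only inspecting the upper bound of dt):

from pyspark.sql import functions as F

# look at the largest dt value that survived the query
df.agg(F.max('dt')).show()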
The max(dt) comes back greater than 2019-12-31, which should not be the case given the filter in the query. I can get the required date range by further filtering this Spark DataFrame with the code below (adapted from Pyspark: Filter dataframe based on multiple conditions):
from pyspark.sql import functions as F
from pyspark.sql.functions import col

strt_dt = '2018-01-01'
end_dt = '2019-12-31'
df = sqlc.sql('Select * from tbl')
# keep only rows inside the required date range
df = df.filter((col('dt') >= F.lit(strt_dt)) & (col('dt') < F.lit(end_dt)))
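Checking max(dt) the same way on this filtered DataFrame gives a date inside the expected range, so the DataFrame-level filter itself behaves correctly (same sketch as above):

df.agg(F.max('dt')).show()  # within 2018-01-01 .. 2019-12-31 here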
I want to avoid filtering the Spark DataFrame this way because I do not want to create a DataFrame with all of the data first. Please let me know what I am doing wrong in the first block of code.
PS: When I substitute variables of types other than date in the first block of code, the filter works for that column, i.e. the problem is specific to dates stored as strings in my Hive table.
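For example, something along these lines filters as expected (the column name and value here are made up purely for illustration):

# hypothetical non-date column; the substituted variable works here
cust_id = "'C123'"
df = sqlc.sql('Select * from tbl where cust_id = {0}'.format(cust_id))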
Thanks in advance