1

I am trying to run a pyspark query by declaring my date variables and using these variables in the query itself. However, the output doesn't reflect the date filter. my existing code below

strt_dt = "'2018-01-01'"
 end_dt = "'2019-12-31'"


 df = sqlc.sql('Select * from tbl where dt > {0} and dt < {1}'.format(strt_dt,end_dt))

when I check the max(dt) of df above, it is greater than 2019-12-31 which should not be the case as per the code. I can further filter on this spark df for the required date range using the below code (source from Pyspark: Filter dataframe based on multiple conditions

 strt_dt = '2018-01-01'
 end_dt = '2019-12-31' 
 df = sqlc.sql('Select * from tbl')  
 df.filter((col('dt') >= F.lit(strt_dt)) & (col('dt') < F.lit(end_dt)))

I want to avoid the filtering of spark df as I do not want to create a df with all of the data. Please let me know what I am doing wrong in the first set of code.

PS: When I declare variables other than date datatype in the first block of code, the variable works for that column name. i.e. this is specific to date datatype stored as string in my HIVE table.

Thanks in advance

vagautam
  • 81
  • 11
  • Is your Spark 'dt' column type date? If so, you need to use cast for declared date strings. Try to use SQL CAST to date. BTW, it might be useful to show df.printSchema() result in your question. – furkanayd Apr 13 '20 at 19:51
  • @furkanayd the source /input 'tbl' of my pyspark query is a HIVE table with all data types as string including the date field – vagautam Apr 13 '20 at 20:00

1 Answers1

0

Try this:

strt_dt = '2018-01-01'
end_dt = '2019-12-31'


df = sqlc.sql('Select * from tbl where dt > "{0}" and dt < "{1}"'.format(strt_dt,end_dt))
Peng Xu
  • 1
  • 1
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Aug 31 '22 at 02:49