1

I am working with AWS data pipeline. In this context, I am passing several parameters from pipeline definition to sql file as follows:

  s3://reporting/preprocess.sql,-d,RUN_DATE=#{@scheduledStartTime.format('YYYYMMdd')}"

My sql file looks like below:

CREATE EXTERNAL TABLE RESULT (
STUDENT_ID            STRING,
REMARKS               STRING,
EXAM_DATE             STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE LOCATION  's3://result/data/run_date=${previous day of RUN_DATE}';  <----(1)

...
insert into temp
select a.roll_number, remarks
from student a inner join result b
on a.student_id = b.student_id
where exam_date>='<start date of previous month of RUN_DATE>' <---- (2)
and exam_date<='<end date of previous month of RUN_DATE>';<---- (3)

Here in the above sql, I am not getting any idea how I will achieve (1) ,(2) and (3) from RUN_DATE.

So if RUN_DATE=20190101, then value in (1) should be "20181231", value in (2) should be "2018-12-01" and (3) should be "2018-12-31" .

Joy
  • 4,197
  • 14
  • 61
  • 131

1 Answers1

0

You can use HIVE Date functions in your SQL to get your desired result:

  1. Previous Day: date_sub(RUN_DATE,1)
  2. Start date of previous month: date_add(last_day(add_months(RUN_DATE, -2)),1)
  3. End date of previous month: last_day(add_months(RUN_DATE, -1))
Amith Kumar
  • 4,400
  • 1
  • 21
  • 28