0

I have a postgressql query. I tried it in db query and it seems to work fine. But in data bricks its not working as expected.

I have to use where clause. Where I am extracting date from timestamp and comparing it with passed value (which is in date time format). I am extracting date and checking if its less than the one from timestamp. Pls check in the WHERE clause.

select  DATE(to_timestamp(time,'YYYY-MM-DD %H24:%m:%s' )) as modifiedtime, *
FROM testg 
WHERE DATE(to_timestamp(time,'YYYY-MM-DD %H24:%m:%s' )) >= DATE(to_timestamp('2020-10-25 00:00:00','YYYY-MM-DD %H24:%m:%s' )) - INTERVAL '7 DAY'
order by time asc

In data factory the query looks like:

@concat('select * FROM streamingData WHERE property_name = ''', item(), '''' ,' AND  DATE(to_timestamp(''',  activity('Lookup1').output.firstRow.max_timestamp, '''',',''','YYYY-MM-DD %H24:%m:%s', ''' ))' ,' >= ''', 'DATE(to_timestamp(''',  pipeline().parameters.enddate, '''',',''','YYYY-MM-DD %H24:%m:%s', ''' ))'  )

Data:

time:

"2020-10-25 13:00:22.000000+01"
"2020-10-24 14:00:22.000000+01"
"2020-10-25 12:00:22.000000+01"
"2020-10-26 01:00:22.000000+01"

and I am using trigger().outputs.windowEndTime() - output is: 2020-10-20 01:00:22

I am comparing both the date values i.e. If 2020-10-26 >= 2020-10-20

Even after filtering, i get all the data i.e. also the dates from 2019 etc.

What am I missing.

  • Please define *not working as expected*. – GMB Nov 19 '20 at 13:49
  • What field type is `time` stored in? This `DATE(to_timestamp('2020-10-25 00:00:00','YYYY-MM-DD %H24:%m:%s' ))` can be shortened to `DATE(to_timestamp('2020-10-25 ', 'YYYY-MM-DD' ))` – Adrian Klaver Nov 19 '20 at 15:02
  • @GMB added an example –  Nov 19 '20 at 15:52
  • @AdrianKlaver time contains ISO format. Thats why I am converting it into date. –  Nov 19 '20 at 15:53
  • That still does not answer what the field type for `time` is? – Adrian Klaver Nov 19 '20 at 16:40
  • On reflection it can be shortened further to `date('2020-10-25')` or `2020-10-25'::date`(Postgres shortcut). – Adrian Klaver Nov 19 '20 at 16:50
  • You code example has `pipeline().parameters.enddate` versus your explanation which has `trigger().outputs.windowEndTime()`. You also still have not indicated what the error is in your data bricks output? – Adrian Klaver Nov 19 '20 at 16:58
  • @AdrianKlaver time is - timestamp. –  Nov 19 '20 at 20:22
  • @AdrianKlaver the where clause is not working. I am receiving all the dates. –  Nov 19 '20 at 20:23
  • I find it hard to believe `time` is being stored in a `timestamp` type field as `DATE(to_timestamp(time,'YYYY-MM-DD %H24:%m:%s' ))` would fail. The fact you are getting all rows means your test condition(` 2020-10-26 >= 2020-10-20`) is not being met. This means either the left or the right value is not what you think it is. You are going to need to see what is actually reaching the server by looking in the logs. – Adrian Klaver Nov 19 '20 at 20:34
  • @PUser ref this https://stackoverflow.com/a/64904309/10549281. Can you try `toString()`? – Leon Yue Nov 20 '20 at 00:23
  • @PUser is the error solved now? – Leon Yue Nov 22 '20 at 10:56
  • @LeonYue I did that yesterday itself. By using Symmetric function –  Nov 23 '20 at 09:08
  • @PUser congratulations! I think it can be posted as answer and others can now the issue is resolved now. May I post it? – Leon Yue Nov 23 '20 at 09:12

1 Answers1

1

Strangely I tried the comparison using Between (which was not working) and I was still getting all the years even after filtering.

Methods used :

  1. Used between clause - was not working
  2. Used <= and >= - was also not working.

I used later "SYMMETRIC" function. with between - with which the filter worked. I suppose in BETWEEN SYMMETRIC the arguments are automatically swapped and that a nonempty range is always implied.

at the end the query is like:

@concat('select * FROM streamingData WHERE property_name = ''', item(), '''' ,' AND  time BETWEEN SYMMETRIC to_timestamp(''',  activity('LKP_RMS_ValueStream_Time').output.firstRow.max_timestamp, '''',',''','YYYY-MM-DD', ''' )' ,' AND ', 'to_timestamp(''',  pipeline().parameters.enddate, '''',',''','YYYY-MM-DD', ''' )' , ' - ', 'INTERVAL ''', '7 DAY', '''' )

Thanks for the time and help.