0

I am having an issue in using the RPostgres package to connect with Redshift. I am unsure if it is an issue with our database set up, or if it is a known issue with the package.

I am getting different results when I use the same exact query in Redshift vs. using it in R with the RPostgres package.

It appears entirely due to the date math, as my overall row counts match, as well as everything else, when not using dates.

As an example, this may be a query I run in Redshift (using Metabase). If I run the same exact query in R with the RPostgres package, I will get completely different results.

SELECT
orders.*
FROM
orders
WHERE 
orders.date >= current_date-3
AND 
orders.date < current_date-2

The dates in Metabase make sense. It shows only one day, three days ago. However, in R, it shows 2 days.

For the purposes of this example, consider date a timestamp.

Has anyone ran into this, or know of an existing issue and work around?

Devin
  • 363
  • 2
  • 20
  • 3
    Is orders.date a timestamp with time zone or without time zone? Is the time zone of the client the same when you use Redshift and R? – Jeremy Apr 26 '19 at 14:37
  • It's just a timestamp. Does the query convert to a timezone in R for some reason? In the client, it is a timestamp without time zone. What's strange is that if I pull an individual for comparison to the client, the date and time is the exact same. – Devin Apr 26 '19 at 14:39
  • Jeremy, you are correct. Everything is 4 hours off. So if I specify in the WHERE clause: ```orders.date BETWEEN '04/22/2019 4:00:00' AND '04/23/2019 3:59:59'```, I will get orders only on April 22nd. No need to do this in the client, and the output matches the client, so I'm still not sure why the package does this. I'm sure there is a better work around, convert_timezone etc. So I'll leave the question open for a better explanation. – Devin Apr 26 '19 at 15:02
  • "current_date" does not return a timestamp value in Redshift, it returns a _date_ only, based on the time zone of the current _session_. If you're comparing this to a timestamp value then a time component of "00:00:00" will be used. You might need to use SET TIMEZONE = ''; at the start of both scripts to ensure you're comparing like with like. – Nathan Griffiths Apr 29 '19 at 03:40

1 Answers1

1

It's usually better to be absolutely explicit with dates to make sure that session settings are not unintentionally affecting the query. Try this:

SELECT
orders.*
FROM
orders
WHERE 
orders.date >= date_trunc('day', current_timestamp at time zone 'utc') - '3 days'::interval
AND 
orders.date < date_trunc('day', current_timestamp at time zone 'utc') - '2 days'::interval

You may have to change the time zone from utc if your dates are implicitly stored in a different time zone.

Jeremy
  • 6,313
  • 17
  • 20