1

I have a large database many tables with several million records each and performance is important. For the purpose of this question, let's use a simpler example.

I am having issues performing a JOIN on two tables on timestamps around DST hour. DST happened this year on November 5th at 2:00 AM in timezone America/New_York. Basically, at 2:00 AM, the clocks were set back to 1:00 AM, so 2017-11-05 1:00:00 "happened twice".

Right now, my server is in UTC and I have the following tables:

Table 1:

2017-11-05 04:00:00              1.00
2017-11-05 05:00:00              2.00
2017-11-05 06:00:00              3.00

Table 2:

2017-11-05 04:00:00              4.00
2017-11-05 05:00:00              5.00
2017-11-05 06:00:00              6.00

In these tables, the timestamps are in UTC and of type TIMESTAMP.

I want to create a view as such:

SELECT t1.time, t1.value, t2.value FROM test1 t1 LEFT JOIN test2 t2 ON t1.time = t2.time

Which produces the following output:

2017-11-05 04:00:00              1.00     4.00
2017-11-05 05:00:00              2.00     5.00
2017-11-05 06:00:00              3.00     6.00

This is correct when my server is set to timezone UTC. However, my users are in eastern Canada, in timezone "America/New_York" and because I don't want them to set their user connection timezone everytime they need to access data, I set the server time to "America/New_York". When I perform the same SELECT statement and the server is set to the EST timezone, I get the following results:

2017-11-05 00:00:00    1.00    4.00
2017-11-05 01:00:00    2.00    5.00
2017-11-05 01:00:00    3.00    5.00
2017-11-05 01:00:00    2.00    6.00
2017-11-05 01:00:00    3.00    6.00

Why is the JOIN performed on the result (timezone independant) and not on the actual timestamp with the timezone?

I should see:

2017-11-05 00:00:00              1.00     4.00
2017-11-05 01:00:00              2.00     5.00
2017-11-05 01:00:00              3.00     6.00

With the 3 rows being in EDT, EST and EST respectively, even though I don't need to see that information.

I don't want to use:

SELECT t1.time, t1.value, t2.value FROM test1 t1 LEFT JOIN test2 t2 ON unix_TIMESTAMP(t1.time) = unix_timestamp(t2.time)

This is for performance reasons, as converting to UNIX_TIMESTAMP for millions of records before the JOIN is very expensive and slow.

As I said, I would rather not set the server time to UTC because I don't want to ask my users to set the timezone to America/New_York every time they want to access data in their local time zone.

Vincent L
  • 699
  • 2
  • 11
  • 25
  • So is the problem the joining logic or an issue with the time zone? – SomeTonyGuy Nov 10 '17 at 20:07
  • I believe the JOIN shouldn't care about the timezone, since the field is of type TIMESTAMP and not DATETIME. But the JOIN seems to act as if it's a DATETIME and ignores the timezone. – Vincent L Nov 10 '17 at 20:08
  • 1
    Basically, the conversion from UTC (stored) to the timezone of the user connection is performed before the JOIN. – Vincent L Nov 10 '17 at 20:17
  • Not sure but I think the answer in this link may give you an indication to what the problem is. https://stackoverflow.com/questions/8119386/how-to-convert-sql-servers-timestamp-column-to-datetime-format – SomeTonyGuy Nov 10 '17 at 20:23
  • I don't understand how this answers my question. I want to perform a JOIN on the actual timestamp number. – Vincent L Nov 10 '17 at 20:25
  • I didn't say it was the answer...i said it would give an indication to the answer you are looking for. It mentioned a timestamp has nothing to do with dates, so the join can't act as if it were a date. Also, why not just have a columnID instead of relying on these timestamps? Hoping the timestamps in two different tables remains the same is asking for trouble. – SomeTonyGuy Nov 10 '17 at 20:28
  • I am not asking for a timestamp to be a date, that's exactly my point. Why does the JOIN perform as if it were? A timestamp is simply a number, so why does changing the timezone on the server change the query result? I am not "hoping" for anything here, the values represent prices in 2 different markets at various times during the year. I need to be able to join them. – Vincent L Nov 10 '17 at 20:32
  • Weird. `TIMESTAMP` data items are always stored in UTC, and translated to localtime on the way out of the DBMS, based on the time zone setting. – O. Jones Nov 10 '17 at 20:43
  • That's what I thought too! So I don't understand why this is happening :( – Vincent L Nov 10 '17 at 20:53
  • It would seem you have encountered a bug. What version of MySQL is this? Also, could you workaround by setting the session time zone to UTC before the query and then setting it back after? Or would that change the output? – Matt Johnson-Pint Nov 10 '17 at 22:15
  • I'm on 5.7.19 (also tried with another server on 5.7.20). Changing timezones before the query is exactly how I found this issue :) Basically, if I set it to 'America/New_York', I get 5 lines in the output and if I set it to 'UTC', I get 3 lines in the output. – Vincent L Nov 13 '17 at 16:12
  • Full bug report here: https://bugs.mysql.com/bug.php?id=88465&thanks=sub – Vincent L Nov 13 '17 at 16:57

0 Answers0