0

I am trying to query data from my postgresql database using pd.read_sql. There are timestamptz columns, with a combination of dates which have DST and some which don't. The dates which have a DST offset are being read in correctly, however those without a DST offset are being read as NaT.

All columns shown are of the same type, timestamptz

Example A - smart_departure has no DST offset
Queried using DBeaver

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
NULL NULL 1970-01-01 08:23:00.000 +0100 NULL NULL 2022-01-14 08:23:03.000 +0000

Queried using pd.read_sql ( for example pd.read_sql('select * from database', con=engine))

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
None None 1970-01-01 07:23:00+00:00 None None NaT

Query by retrieving value directly (suggested by Gord Thompson below)

(None, None, datetime.datetime(1970, 1, 1, 7, 23, tzinfo=datetime.timezone.utc), None, None, None)

Queried using psql (suggested by Adrian Klaver)

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
1970-01-01 07:23:00+00

Example B - smart_arrival has DST offset
Queried using DBeaver

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
1970-01-01 15:49:00.000 +0100 NULL NULL 2022-10-14 15:47:27.000 +0100 NULL NULL

Queried using pd.read_sql

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
1970-01-01 14:49:00+00:00 None None 2022-10-14 14:47:27+00:00 None None

Query by retrieving value directly (suggested by Gord Thompson below)

(datetime.datetime(1970, 1, 1, 14, 49, tzinfo=datetime.timezone.utc), None, None, datetime.datetime(2022, 10, 14, 14, 47, 27, tzinfo=datetime.timezone.utc), None, None)

Queried using psql (suggested by Adrian Klaver)

wtt_arrival wtt_pass wtt_departure smart_arrival smart_pass smart_departure
1970-01-01 14:49:00+00 2022-10-14 14:47:27+00

Results of running show search_path and \d *.scheduled_and_actual_timings

timetabletooldev=> \d *.scheduled_and_actual_timings
                    View "public.scheduled_and_actual_timings"
        Column        |           Type           | Collation | Nullable | Default 
----------------------+--------------------------+-----------+----------+---------
 wtt_arrival          | timestamp with time zone |           |          | 
 wtt_pass             | timestamp with time zone |           |          | 
 wtt_departure        | timestamp with time zone |           |          | 
 smart_arrival        | timestamp with time zone |           |          | 
 smart_pass           | timestamp with time zone |           |          | 
 smart_departure      | timestamp with time zone |           |          | 

timetabletooldev=> show search_path;
   search_path   
-----------------
 "$user", public
(1 row)
Ivan
  • 11
  • 3
  • 1) Your text descriptions don't match the data in your examples. For instance *Example B* has no value for `smart_departure`. 2) The return values are correct for the example data shown. The times are the same. 3) Postgres `timestamptz` does not store the timezone offset, the values are stored as UTC. – Adrian Klaver Feb 13 '23 at 16:08
  • @AdrianKlaver Thanks for your feedback, I have amended the question. If "The return values are correct for the example data shown", would you mind enlightening me as to how I can read smart_departure in example A correctly using pandas? – Ivan Feb 13 '23 at 16:15
  • The value is correct `select '2022-01-14 08:23:03.000 +0000'::timestamp; 01/14/2022 08:23:03`. `select '2022-01-14 08:23:03.000 +0000'::timestamptz; 01/14/2022 00:23:03 PST`. Use `psql` and show the results of selecting that row as update to your question. – Adrian Klaver Feb 13 '23 at 16:21
  • Also, please [edit] your question to show what happens when you retrieve that value directly, instead of via `read_sql()`. That would be something like `with engine.begin() as conn:` followed by `print(conn.exec_driver_sql("SELECT smart_departure FROM tbl WHERE id = 1").all())`. When I do that, I see `[(datetime.datetime(2022, 1, 14, 8, 23, 3, tzinfo=datetime.timezone.utc),)]` – Gord Thompson Feb 13 '23 at 16:37
  • @GordThompson I have update the answer with this information. Do you think this would mean that the issue is with how I am setting up my sqlalchemy connection? – Ivan Feb 13 '23 at 17:33
  • @AdrianKlaver I have updated the question with psql results.. appreciate your help – Ivan Feb 13 '23 at 17:41
  • To me it looks like you are connecting to either: 1) Different databases. Check connections properties to make sure they are the same for each query. 2) Different table in a database. In `psql` do a) `\d *.` and `show search_path` and update question with results. – Adrian Klaver Feb 13 '23 at 18:01
  • It seems strange that DBeaver is reporting a smart_departure value of `2022-01-14 08:23:03.000 +0000` when all other methods suggest that the value is actually `[NULL]`. The status bar at the bottom of the DBeaver window shows the time zone and the language. For example, mine shows "MST | en_CA". What does yours show? – Gord Thompson Feb 13 '23 at 18:14
  • @GordThompson GMT | en_GB – Ivan Feb 13 '23 at 18:55
  • @AdrianKlaver please see screenshot linked, It is the same database and table that I am connecting to. – Ivan Feb 13 '23 at 18:55
  • 1) Do not use screenshots, copy and paste the text into the question. 2) I do not see haoe dBeaver can see the value in the first case and the Python and psql queries do not unless you are looking at different tables. At this point I am going to say you are connecting to different databases, even though you say you are not. 3) Should I be wrong about 2) I would trust psql before I would trust dBeaver. In other words psql is telling you the truth and dBeaver is not. – Adrian Klaver Feb 13 '23 at 19:02
  • @AdrianKlaver 1) Changed 2) You're wrong 3) thanks – Ivan Feb 13 '23 at 19:09
  • @GordThompson I think the issue could be something to do with timezone. Using psql, it says I'm using UTC with is_dst = f. In DBeaver, it says I'm using GMT but is_dst is also unticked. – Ivan Feb 13 '23 at 19:16
  • FYI, for practical purposes UTC = GMT and neither will have Daylight Savings. In any case the timezone or whether is has DST will not affect whether it is displayed or not. If there is a value it will be shown. – Adrian Klaver Feb 13 '23 at 20:21
  • Just dawned on me that there is another scenario that might account for this: a) You opened session in dBeaver. b) You started a transaction and inserted or updated the record in the table. c) You have not committed the transaction. In that case dBeaver would see the values but other sessions e.g. Pandas, psql, psycopg2 would not. – Adrian Klaver Feb 13 '23 at 20:53

1 Answers1

1

Timezone in DBeaver session and psql/sqlalchemy sessions were different. When correct time zone is used, the same data is shown in both DBeaver and sqlalchemy.

Defining the engine like so worked for me:

engine = create_engine(DATABASE_URL, connect_args={"options": "-c timezone=Europe/London"})
Ivan
  • 11
  • 3