Update, April 2021:
This remains a known issue with pandas .to_sql()
Original Answer (Warning: Includes outdated SQLAlchemy 1.3 usage patterns deprecated in version 1.4)
There have been some very recent (as in "this morning") improvements to datetimeoffset
handling in SQLAlchemy. They will be included in the next release (probably 1.3.13), but in the meantime try installing from the latest source for the 1.3.x branch ...
pip install --upgrade git+https://github.com/sqlalchemy/sqlalchemy@rel_1_3
... and see if that works better for you.
Edit:
On further investigation it appears that the issue lies with to_sql
. If the DataFrame contains a single row then the timezone offset is lost:
import datetime
from pprint import pprint
import sqlalchemy as sa
# ...
engine = sa.create_engine(connection_uri, fast_executemany=True)
# test environment
table_name = 'DateTimeOffset_Test'
engine.execute(sa.text(f"DROP TABLE IF EXISTS [{table_name}]"))
engine.execute(sa.text(f"CREATE TABLE [{table_name}] (id int primary key, dto datetimeoffset)"))
# test data
my_tz = datetime.timezone(datetime.timedelta(hours=-7))
dto_value = datetime.datetime(2020, 1, 1, 0, 0, 0, tzinfo=my_tz)
print(dto_value) # 2020-01-01 00:00:00-07:00
# ^
num_rows = 1
row_data = [(x, dto_value) for x in range(num_rows)]
df = pd.DataFrame(row_data, columns=['id', 'dto'])
print(df)
# id dto
# 0 0 2020-01-01 00:00:00-07:00
# ^
df.to_sql(table_name, engine, if_exists='append', index=False)
result = engine.execute(sa.text(f"SELECT id, CAST(dto as varchar(50)) AS foo FROM [{table_name}]")).fetchall()
pprint(result)
# [(0, '2020-01-01 00:00:00.0000000 +00:00')]
# ^ -- wrong
However, if the DataFrame contains more than one row then the datetimeoffset values are uploaded correctly:
# ...
num_rows = 2
row_data = [(x, dto_value) for x in range(num_rows)]
df = pd.DataFrame(row_data, columns=['id', 'dto'])
print(df)
# id dto
# 0 0 2020-01-01 00:00:00-07:00
# 1 1 2020-01-01 00:00:00-07:00
# ^
df.to_sql(table_name, engine, if_exists='append', index=False)
result = engine.execute(sa.text(f"SELECT id, CAST(dto as varchar(50)) AS foo FROM [{table_name}]")).fetchall()
pprint(result)
# [(0, '2020-01-01 00:00:00.0000000 -07:00'),
# (1, '2020-01-01 00:00:00.0000000 -07:00')]
# ^ -- correct
If you really feel strongly about this you might want to raise a pandas issue about it.