4

When I pull a "Date" variable from SQL Server into Python/Pandas, it comes through as an 'Object'. I have installed and tried several drivers (commented drivers tried shown in the code), each time with the same results:

import pandas as pd
import pyodbc

conn_str = (
    r'Driver={SQL Server Native Client 11.0};'
    # r'Driver={SQL Server Native Client 10.0};'
    # r'Driver={ODBC Driver 11 for SQL Server};'
    # r'Driver={ODBC Driver 13 for SQL Server};'
    # r'Driver={SQL Server};'
    r'Server=MyServer;'
    r'Database=MyDB;'
    r'Trusted_Connection=yes;'
    )

cnxn = pyodbc.connect(conn_str)

sql = (
    "Select cast('2017-08-19' as date) [DateVar]"
    ", cast('2017-08-19' as datetime) [DateTimeVar]"
    ", cast('2017-08-19' as datetime2) [DateTime2Var]"
    )

d2 = pd.read_sql(sql,cnxn)

cnxn.close()

print(d2.dtypes)

Returned result is:

DateVar                 object
DateTimeVar     datetime64[ns]
DateTime2Var    datetime64[ns]
dtype: object

I want that DateVar to be a datetime. Any ideas why this is happening??

Same issue as this guy: pyodbc returns SQL Server DATE fields as strings But the fix for him was to use {SQL Server Native Client 10.0} which I've installed and isn't working for me.

Version of SQL Server I'm connecting to is:

Microsoft SQL Server 2012 (SP3) (KB3072779) - 11.0.6020.0 (X64) 
Oct 20 2015 15:36:27 
Copyright (c) Microsoft Corporation
Enterprise Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1)

UPDATES

1>

Based on Max's input, have tried sqlalchemy, but no luck, this still gives me a string back:

import sqlalchemy as sa
engine = sa.create_engine('mssql+pyodbc://MyDatabase/MyDB?driver=SQL+Server+Native+Client+10.0')
d2 = pd.read_sql(sql, engine)

2>

Based on Flipper's Q, have done this with just a Pyodbc cursor and it looks like the proper date data type is being returned in the cursor when using the Native Client 11.0:

(('DateVar', datetime.date, None, 10, 10, 0, True),
 ('DateTimeVar', datetime.datetime, None, 23, 23, 3, True),
 ('DateTime2Var', datetime.datetime, None, 27, 27, 7, True))

This would suggest the issue is in Pandas handling of the dtype datetime.date when loading into a dataframe.

Gord Thompson
  • 116,920
  • 32
  • 215
  • 418
Silas
  • 144
  • 1
  • 8
  • 2
    Just to triangulate, can you try the SQL query using a `pyodbc` cursor and look at the types without `pandas` in the mix? – FlipperPA Aug 19 '17 at 17:19
  • Looks like it is working with a cursor and returns `datetime.date` data type. This suggests the issue is with pandas handling `datetime.date`, whereas it seems to handle `datetime.datetime` just fine. Nice thinking. Any ideas on how to troubleshoot the pandas? Converting a pyodbc cursor to a pandas dataframe doesn't seem to have many search results. Everyone just says use `pandas.read_sql()`. – Silas Aug 19 '17 at 19:14
  • 1
    That's what I was afraid of; SQL Server support gets more sparse as you add more packages to the top of the stack. I'm not too familiar with pandas, but perhaps dig into the pandas source a bit if you're comfortable? – FlipperPA Aug 19 '17 at 21:21
  • 1
    Side note on the SQL Server ODBC providers, new development should use the providers prefixed with `ODBC Driver...`. Native client providers are deprecated, more details from the pyodbc maintainers [here](https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows). – Bryan Aug 21 '17 at 13:24

2 Answers2

3

Use the parse_dates parameter of pandas.read_sql to specify that DateVar column values are explicitly converted to datetime on dataframe load.

Updated original code snippet:

...
d2 = pd.read_sql(sql=sql,
                 con=cnxn,
                 # explicitly convert DATE type to datetime object
                 parse_dates=["DateVar"])

cnxn.close()

print(d2.dtypes)

Returns

DateVar         datetime64[ns]
DateTimeVar     datetime64[ns]
DateTime2Var    datetime64[ns]
dtype: object

Tested with pyodbc 4.0.17, pandas 0.20.3, and SQL Server 2014 on Windows.

Bryan
  • 17,112
  • 7
  • 57
  • 80
  • Thanks, this is what I've been doing for now, but just adds work since I have a lot of "date" fields in my analyses and databases. Maybe in future releases Pandas will be able to handle the "date" data type. – Silas Aug 21 '17 at 18:18
0

Try to use SQLAlchemy as follows:

from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://scott:tiger@myhost:port/databasename?driver=SQL+Server+Native+Client+10.0")
MaxU - stand with Ukraine
  • 205,989
  • 36
  • 386
  • 419
  • Thanks Max. Tried this but getting the same issue still: engine = sa.create_engine('mssql+pyodbc://MyServer/MyDB?driver=SQL+Server+Native+Client+10.0') and then: d2 = pd.read_sql(sql, engine) This is still returning the Date as a string/object data type. – Silas Aug 19 '17 at 16:52
  • @Silas, sorry about that! Currently i don't have access to SQL Server so i can't test it myself... – MaxU - stand with Ukraine Aug 19 '17 at 17:04
  • SQLA is probably overkill for OP. – hd1 Aug 21 '17 at 13:34