I am using pandas' read_sql to pull a data set from SQL Server into a DataFrame via pypyodbc. However, it looks like sometimes (not always), when a field contains NULLs, its dtype comes back as float64 instead of int64.
I have two fields that are both declared as INT in SQL Server. One sometimes has NULLs; the other appears to always have NULLs.
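I can reproduce the upcasting in plain pandas with no SQL involved, so I assume this is pandas behavior rather than pypyodbc:

import pandas as pd

# An integer column that contains a missing value gets upcast to
# float64, since NumPy's int64 has no representation for NaN.
s = pd.Series([175635, 180942, None])
print(s.dtype)   # float64

# The nullable integer extension dtype ("Int64", capital I) can hold
# missing values without the upcast.
print(s.astype("Int64").dtype)   # Int64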
Here's the schema in SQL Server:
[PLAN_ID] [int] NULL,
[DESTINATION_ID] [int] NULL,
[STORE_ID] [int] NULL,
If I use __dict__ to look at those fields, I see the following (there are other fields too, but I'm not sure how to read the __dict__ output, so I'm including the previous line for context):
Name: plan_id, Length: 13193, dtype: int64

'destination_id':
0        None
1        None
2        None
3        None
4        None
         ...
13188    None
13189    None
13190    None
13191    None
13192    None
Name: dest_id, Length: 13193, dtype: object

'store_id':
0        175635.0
1        180942.0
2             NaN
3             NaN
4             NaN
          ...
13188     59794.0
13189    180015.0
13190     94819.0
13191    184716.0
13192    182301.0
Name: store_id, Length: 13193, dtype: float64

'version': 0
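If it helps, df.dtypes (where df is the DataFrame returned by read_sql for the pull above) gives a one-line-per-column summary that matches that dump:

# dtypes prints one line per column instead of dumping every row.
print(df.dtypes)
# plan_id            int64
# destination_id    object
# store_id         float64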
Here's the code I'm using:
import pandas as pd
import pypyodbc
from datetime import date

start_date = date(2019, 5, 1)
end_date = date(2019, 5, 2)
daterange = pd.date_range(start_date, end_date)

con_string = ('Driver={SQL Server};'
              'Server=mysqlservername;'
              'Database=mydbname;'
              'App=PythonPull;'  # It's not "application name"!
              'Trusted_Connection=yes')
cnxn = pypyodbc.connect(con_string)

for single_date in daterange:
    datestr = single_date.strftime("%Y-%m-%d")
    print(datestr)
    tablelist = ["mytablenamehere"]
    for item in tablelist:
        query = f"""
        declare @start_date datetime = '{datestr}'
        declare @end_date datetime = dateadd(day,1,'{datestr}')

        SELECT id, customerid
        FROM mydbname.dbo.{item} with (nolock)
        where submitted >= @start_date and submitted < @end_date
        order by submitted
        """
        result_list = pd.read_sql(query, cnxn)
        # At this point, result_list.__dict__ shows that id is int64,
        # but customerid is float64.
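I assume I could cast the affected columns back after the read; here's a minimal, untested sketch, assuming the nullable Int64 extension dtype (pandas 0.24+), though I'd still like to understand why the upcast only happens sometimes:

# Untested sketch: coerce known-INT columns to pandas' nullable integer
# dtype ("Int64", capital I), so NULLs come through as <NA> instead of
# forcing the whole column to float64.
for col in ("id", "customerid"):
    result_list[col] = result_list[col].astype("Int64")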