
I'm querying some data from SQL Server into Python using pyodbc and storing the results in a pandas dataframe. One of the resulting columns should be a date datatype, but it is coming over as an object. I am trying to figure out how to work with the dates in this column.

I was able to convert all columns to string and clean the data, and I successfully converted quantity_picked to a float, but I could not convert the date_expected column to datetime64 (as suggested on other sites online), either before or after converting it to a string.

Error received:

NameError: name 'datetime64' is not defined
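(As an aside for anyone hitting the same error: the code that raised it isn't shown, but this NameError typically means the dtype was passed as a bare name instead of a string. A minimal sketch with made-up sample values standing in for the queried data:)

```python
import pandas as pd

# Hypothetical sample standing in for the queried column
df = pd.DataFrame({'date_expected': ['2019-05-27', '2019-05-27']})

# df['date_expected'].astype(datetime64)  # NameError: datetime64 is a dtype string, not a Python name
df['date_expected'] = df['date_expected'].astype('datetime64[ns]')  # quote the dtype
print(df['date_expected'].dtype)  # datetime64[ns]
```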

I have also tried simply converting the results using pd.to_datetime() and I receive the error below:

TypeError: <class 'pyodbc.Row'> is not convertible to datetime
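(Again as an aside: this TypeError suggests pd.to_datetime() was handed pyodbc.Row objects directly rather than the DataFrame column. Called on the column after the DataFrame is built, it works; sketch with assumed sample values:)

```python
import pandas as pd

# Hypothetical sample: the column as it arrives, as strings
practice_df = pd.DataFrame({'date_expected': ['2019-05-27', '2019-05-27']})

# Convert the whole column, not the raw pyodbc rows
practice_df['date_expected'] = pd.to_datetime(practice_df['date_expected'])
print(practice_df['date_expected'].dtype)  # datetime64[ns]
```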

I have had trouble finding any documentation on how to specify the data type while querying using pyodbc.

If I have missed something obvious, please excuse me, I am a new python user.

I am unsure whether I should be handling the date datatype while querying, after querying but before cleaning, or after cleaning, or whether it is even possible. If it is not possible, any guidance on how to work with dates during analysis in Python is appreciated.

This is how I am pulling the data from SQL Server to Python and putting it into a dataframe:

cursor = sql_conn.cursor()
SQL_query1 = cursor.execute('SELECT item_number FROM [Data].[MCF].[FULL ORDER DETAIL]')

practice_df = pd.DataFrame()

practice_df1 = pd.DataFrame(SQL_query1, columns=['item_number'])  # imports all values from the query as one column

SQL_query2 = cursor.execute('SELECT quantity_picked FROM [Data].[MCF].[FULL ORDER DETAIL]')
practice_df2 = pd.DataFrame(SQL_query2, columns=['quantity_picked'])

SQL_query3 = cursor.execute('SELECT date_expec FROM [Data].[MCF].[FULL ORDER DETAIL]')
practice_df3 = pd.DataFrame(SQL_query3, columns=['date_expected'])

#combine separate dataframes into one dataframe with corresponding values
practice_df = pd.concat([practice_df1, practice_df2, practice_df3], axis=1)

I had trouble getting pyodbc to bring the data over without putting all 3 columns into a single dataframe column, so I pulled each column into its own dataframe and then merged them.
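(A note on the workaround above: a single SELECT of all three columns keeps the rows aligned and avoids the concat step. A sketch, using sample tuples in place of the `cursor.execute(sql).fetchall()` result, since pyodbc.Row objects are not plain tuples:)

```python
import pandas as pd

# A single query such as
#   SELECT item_number, quantity_picked, date_expec AS date_expected
#   FROM [Data].[MCF].[FULL ORDER DETAIL] ORDER BY item_number
# returns all three columns together. Sample rows stand in for fetchall():
rows = [('0001111041660', 1.0, '2019-05-27'),
        ('0001111046070', 2.0, '2019-05-27')]

# Convert each pyodbc.Row to a tuple before building the frame
practice_df = pd.DataFrame([tuple(r) for r in rows],
                           columns=['item_number', 'quantity_picked', 'date_expected'])
print(practice_df.shape)  # (2, 3)
```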

This is how the data is displayed (after cleaning):

0      0001111041660              1.0    2019-05-27
1      0001111046070              2.0    2019-05-27

If I enter practice_df.info() currently, I receive:

RangeIndex: 55676 entries, 0 to 55675
Data columns (total 3 columns):
item_number        55676 non-null object
quantity_picked    55676 non-null float64
date_expected      55676 non-null object
dtypes: float64(1), object(2)

Any guidance is much appreciated. Thank you.

Emily Reed
  • Not sure if this is the problem but `date_expec` is different from `date_expected` – Juan Carlos Oropeza Jun 18 '19 at 14:35
  • your 3 separate queries are not in any particular order (no ORDER BY), so you cannot guarantee that the 3 lists are ordered the same. The hazard is that they may well appear to be most of the time. That's not the immediate date problem, though (that's as far as I can understand it). – Cato Jun 18 '19 at 14:40
  • I don't believe that's causing an issue since the data is coming over and being stored in the dataframe. I didn't like the name the column had in SQL (date_expec) so I renamed it to date_expected for the dataframe. That shouldn't cause an issue as long as I refer to the appropriate name at the appropriate times, correct? – Emily Reed Jun 18 '19 at 14:41
  • @Cato, thank you very much for the feedback, that is a good point, I will adjust that! – Emily Reed Jun 18 '19 at 14:42
  • If `date_expec` is a `[n]varchar` column then perhaps try `SELECT CAST(date_expec AS datetime2) AS date_exp_date...` – Gord Thompson Jun 18 '19 at 14:50
  • @Gord Thompson, turns out adjusting my query accordingly adjusts the resulting values in the column to the following format: [2019-05-27 00:00:00.0000000], but it still comes over as an "object" instead of a date data type. – Emily Reed Jun 18 '19 at 15:43
  • I just tried `sql = "SELECT item_number, quantity_picked, CAST(date_expec AS datetime2) AS date_expected FROM [FULL ORDER DETAIL]"` followed by `practice_df = pd.read_sql_query(sql, engine)` and it worked for me. My `date_expected` column is `datetime64[ns]`. – Gord Thompson Jun 18 '19 at 16:02
  • @Gord, after making a few more adjustments and following what you suggested it finally worked. Thank you! – Emily Reed Jun 18 '19 at 17:58

1 Answer


So we have a SQL Server table with a text column containing 'yyyy-mm-dd' date values. If we simply retrieve that column as-is those values will be stored in the DataFrame as strings:

sql = """\
SELECT 
    item_number, 
    quantity_picked, 
    date_expec AS date_expected 
FROM [FULL ORDER DETAIL]"""

practice_df = pd.read_sql_query(sql, engine)

print(practice_df)
#      item_number  quantity_picked date_expected
# 0  0001111041660                1    2019-05-27
# 1  0001111046070                2    2019-05-27

sample_date = practice_df.at[0, 'date_expected']
print(type(sample_date))  # <class 'str'>
print(repr(sample_date))  # '2019-05-27'
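(If changing the SQL is not an option, pandas can also do the conversion on read via the `parse_dates` argument of `read_sql_query`. A sketch using an in-memory SQLite database standing in for the SQL Server engine:)

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for the SQL Server connection/engine
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE detail (item_number TEXT, date_expec TEXT)")
conn.execute("INSERT INTO detail VALUES ('0001111041660', '2019-05-27')")

# parse_dates tells pandas to convert the named column(s) to datetime64[ns] on read
df = pd.read_sql_query("SELECT item_number, date_expec AS date_expected FROM detail",
                       conn, parse_dates=['date_expected'])
print(df['date_expected'].dtype)  # datetime64[ns]
```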

However, if we use `CAST(date_expec AS datetime2)` in the query, we get Timestamp values instead:

sql = """\
SELECT 
    item_number, 
    quantity_picked, 
    CAST(date_expec AS datetime2) AS date_expected 
FROM [FULL ORDER DETAIL]"""

practice_df = pd.read_sql_query(sql, engine)

print(practice_df)
#      item_number  quantity_picked date_expected
# 0  0001111041660                1    2019-05-27
# 1  0001111046070                2    2019-05-27

sample_date = practice_df.at[0, 'date_expected']
print(type(sample_date))  # <class 'pandas._libs.tslibs.timestamps.Timestamp'>
print(repr(sample_date))  # Timestamp('2019-05-27 00:00:00')
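(Once the column holds Timestamp values, the standard `.dt` accessor exposes date components for analysis. A small sketch with assumed sample dates:)

```python
import pandas as pd

# Hypothetical sample dates, already converted to datetime64[ns]
practice_df = pd.DataFrame(
    {'date_expected': pd.to_datetime(['2019-05-27', '2019-06-03'])})

# .dt gives vectorized access to year, month, day, weekday, etc.
print(practice_df['date_expected'].dt.year.tolist())       # [2019, 2019]
print(practice_df['date_expected'].dt.dayofweek.tolist())  # [0, 0] -- both Mondays
```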
Gord Thompson