
I have a function which, when passed server, database, table, and access details, connects to a table in SQL Server and reads all of its contents into a pandas DataFrame:

import pypyodbc
import pandas as pd

def GET_DATA(source_server, source_database, source_table, source_username, source_password):

    print('******* GETTING DATA', source_server, '.', source_database, '.', source_table, '.', source_username, '*******')
    data_collected = []

    # SOURCE CONNECTION
    connection = pypyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                                  'Server=' + source_server + ';'
                                  'Database=' + source_database + ';'
                                  'uid=' + source_username + ';pwd=' + source_password)

    # OPEN A CURSOR ON THE CONNECTION
    cursor = connection.cursor()

    # BUILD THE COMMAND
    SQLCommand = "SELECT * FROM " + source_database + ".dbo." + source_table

    # RUN THE QUERY
    cursor.execute(SQLCommand)

    # GET RESULTS ONE ROW AT A TIME
    results = cursor.fetchone()

    # COLUMN NAMES COME FROM THE CURSOR METADATA
    columnList = [column[0] for column in cursor.description]

    while results:
        data_collected.append(results)
        results = cursor.fetchone()

    # STACK THE COLUMN NAMES AS THE FIRST ROW, THEN THE DATA ROWS
    df_column = pd.DataFrame(columnList).transpose()
    df_result = pd.DataFrame(data_collected)
    df = pd.concat([df_column, df_result])

    print('GET_DATA COMPLETE!')
    return df

Most of the time this works fine; however, for reasons I can't identify, I sometimes get this error:

sequence item 0: expected str instance, bytes found

What is causing this and how do I account for it?

Thanks!

the_good_pony

3 Answers


I found a much better way of extracting data from SQL Server into pandas:

import pyodbc
import pandas as pd

def GET_DATA_TO_PANDAS(source_server, source_database, source_table, source_username, source_password):
    print('***** STARTING DATA TO PANDAS *********')
    con = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                         'Server=' + source_server + ';'
                         'Database=' + source_database + ';'
                         'uid=' + source_username + ';pwd=' + source_password)

    # BUILD THE QUERY AND LET pandas HANDLE FETCHING AND COLUMN NAMES
    query = "SELECT * FROM " + source_database + ".dbo." + source_table
    df = pd.read_sql(query, con)
    con.close()
    return df

I used this link: https://www.quora.com/How-do-I-get-data-directly-from-databases-DB2-Oracle-MS-SQL-Server-into-Pandas-DataFrames-using-Python
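
For illustration, a hypothetical call might look like this (server, database, table, and credential values are placeholders, not real connection details):

# Hypothetical usage; substitute real connection details.
df = GET_DATA_TO_PANDAS('my_server', 'my_database', 'my_table', 'my_user', 'my_password')
print(df.head())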

the_good_pony

I experienced a similar issue in one of my projects. This exception is raised by the Microsoft ODBC driver. As far as I can tell, it occurs while fetching the results from the DB, most likely at this line:

cursor.fetchone()

From what I understood of this exception, the cause is the data that comes back from SQL Server to Python. There may be one specific huge row in the DB, perhaps containing Unicode or other non-ASCII characters, that exceeds the driver's buffer length. In that case the driver cannot convert the nvarchar data to a string, so it hands a raw bytes object back to Python, and the attempt to turn those bytes back into a string raises the error.

Maybe digging a bit deeper into that specific data row will help; a minimal diagnostic sketch follows.
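
For example, a minimal sketch, assuming a cursor opened as in the question and a hypothetical table name, that walks the result set one row at a time and reports where the failure happens:

cursor.execute("SELECT * FROM my_database.dbo.my_table")  # hypothetical table

# pypyodbc raises TypeError from ''.join(...) when it cannot decode a value,
# so catch it and report which row position triggered it.
row_index = 0
while True:
    try:
        row = cursor.fetchone()
    except TypeError as exc:
        print('Row', row_index, 'could not be decoded:', exc)
        break
    if row is None:
        break
    row_index += 1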

I also found another similar issue reported elsewhere. The Microsoft ODBC driver's known-issues page might help too.
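
If you move to pyodbc (as in the answer above), one possible workaround, only a sketch and assuming the undecoded values are UTF-16-LE encoded nvarchar data, is to register an output converter so your code does the decoding instead of the driver (connection details below are hypothetical placeholders):

import pyodbc

con = pyodbc.connect('Driver={ODBC Driver 17 for SQL Server};'
                     'Server=my_server;Database=my_database;'  # hypothetical details
                     'uid=my_user;pwd=my_password')

# Decode wide-character columns ourselves; pass NULLs through unchanged.
con.add_output_converter(
    pyodbc.SQL_WCHAR,
    lambda raw: raw.decode('utf-16-le') if raw is not None else None)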

Sreekhar

I got the same error using Python 3, as follows: I defined an MS SQL column as nchar, stored an empty string (which in Python 3 is Unicode), then retrieved the row with the pypyodbc call cursor.fetchone(). It failed on this line inside pypyodbc:

if raw_data_parts != []:
    if py_v3:
        if target_type != SQL_C_BINARY:
            raw_value = ''.join(raw_data_parts)
            # FAILS WITH "sequence item 0: expected str instance, bytes found"
    ...

Changing the column datatype to nvarchar in the database fixed it. If altering the schema is not an option, casting in the query (sketched below) should have the same effect.
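
A minimal sketch of that alternative, with a hypothetical nchar column named code in a hypothetical table; the CAST makes SQL Server return the value as nvarchar before the driver ever sees the nchar data:

# Hypothetical workaround: cast the nchar column to nvarchar in the query
# itself so the database schema does not have to change.
query = ("SELECT CAST(code AS NVARCHAR(50)) AS code "
         "FROM my_database.dbo.my_table")
cursor.execute(query)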

Emilia Apostolova