0

I'm trying to run a query saved in an SQL file that's encoded in UTF-8. My code is as follows:

import pyodbc
import pandas as pd

query = open('test.sql', 'r')

print(query.read())

server = 'server' 
database = 'database' 
username = 'username' 
password = 'password' 

cnxn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};' +
    'SERVER=' + server + ';DATABASE=' + database +
    ';UID=' + username + ';PWD='+ password
)

df = pd.read_sql_query(query.read(), cnxn)
query.close()

print(df.head())

print(query.read()) works fine and correctly shows the contents of test.sql, but read_sql_query gives the following error:

pyodbc.Error: ('HY090', '[HY090] [Microsoft][ODBC Driver Manager] Invalid string or buffer length (0) (SQLExecDirectW)')

If I change the encoding of the SQL file from UTF-8 to ANSI in Notepad++ before running, the code works fine with no error. How can I make this work without changing the encoding of the SQL file beforehand? I think maybe I should use .encode() or .decode() but I can't figure out how.

Dale K
  • 25,246
  • 15
  • 42
  • 71
34293045
  • 21
  • 3
  • possibly related: https://stackoverflow.com/a/44573867/2144390 – Gord Thompson Aug 26 '21 at 17:58
  • Thanks. I tried changing query.read() to query.read().decode('utf-8-sig') and got "AttributeError: 'str' object has no attribute 'decode'". How would I use .decode() in this case? – 34293045 Aug 26 '21 at 18:06
  • You could try encoding the string first, then re-decoding `query.read().encode("utf-8").decode()` – Caleb Courtney Aug 26 '21 at 18:08
  • Are there any characters which cannot be converted to ANSI? Are you sure it's UTF-8 and not UTF-16? – Charlieface Aug 26 '21 at 18:12
  • Sorry, I said decode, then re-encode, when I meant the other way around. Ultimately, it looks like there is bad encoding in the query somewhere (maybe a new line character?), so that's what you want to get rid of. – Caleb Courtney Aug 26 '21 at 18:12
  • Just tried both `query.read().encode("utf-8").decode("ascii")` and `query.read().encode("ascii").decode("utf-8")`, and the "AttributeError: 'str' object has no attribute 'decode'" goes away, but now I'm back to the original error "Invalid string or buffer length (0)". The only text in test.sql is the following (no newlines): `SELECT TOP 1000 * FROM My_Table ORDER BY NEWID()`. Not sure what the issue could be. – 34293045 Aug 26 '21 at 18:23

1 Answers1

2

Ok, problem solved, and I feel stupid. I was trying to use .read() twice--first in print() and then again in read_sql_query(). I just commented out the print(query.read()) and it worked fine. So it wasn't an encoding issue after all.

When I had previously encoded the file in ANSI using Notepad++ and got it to work, I must not have had print(query.read()) first. Sorry about that, this is my first program using open() and read().

Frodnar
  • 2,129
  • 2
  • 6
  • 20
34293045
  • 21
  • 3
  • And now I feel dumb too! Of course, you read the file, and so there's nothing left to read. So the string is length 0 when it's sent via the query. – Caleb Courtney Aug 26 '21 at 19:53