
Given:

CREATE PROCEDURE my_procedure
    @Param INT
AS
    SELECT Col1, Col2
    FROM Table
    WHERE Col2 = @Param

I would like to be able to use this as:

import pandas as pd
import pyodbc

query = 'EXEC my_procedure @Param = {0}'.format(my_param)
conn = pyodbc.connect(my_connection_string)

df = pd.read_sql(query, conn)

But this throws an error:

ValueError: Reading a table with read_sql is not supported for a DBAPI2 connection. Use an SQLAlchemy engine or specify an sql query

SQLAlchemy does not work either:

import sqlalchemy
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql(query, engine)

Throws:

ValueError: Could not init table 'my_procedure'

I can in fact execute the statement using pyodbc directly:

cursor = conn.cursor()
cursor.execute(query)
results = cursor.fetchall()
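# note: pyodbc Row objects carry no column labels, so this DataFrame
# ends up with default integer column names (0, 1, ...)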
df = pd.DataFrame.from_records(results)

Is there a way to send these procedure results directly to a DataFrame?

joeb1415
    What version of pandas are you using? And can you try to use `pd.read_sql_query` instead of `read_sql`? (there was a bug in `read_sql` regarding executing stored procedures) – joris Oct 01 '14 at 08:06

4 Answers


Use read_sql_query() instead.

Looks like @joris (+1) already had this in a comment directly under the question, but I didn't see it because it wasn't in the answers section.

Use the SQLAlchemy engine: apart from SQLAlchemy connectables, pandas only supports raw DBAPI connections for SQLite. Then use read_sql_query() instead of read_sql(). The latter tries to auto-detect whether you're passing a table name or a fully fledged query, but it doesn't cope well with the 'EXEC' keyword. read_sql_query() skips the auto-detection and lets you state explicitly that you're running a query (there is also a read_sql_table()).

import pandas as pd
import sqlalchemy

query = 'EXEC my_procedure @Param = {0}'.format(my_param)
engine = sqlalchemy.create_engine(my_connection_string)
df = pd.read_sql_query(query, engine)
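
If you'd rather not format the parameter into the string yourself, the same call can be made with a bound parameter. This is only a sketch re-using the question's placeholders (my_connection_string, my_procedure, my_param); it assumes SQLAlchemy renders the :p placeholder to whatever the pyodbc driver expects:

import pandas as pd
import sqlalchemy

engine = sqlalchemy.create_engine(my_connection_string)
# bind the value instead of formatting it into the query string
query = sqlalchemy.text('EXEC my_procedure @Param = :p')
df = pd.read_sql_query(query, engine, params={'p': my_param})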
steamer25

This worked for me after adding SET NOCOUNT ON, thanks @CRAFTY DBA.

sql_query = """SET NOCOUNT ON; EXEC db_name.dbo.StoreProc '{0}';""".format(input)

df = pandas.read_sql_query(sql_query, conn)
as - if
    I had to do this as well. In order to get my query to work I had to create the string with triple quotes, using single quotes around my stored procedure's variables. It looked like this: `"""EXEC sProcedure 'variable1', 'variable2'"""` – Daniel Butler Mar 26 '19 at 19:36

https://code.google.com/p/pyodbc/wiki/StoredProcedures

I am not a Python expert, but SQL Server sometimes returns counts for statement executions. For instance, an update will report how many rows were updated.

Just use SET NOCOUNT ON; at the front of your batch call. This will suppress the counts for inserts, updates, and deletes.
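
In pandas terms that looks something like the minimal sketch below; it just re-uses the question's placeholders (my_connection_string, my_procedure, my_param) and prepends the statement to the batch:

import pandas as pd
import pyodbc

conn = pyodbc.connect(my_connection_string)
# SET NOCOUNT ON suppresses the rows-affected messages that can
# otherwise hide the procedure's result set from pandas
sql = 'SET NOCOUNT ON; EXEC my_procedure @Param = ?'
df = pd.read_sql_query(sql, conn, params=[my_param])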

Make sure you are using the correct native client module.

Take a look at this Stack Overflow example.

It has both an ad hoc SQL example and a stored procedure call example.

Calling a stored procedure python

Good luck

CRAFTY DBA
  • `SET NOCOUNT ON` unfortunately doesn't help in this case. – joeb1415 Oct 01 '14 at 16:58
    Adding SET NOCOUNT ON at the start of my stored procedure worked. Thanks. Also, I tried adding "SET NOCOUNT ON; " to the start of my query and that worked as well. – FistOfFury Mar 13 '17 at 14:48

Using ODBC syntax for calling stored procedures (with parameters instead of string formatting) works for loading dataframes using pandas 0.14.1 and pyodbc 3.0.7. The following examples use the AdventureWorks2008R2 sample database.

First confirm expected results calling the stored procedure using pyodbc:

import pandas as pd
import pyodbc
connection = pyodbc.connect(driver='{SQL Server Native Client 11.0}', server='ServerInstance',
                            database='AdventureWorks2008R2', trusted_connection='yes')
sql = "{call dbo.uspGetEmployeeManagers(?)}"
params = (3,)
cursor = connection.cursor()
rows = cursor.execute(sql, params).fetchall()
print(rows)

Should return:

[(0, 3, 'Roberto', 'Tamburello', '/1/1/', 'Terri', 'Duffy'), (1, 2, 'Terri', 'Duffy',
'/1/', 'Ken', 'Sánchez')]

Now use pandas to load the results into a dataframe:

df = pd.read_sql(sql=sql, con=connection, params=params)
print(df)

Should return:

   RecursionLevel  BusinessEntityID FirstName    LastName OrganizationNode  \
0               0                 3   Roberto  Tamburello            /1/1/
1               1                 2     Terri       Duffy              /1/

  ManagerFirstName ManagerLastName
0            Terri           Duffy
1              Ken         Sánchez

EDIT

Since you can't update to pandas 0.14.1, load the results from pyodbc using pandas.DataFrame.from_records:

# get column names from pyodbc results
columns = [column[0] for column in cursor.description]
df = pd.DataFrame.from_records(rows, columns=columns)
Bryan
  • This does not work in pandas 0.14.0. I'm unable at present to upgrade to 0.14.1 to test there, but thanks for the tip. – joeb1415 Oct 01 '14 at 16:54
  • @joeb1415 Updated on how to populate dataframe without using `read_sql` – Bryan Oct 01 '14 at 17:13
  • Minor, you probably want to `from_records(rows, columns=columns, coerce_float=True)` so that you don't end up with stringed numeric columns, which is often the case with oracle/sql – JSharm Mar 05 '21 at 09:51