
SET UP

MWE: I have a table in SQL Server defined as follows:

CREATE TABLE dbo.MyTable(
    order_id INT IDENTITY(1,1),
    column2 DECIMAL(10, 2),
    column3 INT,
    PRIMARY KEY(order_id)
)

I am using pyodbc to insert data from a pandas.DataFrame into the table, for example:

   column2  column3
0     1.23        5
1     4.95        9
2     6.79       10

I created this example DataFrame with:

 data = pd.DataFrame({'column2':[1.23, 4.95, 6.79], 'column3':[5,9,10]})

I use the following statement to insert the data:

stmt = "INSERT INTO dbo.MyTable(column2, column3) OUTPUT Inserted.order_id VALUES (?, ?)"

ISSUE

This is the code I use to insert everything and return the ids:

import numpy as np
import pyodbc

# Set up connection and create cursor
conn_string = "DRIVER={MyDriver};SERVER=MyServer;DATABASE=MyDb;UID=MyUID;PWD=MyPWD"
cnxn = pyodbc.connect(conn_string)
cnxn.autocommit = False
cursor = cnxn.cursor()
cursor.fast_executemany = True
# Upload data
cursor.executemany(stmt, data.values.tolist())
# Process the result
try:
    first_result = cursor.fetchall()
except pyodbc.ProgrammingError:
    first_result = None
result_sets = []
while cursor.nextset():
    result_sets.append(cursor.fetchall())
all_inserted_ids = np.array(result_sets).flatten()

However, I do not get all the ids that I should! For instance, if the table is initially empty, I expect

all_inserted_ids = np.array([1, 2, 3])

but I only get

all_inserted_ids = np.array([2, 3])

which means that I'm losing the first id somewhere!

Notice also that fetching first_result never works. It always raises:

pyodbc.ProgrammingError: No results.  Previous SQL was not a query.

I've also tried cursor.fetchone(), cursor.fetchone()[0] and cursor.fetchval(), but they all raise the same error.
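Instead of catching `pyodbc.ProgrammingError` for the first set and then looping over `nextset()`, every result set can be inspected the same way. The sketch below relies only on the DB-API rule that `cursor.description` is `None` when the current result carries no rows (for example the row count of an `INSERT`); `drain_result_sets` is a hypothetical helper name. It does not change which sets the driver actually returns; it only makes it easy to see all of them without special-casing the first one.

```python
def drain_result_sets(cursor):
    """Return the rows of the current result set and of every following one.

    Results without rows (e.g. the row count produced by an INSERT) have
    cursor.description set to None, so they are skipped instead of raising.
    """
    rows = []
    while True:
        if cursor.description is not None:  # this result actually returns rows
            rows.extend(cursor.fetchall())
        if not cursor.nextset():  # no more result sets
            break
    return rows
```

After `cursor.executemany(stmt, data.values.tolist())`, the ids would then be `[row[0] for row in drain_result_sets(cursor)]`.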

METHODS THAT I TRIED BUT DID NOT WORK

1) Adding "SET NOCOUNT ON"

I used the same code as above, but with

stmt = """
SET NOCOUNT ON;
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?)
"""

The output was [1, 2], so I was missing 3.

2) Adding "SET NOCOUNT ON" and inserting output to table variable

I used the following statement:

stmt = """
SET NOCOUNT ON;
DECLARE @NEWID TABLE(ID INT);
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id INTO @NEWID(ID)
VALUES (?, ?);
SELECT ID FROM @NEWID
"""

Again this didn't work: I obtained only [2, 3], without 1.

3) Selecting the @@IDENTITY

I used the following statement:

stmt = """
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?);
SELECT @@IDENTITY
"""

But it didn't work: I obtained array([Decimal('1'), 2, Decimal('2'), 3, Decimal('3')], dtype=object)

4) Selecting @@IDENTITY with SET NOCOUNT ON

I used

stmt = """
SET NOCOUNT ON;
INSERT INTO dbo.MyTable(column2, column3)
OUTPUT Inserted.order_id
VALUES (?, ?);
SELECT @@IDENTITY
"""

but I got array([Decimal('1'), 2, Decimal('2'), 3, Decimal('3')], dtype=object) again.

5) Selecting @@IDENTITY without using OUTPUT

I used:

stmt = """
INSERT INTO dbo.MyTable(column2, column3)
VALUES (?, ?);
SELECT @@IDENTITY
"""

But I got [Decimal('2') Decimal('3')]

6) Selecting @@IDENTITY without using OUTPUT but with SET NOCOUNT ON

I used:

stmt = """
SET NOCOUNT ON;
INSERT INTO dbo.MyTable(column2, column3)
VALUES (?, ?);
SELECT @@IDENTITY
"""

But again I got: [Decimal('2') Decimal('3')]

A POSSIBLE WORKAROUND: REALLY BAD, BUT IT DOES THE JOB

One possible approach is to create a separate table where we store the ids, and truncate it once we're done. It is horrible, but I couldn't find any other solution.

Create a table:

CREATE TABLE NEWID(
    ID INT,
    PRIMARY KEY (ID)
)

Next, here is the complete code:

import pyodbc
import pandas as pd
import numpy as np
# Connect
conn_string = "DRIVER={MYDRIVER};SERVER=MYSERVER;DATABASE=DB;UID=USER;PWD=PWD"
cnxn = pyodbc.connect(conn_string)
cnxn.autocommit = False
cursor = cnxn.cursor()
cursor.fast_executemany = True
# Data, Statement, Execution
data = pd.DataFrame({'column2': [1.23, 4.95, 6.79], 'column3': [5, 9, 10]})
stmt = """
INSERT INTO dbo.MyTable(column2, column3) 
OUTPUT Inserted.order_id INTO NEWID(ID)
VALUES (?, ?);
"""
cursor.executemany(stmt, data.values.tolist())
cursor.execute("SELECT ID FROM NEWID ORDER BY ID;")
# Get stuff
try:
    first_result = cursor.fetchall()
except pyodbc.ProgrammingError:
    first_result = None
result_sets = []
while cursor.nextset():
    result_sets.append(cursor.fetchall())
all_inserted_ids = np.array(result_sets).flatten()
print('First result: ', first_result)
print('All IDs: ', all_inserted_ids)
cursor.commit()
# Remember to truncate the table for next use
cursor.execute("TRUNCATE TABLE dbo.NEWID;", [])
cursor.commit()

This returns:

First result:  [(1, ), (2, ), (3, )]
All IDs:  []

So we just keep the first result.
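An alternative that avoids both the helper table and the way `executemany` exposes per-row result sets is to send one multi-row `INSERT ... OUTPUT ... VALUES (?, ?), (?, ?), ...` with `cursor.execute()`, which yields a single result set holding every new id. The helper names below are hypothetical; the batch sizes reflect SQL Server's limits of 2100 parameters per request and 1000 rows per `VALUES` list, and `max_params=2000` is an arbitrary safety margin. SQL Server does not guarantee that OUTPUT rows come back in the order of the `VALUES` list, so if each id must be matched to its source row, also output an identifying column.

```python
def chunked(rows, size):
    """Yield consecutive slices of rows with at most `size` items each."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def build_multirow_insert(table, columns, output_column, n_rows):
    """Build one INSERT ... OUTPUT ... VALUES (?, ?), (?, ?), ... statement.

    table, columns and output_column are pasted into the SQL text, so they
    must be trusted identifiers, never user input.
    """
    placeholders = "(" + ", ".join("?" for _ in columns) + ")"
    return (
        f"INSERT INTO {table}({', '.join(columns)}) "
        f"OUTPUT Inserted.{output_column} "
        f"VALUES {', '.join([placeholders] * n_rows)}"
    )


def insert_returning_ids(cursor, table, columns, output_column, rows, max_params=2000):
    """Insert rows in multi-row batches and return every id from OUTPUT."""
    # Stay under 2100 parameters per request and 1000 rows per VALUES list.
    rows_per_batch = max(1, min(1000, max_params // len(columns)))
    ids = []
    for batch in chunked(rows, rows_per_batch):
        sql = build_multirow_insert(table, columns, output_column, len(batch))
        params = [value for row in batch for value in row]  # flatten row by row
        cursor.execute(sql, params)
        ids.extend(row[0] for row in cursor.fetchall())
    return ids
```

With the question's setup this would be called as `insert_returning_ids(cursor, "dbo.MyTable", ["column2", "column3"], "order_id", data.values.tolist())`, followed by `cnxn.commit()`.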

Euler_Salter
  • Looking at https://stackoverflow.com/questions/30058957/sql-server-return-value-with-pyodbc, it seems that adding "SET NOCOUNT ON" could fix it. However, it doesn't: if you run my minimal working example above with this addition to `stmt`, you get `all_inserted_ids = np.array([1, 2])`, so now I am losing the last id! – Euler_Salter Jun 12 '18 at 08:59
  • Another solution, suggested here: https://stackoverflow.com/questions/47188976/how-to-get-the-identity-value-when-using-insert-output-with-pyodbc, is to use the statement `stmt = """SET NOCOUNT ON; DECLARE @NEWID TABLE(ID INT); INSERT INTO dbo.MyTable(column2, column3) OUTPUT Inserted.order_id INTO @NEWID(ID) VALUES (?, ?) SELECT ID FROM @NEWID"""`. However, here again I get an error on the first result set and then `all_inserted_ids = np.array([2, 3])`, so this doesn't work either. – Euler_Salter Jun 12 '18 at 09:22
  • A third possibility, described here: https://stackoverflow.com/questions/44777348/returning-primary-key-on-insert-with-pyodbc, is to use "SELECT @@IDENTITY". So I adapted the statement and now use `stmt = """INSERT INTO dbo.MyTable(column2, column3) OUTPUT Inserted.order_id VALUES (?, ?) SELECT @@IDENTITY"""`. This seems to work! I still need to do some testing, but for now I get first_result None and `all_inserted_ids = np.array([1, 2, 3])`. – Euler_Salter Jun 12 '18 at 09:28
  • Regarding my last comment, I think it is "working" only because several bugs combine to make it work this way, so it probably isn't the correct solution. I've also noticed that if you omit `"OUTPUT Inserted.order_id"` and only use `stmt = """INSERT INTO dbo.MyTable(column2, column3) VALUES (?, ?) SELECT @@IDENTITY"""`, you run into the same problem as above, because you get `all_inserted_ids = np.array([2, 3])`. – Euler_Salter Jun 12 '18 at 09:32
  • Actually, using "SELECT @@IDENTITY" doesn't work; I misread the output! What I actually get is `array([Decimal('1'), 2, Decimal('2'), 3, Decimal('3')], dtype=object)`. – Euler_Salter Jun 12 '18 at 09:35
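All of the attempts above rely on `executemany`. For comparison, a per-row variant avoids it altogether: with a plain `cursor.execute()` of a single-row `INSERT ... OUTPUT Inserted.order_id VALUES (?, ?)`, the OUTPUT rows form the first result set, so `cursor.fetchval()` can read the new id directly. This trades the speed of `fast_executemany` (which only affects `executemany`) for one round trip per row. A minimal sketch, assuming the question's original single-row `stmt`; `insert_rows_one_by_one` is a hypothetical name.

```python
def insert_rows_one_by_one(cursor, stmt, rows):
    """Execute stmt once per row and collect the id returned by its OUTPUT clause.

    Assumes stmt is a single-row INSERT ... OUTPUT Inserted.<id> VALUES (?, ...).
    """
    ids = []
    for row in rows:
        cursor.execute(stmt, row)
        ids.append(cursor.fetchval())  # first column of the first OUTPUT row
    return ids
```

Usage would be `ids = insert_rows_one_by_one(cursor, stmt, data.values.tolist())`, followed by `cnxn.commit()` because autocommit is off.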

1 Answer


I have implemented a method similar to your method 1) using SQLAlchemy with the pyodbc dialect. It can easily be adapted to use the pyodbc library directly. The trick was to add a SELECT NULL; before the INSERT query. This way the first OUTPUT of the INSERT query will be among the returned sets. With this method, if you insert n rows you will need to fetch 2n-1 sets using the cursor's nextset(). This is a patch, because either MSSQL or pyodbc discards the first set. I wonder whether there is an option in SQL Server or pyodbc to have the first set returned.

from typing import List

from sqlalchemy.orm import Session
from sqlalchemy.orm.attributes import InstrumentedAttribute
from sqlalchemy.sql.expression import TableClause

def bulk_insert_return_defaults_pyodbc(
    session: Session, statement: TableClause, parameters: List[dict], mapping: dict
):
    """

    Parameters
    ----------
    session:
        SqlAlchemy Session object
    statement:
        SqlAlchemy table clause object (ie. Insert)
    parameters:
        List of parameters
        ex: [{"col1": "value1", "col2": "value2"}, {"col1": "value3", "col2": "value4"}]
    mapping:
        Mapping between SqlAlchemy declarative base attribute and name of column in 
        database

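    Returns
    -------
    List of dicts mapping each attribute name (via `mapping`) to the value
    returned by the OUTPUT clause for one inserted row.
    """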
    if len(parameters) > 0:
        connexion = session.connection()
        context = session.bind.dialect.execution_ctx_cls._init_statement(
            session.bind.dialect,
            connexion,
            connexion._Connection__connection.connection,
            statement,
            parameters,
        )
        statement = context.statement.compile(
            session.bind, column_keys=list(context.parameters[0].keys())
        )
        session.bind.dialect.do_executemany(
            context.cursor,
            "SELECT NULL; " + str(statement),
            [
                tuple(p[p_i] for p_i in statement.params.keys())
                for p in context.parameters
            ],
            context,
        )
        results = []

        while context.cursor.nextset():
            try:
                result = context.cursor.fetchone()
                if result[0] is not None:
                    results.append(result)
            except Exception:
                continue

        return [
            {mapping[r.cursor_description[i][0]]: c for i, c in enumerate(r)}
            for r in results
        ]
    else:
        return []

# Example call; session, table_cls (a declarative model) and multi_params are defined elsewhere
multi_params = bulk_insert_return_defaults_pyodbc(
    session,
    table_cls.__table__.insert(returning=[table_cls.id]),
    multi_params,
    {
        getattr(table_cls, c).expression.key: c
        for c in list(vars(table_cls))
        if isinstance(getattr(table_cls, c), InstrumentedAttribute)
    },
)
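The answer notes that the trick can be adapted to pyodbc directly. A minimal sketch of that adaptation, assuming the question's single-row `stmt`, a cursor with `fast_executemany = True`, and (as the answer describes) that the extra `SELECT NULL` sets are the only ones whose first column is NULL: prepend `SELECT NULL; `, run `executemany`, then keep the non-NULL first column of every set that returns rows. The function name is hypothetical, and sets without rows are detected through `cursor.description` rather than by catching exceptions.

```python
def executemany_returning_ids(cursor, stmt, rows):
    """Run stmt for every row and collect the ids produced by its OUTPUT clause.

    Prepends "SELECT NULL; " as in the answer above so that the id of the
    first row is not lost; the NULL rows from that SELECT are filtered out.
    """
    cursor.executemany("SELECT NULL; " + stmt, rows)
    ids = []
    while True:
        # description is None for results that only carry a row count
        if cursor.description is not None:
            for row in cursor.fetchall():
                if row[0] is not None:
                    ids.append(row[0])
        if not cursor.nextset():
            break
    return ids
```

With the question's setup: `ids = executemany_returning_ids(cursor, stmt, data.values.tolist())`, then `cnxn.commit()`.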