How to fast insert data from Python 3 to SQL Server?

Question

I'm having issues quickly inserting large volumes of data from Python3 into SQL Server.

The target table has 9 columns with 3 indexes and 1 primary key.

The below code works but it's a lot slower than I would like. See timing below:

-- 1,000 records
In [35]: %time connection_factory.executemany(sql, args)
CPU times: user 30.2 ms, sys: 40.9 ms, total: 71.1 ms
Wall time: 3.54 s

-- 5,000 records
In [46]: %time connection_factory.executemany(sql, args)
CPU times: user 110 ms, sys: 55.8 ms, total: 166 ms
Wall time: 17 s

I've tried using sql_alchemy and am currently using Turbodbc - but open to anything else that works faster.

Below is a sample of my code

from turbodbc import connect, make_options

class ConnectionFactory:
def __init__(self):
    self.connection = self.initialize()

@staticmethod
def initialize():
    options = make_options(autocommit=True)
    return connect(driver="FREETDS",
                   server="",
                   port="",
                   database="",
                   uid="",
                   pwd="",
                   turbodbc_options=options)

def execute(self, query, params=None):
    try:
        cursor = self.connection.cursor()
        cursor.execute(query, params)
    except Exception as e:
        print(e)
    finally:
        cursor.close()
        return

def executemany(self, query, params=None):
    try:
        cursor = self.connection.cursor()
        cursor.executemany(query, params)
    except Exception as e:
        print(e)
    finally:
        cursor.close()
        return


sql = """
INSERT INTO table1 (value1,
                    value2,
                    value3,
                    value4,
                    value5,
                    value6,
                    value7)
VALUES (?, ?, ?, ?, ?, ?, ?); """

args = df.to_records().tolist()
connection_factory = ConnectionFactory()
connection_factory.executemany(sql, args)

Is anyone familiar with this exact combination of SQL Server and python that could point me in the right direction?

I'm sorry, but where is the data being loaded from? A record-file? — Matthew E. Miller, Aug 28 '19 at 00:13
from a pandas dataframe in memory. I showed the conversion in the bottom of the code, but it's being hidden by the scrolling window. — fcol, Aug 28 '19 at 15:41

score 0 · Answer 1 · answered Aug 28 '19 at 00:22

Sorry, my mistake, I posted information about mySQL. You're looking for msSQL.

Here is an equivalent bulk insert statement for msSQL:

BULK INSERT MyTable
    FROM 'path\myfile.csv'
    WITH 
        (FIELDTERMINATOR = ';',
         ROWTERMINATOR = '\n')

There are a few options:

You may write your data to a .csv file and then leverage mySql's very fast LOAD DATA INFILE command.

OR

You may also use another form of the insert command, which is:

INSERT INTO tbl_name
    (a,b,c)
VALUES
    (1,2,3),
    (4,5,6),
    (7,8,9);

See these optimization links:

score 0 · Answer 2 · answered Aug 28 '19 at 00:49

I can see that you already have function for execute(). it should be faster same to bulk insert.

args= ', '.join(map(str, df.to_records().tolist()))

sql = "
INSERT INTO table1 (value1,
                    value2,
                    value3,
                    value4,
                    value5,
                    value6,
                    value7)
VALUES {}".format(args)

connection_factory = ConnectionFactory()
connection_factory.execute(sql)

Create new method to execute query from string without params.

def execute2(self, query):
    try:
        cursor = self.connection.cursor()
        cursor.execute(query)
    except Exception as e:
        print(e)
    finally:
        cursor.close()
        return

ahhh.. very creative. Let me try it and see how it compares. I'll report back findings. Thank you. — fcol, Aug 28 '19 at 15:43

How to fast insert data from Python 3 to SQL Server?

2 Answers2