
I am getting extremely poor performance when inserting into a Sybase temp table - on the order of tens of seconds for only 1000 rows. It takes 46 seconds for 1000 rows, and 10000 rows take multiple minutes. I will have about 100k rows. Is there any way to speed this insert up? Could it be that the temp table in Sybase is configured incorrectly? If so, what do I need to ask my DBA about the configuration?

Code:
    import time

    engine = get_connection(db_constants.DB_RISK)  # get_connection/db_constants are local helpers
    tmp_table_name = "#TMP"  # assumed to be the same #TMP the INSERT below targets
    data = []
    for i in range(1, 1000, 1):
        values = ['AL' + str(i)]
        data.append(values)
    with engine.connect() as conn:
        conn.execute("CREATE TABLE " + tmp_table_name + "(alias_id varchar(255));")
        start = time.time()
        for i in range(0, len(data), 1000):
            loop_start = time.time()
            with conn.begin():
                stmt = "INSERT INTO #TMP (alias_id) VALUES (?)"
                conn.execute(stmt, data[i:i + 1000])  # executemany over the 1000-row slice
            loop_end = time.time()
            print("batch " + str(i) + " time elapsed : " + str(loop_end - loop_start))
        end = time.time()
        print("total time elapsed : " + str(end - start))
asked by Krish Srinivasan (edited by markp-fuso)

2 Answers


Use fast_executemany=True.

pyodbc defaults to fast_executemany=False because not all ODBC drivers support it. However, the SAP ASE ODBC driver does support it, so while the following code takes about 11 seconds to execute against a local SAP ASE instance ...

import os
import time

import sqlalchemy as sa

# DSN-based connection string; the sa password comes from the sa_PWD environment variable
connection_url = f"sybase+pyodbc://sa:{os.environ['sa_PWD']}@mydb_dsn_sybase"

engine = sa.create_engine(connection_url)

table_name = "#tmp"

rows_to_insert = 10_000
data = [{"alias_id": f"alias_{x}"} for x in range(rows_to_insert)]

with engine.begin() as conn:
    conn.exec_driver_sql(f"CREATE TABLE {table_name} (alias_id varchar(255))")
    t0 = time.perf_counter()
    # a single executemany call covering all of the parameter dicts
    conn.execute(
        sa.text(f"INSERT INTO {table_name} (alias_id) VALUES (:alias_id)"),
        data,
    )
    print(
        f"{rows_to_insert:,} rows inserted in {(time.perf_counter() - t0):0.1f} seconds"
    )

simply changing the create_engine() call to

engine = sa.create_engine(connection_url, fast_executemany=True)

reduces the execution time to less than a second.

answered by Gord Thompson

Assuming this is Sybase ASE ... and with no idea what, if any, troubleshooting and performance & tuning (P&T) the OP has performed to date ... some observations, comments and suggestions:

NOTE: Most (all?) of these are going to apply to any program generating a moderate-to-high volume of DB write activity.

Where are the delays?

Have your DBA monitor wait events while you run your process; the wait events should show where the delays are occurring ... during parse/compile? IO waits? dataserver waiting for the client (i.e., network and/or client/application delays)?
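
As a rough sketch of what that monitoring could look like from a second Python session: this assumes the MDA tables (monProcessWaits, monWaitEventInfo) are installed and monitoring is enabled, requires mon_role, and borrows the connection details from the other answer; exact table/column names can vary by ASE version, so treat it as a starting point for the DBA rather than a drop-in script.

    import os
    import sqlalchemy as sa

    # connection details borrowed from the other answer; adjust DSN/login to your environment
    connection_url = f"sybase+pyodbc://sa:{os.environ['sa_PWD']}@mydb_dsn_sybase"
    engine = sa.create_engine(connection_url)

    spid_of_interest = 123  # hypothetical: the spid of the session doing the INSERTs

    with engine.connect() as conn:
        rows = conn.exec_driver_sql(
            """
            SELECT w.WaitEventID, i.Description, w.Waits, w.WaitTime
            FROM master..monProcessWaits w
            JOIN master..monWaitEventInfo i ON i.WaitEventID = w.WaitEventID
            WHERE w.SPID = ?
            ORDER BY w.WaitTime DESC
            """,
            (spid_of_interest,),
        )
        for row in rows:
            print(row)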

Statement caching

The optimizer has to parse, and possibly compile, every one of those INSERT statements.

If this is ASE 15+ and each INSERT is being compiled, this can take a long time. In this case it's typically a good idea to make sure the dataserver has been configured to support statement caching (to disable the compilation phase for statements #2 to #N).

Configuring the dataserver for statement caching means 1) allocating some memory to 'statement cache size' and 2) setting 'enable literal autoparam' to 1.
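
For reference, a sketch of what that configuration might look like; this is DBA territory (sa_role required), and the cache size shown is just an illustrative placeholder, not a recommendation.

    import os
    import pyodbc

    # DBA-side sketch: enable statement caching and literal autoparameterization.
    # autocommit=True because sp_configure cannot run inside a user transaction;
    # the DSN/login and the 20000 (2K-page) cache size are placeholder values.
    cnxn = pyodbc.connect(f"DSN=mydb_dsn_sybase;UID=sa;PWD={os.environ['sa_PWD']}",
                          autocommit=True)
    cnxn.execute("sp_configure 'statement cache size', 20000")
    cnxn.execute("sp_configure 'enable literal autoparam', 1")
    cnxn.close()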

Batching DML statements

Each completed transaction requires a flush of the changed log record(s) to disk before the transaction can be considered 'complete'. The number of log writes to disk can be reduced by grouping several write commands (e.g., INSERTs) into a single transaction, which delays the log write(s) until a 'commit transaction' is issued.

While ASE 15+ should have log writes deferred for tempdb activity, it's usually a good practice to group individual DML statements into transactions.

It's not clear (to me) if you're using any sort of transaction management, so I'd suggest adding some, e.g., wrapping the inner loop in a 'begin tran' / 'commit tran' pair.
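
A minimal sketch of that idea, reusing the #TMP table and the SQLAlchemy engine from the question's code (the batch size of 1000 is arbitrary):

    data = [('AL' + str(i),) for i in range(1, 100000)]  # parameter tuples for executemany
    batch_size = 1000

    with engine.connect() as conn:
        conn.exec_driver_sql("CREATE TABLE #TMP (alias_id varchar(255))")
        for start in range(0, len(data), batch_size):
            with conn.begin():  # one 'begin tran' / 'commit tran' around each batch
                conn.exec_driver_sql(
                    "INSERT INTO #TMP (alias_id) VALUES (?)",
                    data[start:start + batch_size],  # executemany over the batch
                )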

External output can be slow

Any program that generates output ... either to a console or a file ... will typically see some degradation in performance due to generating said output (more so if the output is going to a file on a 'slow' disk). Even dumping a lot of output to a console can slow things down considerably because the OS has to constantly redraw the console (shift all lines up by one line, add a new line at the bottom, repeat).

If I'm reading your code properly you're generating a print statement after each insert; so we're talking about 100K print statements, yes? That's a lot of IO requests being sent to file or console.

I would want to run some timing tests with and without that print statement (after the INSERT) enabled to see if this could be adding (significantly) to your overall run time.

NOTE: I know, I know, I know ... this sounds a bit silly but ... I've seen some processes sped up by 1-2 orders of magnitude simply by limiting/disabling output to a console window. Try running your program without the INSERT and just the print ... how long does it take to scroll 100K lines across a console? how long does it take to print 100K lines to an output/log file?
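
One rough way to measure that: time the batch loop once with the per-batch print enabled and once with it suppressed, and compare. The loop body below is just a stand-in for the real INSERT loop.

    import time

    def run_batches(emit_output):
        """Stand-in for the real INSERT loop; only the output behaviour differs."""
        t0 = time.perf_counter()
        for batch_no in range(100):
            # ... the batch INSERT would go here ...
            if emit_output:
                print("batch " + str(batch_no) + " done")
        return time.perf_counter() - t0

    print("with per-batch output   : %.2fs" % run_batches(True))
    print("without per-batch output: %.2fs" % run_batches(False))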

Bulk inserts

Individual INSERTs are always going to be relatively slow compared to bulk loading capabilities. ASE has a built-in capability for (relatively) fast bulk data loads. At the OS level there is the 'bcp' program. For programming languages (e.g., python?) the associated (Sybase/ASE) library should have a bulk insert/copy module.

I'd want to investigate your python/Sybase/ASE lib for some sort of bulk load module and then consider using it to perform the 100K INSERTs.
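
A rough sketch of the bcp route: write the values to a flat file and shell out to the OS-level bcp utility. The database/table name ('riskdb..alias_staging'), server name and flag values here are illustrative, and note that bcp loads into a regular table rather than a session-scoped #temp table.

    import os
    import subprocess

    datafile = "alias_ids.dat"

    # dump the 100K values to a character-mode data file, one per line
    with open(datafile, "w") as f:
        for i in range(1, 100001):
            f.write("AL" + str(i) + "\n")

    # load it with bcp; -c = character mode, -b = rows per batch/commit
    subprocess.run(
        ["bcp", "riskdb..alias_staging", "in", datafile,
         "-c",
         "-b", "1000",
         "-U", "sa",
         "-P", os.environ["sa_PWD"],
         "-S", "MYASE"],  # server name from the interfaces file
        check=True,
    )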

answered by markp-fuso
  • This does not answer my question in the least - we are talking 46 seconds for 1000 rows and 3 print statements. And yes, as you can see I am batching, using transactions, executemany. – Krish Srinivasan May 23 '20 at 16:12
  • I don't work with python so I have no idea, from the code presented, if you're using transaction management; perhaps I misread the start/end of the inner loop re: print statement? as for the rest of the answer ... you haven't presented any info to suggest where the delays are occurring ... have you verified that you're using statement cache? have you looked at bulk insert capability? I've added another comment to have your DBA help you with wait events (ie, **where** is the delay taking place?); care to provide any/more details on what troubleshooting/P&T you've performed to date? – markp-fuso May 23 '20 at 16:23
  • ok, in that case can you present numbers that I could expect when using a programmatic interface to insert strings of length 25 characters or less into a Sybase local temp table. The python code above is using bulk insert - it's meant to be performant, which is what is surprising to me. – Krish Srinivasan May 23 '20 at 16:59
  • you can get your own numbers ... populate a file with 100K values, then load into an ASE table via `bcp` using the same batch size you're using in your python script – markp-fuso May 23 '20 at 17:02
  • you asked for numbers of what to expect; a (high-speed) programmatic insertion should be able to mimic the input rates of a bcp process sooo .... you could get your own numbers by running a bcp test to gauge what the insertion rate should be; as for python vs ASE ... you haven't (yet) provided any info that definitively points to a python or ASE issue; until you review all possible scenarios you're just guessing at where the problem lies; but, hey, you know what you're doing so good luck – markp-fuso May 26 '20 at 12:45