
I have this Python function which inserts rows into a SQL Server database. The way the script works at the moment, every time it is rerun it inserts the same rows again in addition to any new rows. Eventually I will change it so that it only inserts new rows, but for now I have to work with some sort of update statement.

I'm aware that I can use MERGE in SQL Server to achieve something similar to MySQL's ON DUPLICATE KEY UPDATE, but I'm not exactly sure how it should be used. Any advice is welcome. Thanks!

def sqlInsrt(headers, values):
    # create comma-separated list of column names from headers
    strheaders = ','.join(str(i) for i in headers)

    # create "?" parameter placeholders for the INSERT clause
    placestr = ','.join(i for i in ["?" for i in headers])

    # create "column=?" pairs for a possible UPDATE clause (not used yet)
    replacestr = ', '.join(['{}=?'.format(h) for h in headers])

    # build and execute the SQL query ('part', 'cursor' and 'cnx' are module-level globals)
    insert = ("INSERT INTO "+part+" ("+strheaders+") VALUES ("+placestr+")")
    cursor.execute(insert, values)
    cnx.commit()
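
For context, the function relies on a few module-level globals (`part`, `cursor`, `cnx`) that are defined elsewhere in the script. A minimal sketch of the assumed setup and a call; the connection string, table name and values here are just placeholders:

import pyodbc

# Assumed module-level setup, not shown in the original script
cnx = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = cnx.cursor()
part = "Parts"  # hypothetical target table name

# column headers and one row of values to insert
sqlInsrt(["id", "name"], (1, "widget"))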
Jonny1998
  • `placestr = ",".join("?" for i in headers)`. – chepner Jul 27 '18 at 16:32
  • What *is* the statement being passed to `execute`? – chepner Jul 27 '18 at 16:55
  • With my edit, the statement is just an INSERT. 'part' variable is not defined in the function but it controls which table the data will be inserted into. 'strheaders' and 'placestr' are the headers and param placeholders for the values to be inserted – Jonny1998 Jul 27 '18 at 16:59
  • This is a vulnerability waiting to happen. You're including the column names directly in the SQL. Someone is going to come along and pass in user input as column names directly to this function. – jpmc26 Jul 27 '18 at 17:37
  • Well, I'm writing this as an intern for a company, and it will only be used on an internal server. The information the whole application deals with is public; all of it is available on a website that I'm scraping data from. – Jonny1998 Jul 27 '18 at 17:42

2 Answers


You should read the docs for MERGE. Basically: MERGE INTO TargetTable USING SourceTable ON TargetTable.id = SourceTable.id .... Then read the docs about the WHEN MATCHED / WHEN NOT MATCHED BY TARGET clauses, etc. Your Python could then swap out the table names and join columns using parameters.
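
A rough sketch of what such a MERGE could look like when run from Python. The table and column names (`Parts`, `Parts_stage`, `id`, `name`) are hypothetical, and `cursor`/`cnx` are assumed to be the pyodbc cursor and connection from the question:

# Hypothetical table/column names; adjust to your schema
merge_sql = """
MERGE INTO Parts AS target
USING Parts_stage AS source
    ON target.id = source.id
WHEN MATCHED THEN
    UPDATE SET target.name = source.name
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, name)
    VALUES (source.id, source.name);
"""
cursor.execute(merge_sql)
cnx.commit()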

DanB

I wrote a script that solves the simplest case of merging two identically structured tables, one of which contains the new/updated data. This is useful for incremental data imports. You can expand it depending on your needs (e.g. if you need a type 2 SCD):

import pyodbc


def create_merge_query(
    stg_schema: str,
    stg_table: str,
    schema: str,
    table: str,
    primary_key: str,
    con: pyodbc.Connection,
) -> str:
    """
    Create a merge query for the simplest possible upsert scenario:
    - updating and inserting all fields
    - merging on a single column, which has the same name in both tables

    Args:
        stg_schema (str): The schema where the staging table is located.
        stg_table (str): The table with new/updated data.
        schema (str): The schema where the table is located.
        table (str): The table to merge into.
        primary_key (str): The column on which to merge.
        con (pyodbc.Connection): The connection used to read column metadata.
    """

    columns_query = f"""
    SELECT 
        col.name
    FROM sys.tables AS tab
        INNER JOIN sys.columns AS col
            ON tab.object_id = col.object_id
    WHERE tab.name = '{table}'
    AND schema_name(tab.schema_id) = '{schema}'
    ORDER BY column_id;
    """
    columns_query_result = con.execute(columns_query)
    columns = [tup[0] for tup in columns_query_result]
    columns_stg_fqn = [f"stg.{col}" for col in columns]
    update_pairs = [f"existing.{col} = stg.{col}" for col in columns]
    merge_query = f"""
    MERGE INTO {schema}.{table} existing
        USING {stg_schema}.{stg_table} stg
        ON stg.{primary_key} = existing.{primary_key}
        WHEN MATCHED
            THEN UPDATE SET {", ".join(update_pairs)}
        WHEN NOT MATCHED
            THEN INSERT({", ".join(columns)})
            VALUES({", ".join(columns_stg_fqn)});
    """
    return merge_query
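
A usage sketch, assuming a pyodbc connection `con` to the target database already exists; the schema and table names below are made-up placeholders:

# Hypothetical schemas/tables; the staging table holds the new/updated rows
query = create_merge_query(
    stg_schema="staging",
    stg_table="parts",
    schema="dbo",
    table="parts",
    primary_key="id",
    con=con,
)
con.execute(query)
con.commit()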
Michał Zawadzki