
I'm trying to load data returned by a query against an AWS Redshift table, run with the Python 2.7 library pg8000.

First I query a Redshift table, and the results come back as a tuple of lists, one list per row.

Example:

value_tuple = ([datetime.date(2016, 10, 2), 1, 1, 123123], [datetime.date(2016, 10, 2), 1, 2, 3131312])

I'd like to insert these values into another table but am running into a few problems creating the insert statement. I'm open to inserting all the values individually, but I'd rather do them all at once.

The table I'm working with looks like this:

| date_column | value_1 | value_2 | value_count | 
| 2016-10-01  |    1    |    2    |     300     |

First I tried to build an insert query from just a single list inside the tuple:

update_query = """insert into my_schema.my_table_to_update
                       values %s
                       """ % str(value_tuple[0])

This way the update_query becomes:

'insert into my_schema.my_table_to_update\n                       values [datetime.date(2016, 10, 2), 1, 1, 123123]\n                       '

When I run the update query against the table I get this error:

pg8000.core.ProgrammingError: (u'ERROR', u'42601', u'syntax error at or near "["', u'70', u'/home/rdsdb/padb/src/pg/src/backend/parser/parser_scan.l', u'699', u'yyerror', u'', u'')

I think this means I can't use the list format of the results to insert into the table.
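
The failure can be reproduced without a database. This is a minimal sketch (the variable names `row` and `broken_sql` are illustrative only) showing that `str()` on a list produces Python's representation, brackets and `datetime.date(...)` included, which is not a SQL value list:

```python
import datetime

row = [datetime.date(2016, 10, 2), 1, 1, 123123]

# str(row) is the Python repr of the list, not SQL syntax
broken_sql = "insert into my_schema.my_table_to_update values %s" % str(row)
print(broken_sql)
```

The `[` that the parser complains about comes straight from that representation.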

What is a functional, and preferably efficient, way of inserting this data into Redshift?

Fernando

1 Answer

I'm not familiar with pg8000 so take this with a grain of salt.

The final query you want to run should end up looking like:

INSERT INTO your_table (date_column, value_1, value_2, value_count) 
VALUES ('2016-10-02', 1, 2, 3131312);

You should avoid string interpolation with SQL queries as it can open you up to an injection attack.

Your library should support its own SQL-safe parameterization (which also takes care of quoting for you). If it follows the Python DB API standard, the placeholder style should be one of these: https://www.python.org/dev/peps/pep-0249/#paramstyle.
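
As a rough illustration of what the paramstyle setting means, here is a small sketch. `placeholder_list` is a hypothetical helper, not part of any driver, and `sqlite3` from the standard library is used only as a stand-in DB-API module. For pg8000 you would inspect `pg8000.paramstyle` on your installed version rather than assume a value.

```python
import sqlite3  # stdlib DB-API module, used here only as a stand-in driver


def placeholder_list(paramstyle, n):
    """Return n comma-separated positional placeholders for a PEP 249 paramstyle."""
    if paramstyle == "qmark":
        marks = ["?"] * n
    elif paramstyle == "format":
        marks = ["%s"] * n
    elif paramstyle == "numeric":
        marks = [":%d" % (i + 1) for i in range(n)]
    else:
        # "named" and "pyformat" take a mapping of names, not a plain sequence
        raise ValueError("unsupported positional paramstyle: %s" % paramstyle)
    return ", ".join(marks)


# Every DB-API module exposes its style as a module-level attribute
print(sqlite3.paramstyle)
print(placeholder_list("format", 4))
```

The placeholders only mark where values go. The values themselves are passed separately to `execute()`, so the driver handles quoting.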

Using what I'm familiar with (psycopg2 - http://initd.org/psycopg/docs/usage.html#passing-parameters-to-sql-queries) the code would look like:

update_query = """insert into my_schema.my_table_to_update 
        (date_column, value_1, value_2, value_count)
        VALUES (%s, %s, %s, %s);"""
# value_tuple holds one list per row, so run the statement once per row
cur.executemany(update_query, value_tuple)
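
Since the question asks about inserting all rows at once, one option is a single multi-row INSERT whose values are still passed as parameters. The sketch below is an assumption-laden illustration: `build_multirow_insert` is a hypothetical helper, and `sqlite3` stands in for the real connection so the code runs anywhere (hence the `?` placeholder and the ISO date strings). With a driver that uses the `format` paramstyle you would keep the default `%s` placeholder.

```python
import datetime
import sqlite3  # stand-in for the real connection, so the sketch is runnable


def build_multirow_insert(table, columns, rows, placeholder="%s"):
    """Return (sql, params) for one INSERT carrying every row as parameters.

    Table and column names are interpolated, so they must be trusted constants;
    only the row values go through the driver's parameter handling.
    """
    one_row = "(" + ", ".join([placeholder] * len(columns)) + ")"
    sql = "insert into %s (%s) values %s" % (
        table, ", ".join(columns), ", ".join([one_row] * len(rows)))
    params = [value for row in rows for value in row]  # flatten row by row
    return sql, params


value_tuple = ([datetime.date(2016, 10, 2), 1, 1, 123123],
               [datetime.date(2016, 10, 2), 1, 2, 3131312])
columns = ["date_column", "value_1", "value_2", "value_count"]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table my_table_to_update "
            "(date_column text, value_1 integer, value_2 integer, value_count integer)")

# sqlite3-specific: pass dates as ISO strings and use "?" placeholders
rows = [[row[0].isoformat()] + row[1:] for row in value_tuple]
sql, params = build_multirow_insert("my_table_to_update", columns, rows, placeholder="?")
cur.execute(sql, params)
conn.commit()
inserted = cur.execute("select count(*) from my_table_to_update").fetchone()[0]
```

This sends one statement instead of one per row, while the values are still never spliced into the SQL text.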

Sounds like in your case you could make it even more efficient by doing it all in SQL. If your initial selection query yields the 4 desired fields which can be aliased with the correct column names (for example: AS value_count), then you could do a query with a structure like this:

insert into my_schema.my_table_to_update (
    select date_column, value_1, value_2, value_count
    from (your selection query here) as src
);
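
To make the all-in-SQL idea concrete, here is a runnable sketch of the same INSERT ... SELECT shape. Everything in it is assumed for illustration: `sqlite3` stands in for Redshift, and `source_table` with columns `d`, `a`, `b`, `n` is a hypothetical source whose columns are aliased to the target names.

```python
import sqlite3  # stand-in database; the statement shape is what matters

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("create table source_table (d text, a integer, b integer, n integer)")
cur.execute("create table my_table_to_update "
            "(date_column text, value_1 integer, value_2 integer, value_count integer)")
cur.executemany("insert into source_table values (?, ?, ?, ?)",
                [("2016-10-02", 1, 1, 123123), ("2016-10-02", 1, 2, 3131312)])

# The rows never travel through Python: the database copies them itself
cur.execute("""
    insert into my_table_to_update (date_column, value_1, value_2, value_count)
    select d as date_column, a as value_1, b as value_2, n as value_count
    from source_table
""")
conn.commit()
copied = cur.execute("select count(*) from my_table_to_update").fetchone()[0]
```

Because no rows are fetched into the client, this avoids both the round trip and the quoting problem.
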
systemjack
  • After executing the query with cur.execute(query), call conn.commit() to persist the changes to the database. – suthar_arun Feb 04 '23 at 06:38