0

I'm using a python workflow that makes frequent SQL calls, with the SQL calls stored in standalone files and formatted using psycopg2 identifiers. I'm trying to modify these calls using dictionary objects, but can't get the SQL calls to format correctly.

An example header of an SQL file would be:

with my_table as (
  SELECT DISTINCT ON ("column_A")
    left_table."column_b",
    left_table."column_c",
    {column_identifier}
    ...

In my Python script I call the following:

col_info = my_dictionary.entry
column_identifier = " ".join([f'left_table."{x}",' for x in col_info])

This looks exactly right, but if I include this in a call

sql_call = py2.sql.SQL(sql).format(
  column_identifier=py2.sql.Identifier(column_identifier),
  ...
  )

I receive a formatting error where the parsed columns all look like this:

"left_table.""column_d"", "

If I use Literal instead of Identifier then the SQL call transacts but produces a nonsensical output. I cannot use AsIs because other Identifier calls are used.

The only potential solution I have seen so far is where the entire SQL call is contained within f""" ... """ but since there are multiple, long SQL calls, including them all in the Python script would be very unwieldy. Being able to format these simple lines would be a highly preferable solution.

Thanks

Nick Crouch
  • 29
  • 1
  • 4
  • Take a look at how they do it in : https://stackoverflow.com/questions/46770201/compose-dynamic-sql-string-with-psycopg2 – JonSG Mar 21 '23 at 14:56
  • Why wouldn't they look like `"left_table.""column_d""` you did `'left_table."{x}",'` which double quoted `x` and then `py2.sql.Identifier()` which double quoted it again. Look at docs here [sql](https://www.psycopg.org/docs/sql.html), in particular *class psycopg2.sql.Identifier ... Multiple strings can be passed to the object to represent a qualified name, i.e. a dot-separated sequence of identifiers.* . There are plenty of examples to show how to do this. – Adrian Klaver Mar 21 '23 at 18:12
  • It's still unclear to me how to prevent the double quotes around the column names. For example, escaping isn't an option (`.\"{x}\"`) – Nick Crouch Mar 22 '23 at 15:06
  • Don't do this `column_identifier = " ".join([f'left_table."{x}",' for x in col_info])`. Look at the examples in the `sql` module link above and add the table/column names directly to the `sql.SQL().format()`. The table/column names will end up double quoted because that is what `sql.Identifier()` does. The result if done correctly will be `"column_d"` not `""column_d""`. – Adrian Klaver Mar 22 '23 at 16:53
  • I've done that elsewhere, but here I'm looking for a solution where I can generate an Identifier where the number of columns called changes between scripts, which is why `left_table` is part of the call – Nick Crouch Mar 22 '23 at 19:46

1 Answers1

0

A rough draft of what I think you are looking for:

import psycopg2
from psycopg2 import sql
con = psycopg2.connect("dbname=test host=localhost  user=aklaver")

left_table = 'left_tbl'
table_name = 'some_tbl'
column_list = ['a', 'b', 'c']


q = sql.SQL("SELECT {} FROM {} JOIN {} ...").format(sql.SQL(', ').join(
            [sql.Identifier(left_table, col)  for col in column_list]),
            sql.Identifier(left_table), sql.Identifier(table_name))
print(q.as_string(con))

SELECT "left_tbl"."a", "left_tbl"."b", "left_tbl"."c" FROM "left_tbl" JOIN "some_tbl" ...

Adrian Klaver
  • 15,886
  • 2
  • 17
  • 28