1

Is there a way to convert these identifiers to a PostgreSQL query using psycopg2:

total_query_fields = (
    'p.id',
    'p.name',
    'p.type',
    'p.price',
    'o.date',        # please, notice the 'o' alias here
    'o.transaction', # please, notice the 'o' alias here
    'p.warehouse',
    'p.location',
)

# they may get split into their own tables if necessary:
product_query_fields = ('id', 'name', 'type', 'price', 'warehouse', 'location',)
order_query_fields = ('date', 'transaction',)

into something like this:

import psycopg2
from psycopg2 import sql

myid = 100

sql_query = sql.SQL("""
    SELECT {fields} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {jidx} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(
    fields = # there should be the readable ***magic***
    jidx = sql.Literal(myid)
)

?

Even after understanding my problem by reading this thread, I'm not able to figure out a nice way for getting the list of my fields using a dotted notation. There must probably be two map() used, I guess, and both an sql.SQL('.').join(...) and sql.SQL(', ').join(...).
Or maybe something more elegant, e.g. using SELECT {}.{}...?


Because for the moment I'm having trouble with that:
fields = sql.SQL(', ').join(map(sql.Identifier, total_query_fields)),

because it will escape all the sequences "table.fields" with double quotes, which is definitely not valid in SQL:

# /!\ INVALID SQL QUERY /!\:
print(sql_query.as_string(conn))
# will print:
# SELECT "p.id", "p.name", "p.type", "p.price", "o.date", "o.transaction", "p.warehouse", "p.location" FROM product p
#    INNER JOIN owner o ON p.id = o.product_id
#    WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL
#    OR o.state = 'checked_out' );

And indeed, if I copy/paste the last query directly in my favorite PostgreSQL query tool:

ERROR:  column "p.id" does not exist
LINE 1: SELECT "p.id", "p.name", "p.type", "p.price", "o.date",...
               ^
HINT:  Perhaps you meant to reference the column "p.id" or the column "o.id".
SQL state: 42703
Character: 8

The same error is obviously raised by psycopg2:

UndefinedColumn: column "p.id" does not exist
LINE 1: SELECT "p.id", "p.name", "p.type", "p.price", "o.date...
               ^
HINT:  Perhaps you meant to reference the column "p.id" or the column "o.id".

The documentation is also clear on that: Extract of the sql.Identifier documentation

Versioning:

psycopg2.__version__
 '2.9.1 (dt dec pq3 ext lo64)'
swiss_knight
  • 5,787
  • 8
  • 50
  • 92

2 Answers2

1

If I understand the question correctly, I think the trick here is that sql.Identifier accepts one or more strings (*strings), so you can split the columns on the dot and pass both parts to sql.Identifier which will compose the desired "alias"."column" result.

>>> i = sql.Identifier('a', 'col')
>>> i.strings
('a', 'col')
>>> conn = psycopg2.connect(database='test')
>>> cur = conn.cursor()
>>> i.as_string(cur)
'"a"."col"'

Quoting all the fields can be done like this:

fields = sql.SQL(', ').join(sql.Identifier(*f.split('.')) for f in total_query_fields)

sql_query = sql.SQL(
    """
    SELECT {fields} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {jidx} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(fields=fields, jidx=sql.Literal(myid))

The resulting query (from cursor.mogrify) is

b'\n    SELECT "p"."id", "p"."name", "p"."type", "p"."price", "o"."date", "o"."transaction", "p"."warehouse", "p"."location" FROM product p\n    INNER JOIN owner o ON p.id = o.product_id\n    WHERE p.id = 100 AND (o.dateof_purchase IS NOT NULL\n    OR o.state = \'checked_out\' );'

If you prefer to use map rather than a generator expression you could use itertools.starmap

from itertools import starmap

fields = sql.SQL(', ').join(
    starmap(sql.Identifier, map(lambda f: f.split('.'), total_query_fields)))
snakecharmerb
  • 47,570
  • 11
  • 100
  • 153
  • 1
    I always figure out the most complicated way! Thanks for this simple working solution! Anyway, I'll let my answer nearby for those who want to dig more into what's happening under the hood ;-) – swiss_knight Sep 19 '21 at 13:58
0

I ended up finding one way (complicated but working):

list(
    map(sql.SQL('.').join,
    zip(
        map(sql.Identifier, len(product_query_fields)*'p'),
        map(sql.Identifier, product_query_fields))
    )
)

which will print:

[Composed([Identifier('p'), SQL('.'), Identifier('id')]),
 Composed([Identifier('p'), SQL('.'), Identifier('name')]),
 Composed([Identifier('p'), SQL('.'), Identifier('type')]),
 Composed([Identifier('p'), SQL('.'), Identifier('price')]),
 Composed([Identifier('p'), SQL('.'), Identifier('warehouse')]),
 Composed([Identifier('p'), SQL('.'), Identifier('location')])]

So, the following will construct the full identifiers objects using the dot notation, notice the usage of the zip() built-in:

product_table_name_or_alias = 'p'
products_composed = sql.SQL(', ').join(
    map(sql.SQL('.').join,
        zip(
            map(sql.Identifier, len(product_query_fields)*product_table_name_or_alias), 
            map(sql.Identifier, product_query_fields)
        )
    )
)

products_composed
#Composed([
#    Composed([Identifier('p'), SQL('.'), Identifier('id')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('name')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('type')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('price')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('warehouse')]), SQL(','),
#    Composed([Identifier('p'), SQL('.'), Identifier('location')])
#])

Same thing for the joined table:

order_table_name_or_alias = 'o'
orders_composed = sql.SQL(', ').join(
    map(sql.SQL('.').join,
        zip(
            map(sql.Identifier, len(order_query_fields)*order_table_name_or_alias),
            map(sql.Identifier, order_query_fields)
        )
    )
)

order_composed
#Composed([
#    Composed([Identifier('o'), SQL('.'), Identifier('date')]), SQL(', '),
#    Composed([Identifier('o'), SQL('.'), Identifier('transaction')])
])

You cannot magically add these two objects:

total_composed = products_composed + orders_composed

total_composed
#Composed([
#    Composed([Identifier('p'), SQL('.'), Identifier('id')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('name')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('type')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('price')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('warehouse')])
#    Composed([Identifier('o'), SQL('.'), Identifier('date')]), SQL(', '),
#    Composed([Identifier('o'), SQL('.'), Identifier('transaction')])
#])

because it will basically miss a SQL(', ') between the two lists as you can see. So this will collapse theses two fields in the SELECT query as "p"."warehouse""o"."date" which is obviously not valid and will throw an error: missing FROM-clause entry for table "warehouse""o"

Hence, it was also tempting to try to append them together but...

products_composed.append(order_composed)
# AttributeError: 'Composed' object has no attribute 'append'

So, back to basics and pass the resulting composed object to your query

# Add the missing sql.SQL(', ') using a simple addition in between the 2 objects
total_composed = products_composed + sql.SQL(', ') + orders_composed
total_composed
#Composed([
#    Composed([Identifier('p'), SQL('.'), Identifier('id')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('name')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('type')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('price')]), SQL(', '),
#    Composed([Identifier('p'), SQL('.'), Identifier('warehouse')]), SQL(', '), # here it is!!!
#    Composed([Identifier('o'), SQL('.'), Identifier('date')]), SQL(', '),
#    Composed([Identifier('o'), SQL('.'), Identifier('transaction')])
#])


sql_query = sql.SQL("""
    SELECT {composed_fields} FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = {jidx} AND (o.dateof_purchase IS NOT NULL
    OR o.state = 'checked_out' );"""
).format(
    composed_fields = total_composed # there should be the readable ***magic***
    jidx = sql.Literal(myid)
)

conn = psycopg2.connect(**DB_PARAMS)
sql_query.as_string(conn)
SELECT "p"."id", "p"."name", "p"."type", "p"."price",
       "p"."warehouse", "o"."date", "o"."transaction"
     FROM product p
    INNER JOIN owner o ON p.id = o.product_id
    WHERE p.id = 100
      AND (o.dateof_purchase IS NOT NULL
       OR o.state = 'checked_out' ); # This finally works!

This is nice because it deals with table having capitals or strange character in their name! (but it's definitely better to stay with lower case letters in postgres!).

Drawback that I didn't fix yet: will be a mess to deal with if you need a specific column ordering in your query, because the composed object doesn't behave like a list (which I was expecting)...

swiss_knight
  • 5,787
  • 8
  • 50
  • 92