2

The Python code I'm writing makes a psycopg2 connection to a PostgreSQL database. I need to build some reports out of the data in this database, so I have a couple Python procs that periodically run and create a csv file out of some tables and nice queries.

The problem I'm facing here is that I need to include a column in my csv report which is the result of a function stored in the PostgreSQL database. This database is managed by another group of persons so I can't write to it. I could easily see the content of that function and emulate the behaviour on Python and have the column values I need to calculate, but in this case this function is periodically changing and it won't make sense to continually update the Python function.

So my question here is if it's possible to somehow load the database function into Python code, every time my code connects to the database. I could make actual use of the function on the database itself, but imagine making 900K calls to a database function from Python code to calculate a value, simply does not scale.

edit: Adding sql function

CREATE OR REPLACE FUNCTION public.p_start(integer, integer)
 RETURNS numeric
 LANGUAGE sql
 IMMUTABLE
AS $function$
SELECT CASE WHEN $1 = 0 AND $2 = 0 THEN 0.2760
            WHEN $1 = 0 AND $2 = 1 THEN 0.0684
            WHEN $1 = 0 AND $2 = 2 THEN 0.0277
            WHEN $1 = 0 AND $2 = 3 THEN 0.0189
            WHEN $1 = 0 AND $2 = 4 THEN 0.0038
            WHEN $1 = 0 AND $2 = 5 THEN 0.0098
            WHEN $1 = 1 AND $2 = 1 THEN 0.5501
            WHEN $1 = 1 AND $2 = 2 THEN 0.2264
            WHEN $1 = 1 AND $2 = 3 THEN 0.1203
            WHEN $1 = 1 AND $2 = 4 THEN 0.0804
            WHEN $1 = 1 AND $2 = 5 THEN 0.0839
            ELSE 0.1 END;
$function$

Thanks

PepperoniPizza
  • 8,842
  • 9
  • 58
  • 100
  • So you want to translate the query into a Python function? or do you want to execute the query from Python? – zom-pro Aug 18 '15 at 21:07
  • @zom-pro Any of those is valid, as long as I avoid making the round trip to the database and get a result that I can directly use on Python and write to the csv. – PepperoniPizza Aug 18 '15 at 21:10
  • How large is the query? I ask because you would lose the SQL optimisation if translated into Python. This is perhaps the most powerful reason to use a database, unless the query small in which case it won't matter. – zom-pro Aug 18 '15 at 21:13
  • Look here: http://stackoverflow.com/questions/19472922/reading-external-sql-script-in-python – zom-pro Aug 18 '15 at 21:26
  • Is the database function written in PL/PgSQL? PL/Python? Other? Does it do database access its self? – Craig Ringer Aug 18 '15 at 22:52
  • @CraigRinger: added the function to the post – PepperoniPizza Aug 18 '15 at 23:01
  • Right, so it's a plain sql-language function. Unless you can rely on its structure remaining stable enough that you can write a simple parser, fetching its source from the `prosrc` column of `pg_proc`, I think you're out of luck. You should get the database team to store this mapping in a small table instead of a function, then you can just query the table contents and cache them. They can still have a function that returns the result of a select from the table, and it'll get inlined efficiently in most cases. – Craig Ringer Aug 18 '15 at 23:08

1 Answers1

1

It's a plain sql-language function.

Unless you can rely on its structure remaining stable enough that you can write a simple parser for the function body, fetching its source from the prosrc column of pg_proc, I think you're out of luck. I wouldn't want to do this, as it'd be extremely fragile.

You should ask the database team to store this mapping in a small table instead of a function, then you can just query the table contents and cache them. They can still have a function that returns the result of a select from the table, and it'll get inlined efficiently in most cases, and it won't affect compatibility with existing apps that use the function. E.g. if the info was in a table probabilities the function would be:

CREATE OR REPLACE FUNCTION public.p_start(integer, integer)
 RETURNS numeric
 LANGUAGE sql
 IMMUTABLE
AS $function$
SELECT coalesce(
  SELECT probability FROM probabilities WHERE a = $1 and b = $2
0.1)
$function$

Failing that, if the range of possible input values is known in advance, you can generate a map of the full function domain and cache it in your app with something like:

test=> SELECT startval, endval, p 
       FROM generate_series(0,1) startval 
       cross join generate_series(0,5) endval 
       cross join p_start(startval, endval) p;

 startval | endval |   p    
----------+--------+--------
        0 |      0 | 0.2760
        1 |      0 |    0.1
        0 |      1 | 0.0684
        1 |      1 | 0.5501
        0 |      2 | 0.0277
        1 |      2 | 0.2264
        0 |      3 | 0.0189
        1 |      3 | 0.1203
        0 |      4 | 0.0038
        1 |      4 | 0.0804
        0 |      5 | 0.0098
        1 |      5 | 0.0839
(12 rows)

which is trivial to turn into a Python dictionary for lookups locally.

This only works if the function's domain is finite and known.


BTW, it's wrong to define it as IMMUTABLE then re-define it, unless they DROP and re-CREATE it to ensure nothing relies on the old definition. It should be declared STABLE. Re-defining an IMMUTABLE function in a way that can produce different values will result in incorrect query results if the function is used in any expression indexes.

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778