3

I am writing a postgresql view that uses some complex logic to generate rows of data. It requires that I perform the same set of steps on two sets of data derived from results of querying two different tables. Is it possible to write a function that generalizes these steps?

For example:

CREATE TABLE foos (
  id bigserial primary key,
  name text not null
);

INSERT INTO foos (name) VALUES ('FOO_ONE');
INSERT INTO foos (name) VALUES ('FOO_TWO');
INSERT INTO foos (name) VALUES ('FOO_THREE');
INSERT INTO foos (name) VALUES ('FOO_FOUR');

CREATE TABLE bars (
  id bigserial primary key,
  name text not null
);

INSERT INTO bars (name) VALUES ('BAR_ONE');
INSERT INTO bars (name) VALUES ('BAR_TWO');
INSERT INTO bars (name) VALUES ('BAR_THREE');
INSERT INTO bars (name) VALUES ('BAR_FOUR');

Now I must compute:

WITH even_foos AS (SELECT * FROM foos WHERE id % 2 = 0)
SELECT id * 2 AS double_id, name FROM even_foos;

and

WITH odd_bars AS (SELECT * FROM foos WHERE id % 2 = 1)
SELECT id * 2 AS double_id, name FROM odd_bars;

Note how the id * 2 select is repeated in both queries. I would like to reuse this functionality so that I only have to implement this logic once. For example (made up syntax):

WITH even_foos AS (SELECT * FROM foos WHERE id % 2 = 0)
SELECT * FROM get_double_id_and_name(even_foos);

and

WITH odd_bars AS (SELECT * FROM foos WHERE id % 2 = 1)
SELECT * FROM get_double_id_and_name(odd_bars);

Is there an easy way to accomplish this?

Tripp Kinetics
  • 5,178
  • 2
  • 23
  • 37
  • If the referred table (or column) names are not constant(s) youl'll **need** dynamic SQL. BTW: excellent question, excellently phrased. – wildplasser Jun 12 '14 at 19:00

1 Answers1

1

Dynamic SQL would certainly do the trick, since you could wrap the SQL in a loop where the looping is done over a list containing (0, 1), and then interpolate that dynamically into the query to get the values for 0 and 1.

If you want to keep it pure SQL, you could turn the query into a function, like so:

CREATE FUNCTION get_foos(int) RETURNS TABLE (id bigint, name text)
    AS $$ SELECT * FROM foos WHERE id % 2 = $1 $$
    LANGUAGE SQL;

And then call it like this:

SELECT get_foos(1);
SELECT get_foos(0);

For the specific example you've provided, it's not clear that the overhead of the function is worth it, since the values you'll be passing to it in this case is really just a set of 2. For cases where the set of potential argument values is much larger than that, factoring out common code into a function as I've done above is definitely a big win. (And given that you mentioned a view that uses some complex logic to generate rows of data, it sounds like it may well end up fitting the latter case where a function is worth it.)

And certainly, even in this case, putting it into a function is fine, especially if you would prefer to centralize that logic. That centralization also has the benefit of encapsulating the logic inside the DB, so you can call it via SQL from, say, a Python program, and a Ruby program, etc. without repeating the logic in the actual driver code in each case.

Here's an SQL fiddle with the function defined and it being called: http://sqlfiddle.com/#!15/fb079/3

khampson
  • 14,700
  • 4
  • 41
  • 43
  • Thanks for the response and sorry for the late reply. In my example, it's the id*2 logic, not the WHERE logic that was the angry beast. I should have made that more clear... In the end, I wasn't able to achieve my goal of performing the same query against two separate parameterized tables (one is actually dependent on the other in my case, which makes it somewhat harder). I didn't want to go the dynamic SQL route, so in the end I just ended up copying-and-pasting the select. Not ideal, but hopefully they won't need too much separate maintenance. – Physics Newbie Jun 26 '14 at 18:27