Using Postgres domains to simplify function input validation

Question

Using Postgres 11.5, I've been looking at CREATE DOMAIN since yesterday, and would like to clarify how they can/can't help with function parameters. Ideally, I'd like to use a domain to screen parameter inputs easily, but with a helpful error response. As an example, I'm using a simple first-case, a domain that blocks null and empty strings:

CREATE DOMAIN text_not_empty AS
    text
    NOT NULL
    CHECK (value <> '');

I tried this out as a field type for a table, and it's great. When we don't allow empty strings, this seems like a simple way to implement a constraint without an individual rule or a trigger. I'm hoping to get similar benefits on function parameters. However with functions, we really need clear error messages as the caller may be from Node, or some other environment.

As an example, here's a dummy function that requires a string, using an explicit check on the input:

CREATE OR REPLACE FUNCTION api.test_this_function(in_text text)
 RETURNS int
 LANGUAGE plpgsql
AS $function$

BEGIN

IF in_text = '' THEN
    RAISE EXCEPTION USING
        message = 'Input text must be supplied',
        detail  = 'Deets',
        hint    = 'Supply a search string',
        errcode = 'KC123'; -- Custom code
END IF;

    RETURN 1;

END;
$function$

This works fine, but I'm hoping a domain could simplify such cases. As far as I can tell, a domain won't improve things as the function never gets a chance to grab the error. If I've read the docs and understood my experiments correctly, you can only RAISE within a BEGIN...END block. If so, how do people recommend vetting inputs? And am I missing an opportunity with domains?

To flesh out what I'm basing my impressions on, here's a function that does use a domain-based checked, as well as a (silly) custom check:

CREATE OR REPLACE FUNCTION api.domain_test(in_text text_not_empty)
     RETURNS timestamptz

AS $BODY$

BEGIN

IF in_text = 'foo' THEN
    RAISE EXCEPTION USING
        message = 'Invalid search string',
        hint = 'Supply a search string other than ''foo''.',
        errcode = 'KC123'; -- Custom code
END IF;

    RETURN now();

END;

$BODY$
 LANGUAGE plpgsql;

So, it should fail on no parameter, a null parameter, an empty string, or a string of 'foo'. Otherwise, it should return a timestamp. I tried out these five cases, shown here:

select * from domain_test();      -- 1 : Fails to reach the method.
select * from domain_test(null);  -- 2 : Fails before entering method.
select * from domain_test('');    -- 3 : Fails before entering method.
select * from domain_test('foo'); -- 4 : Fails on custom exception.
select * from domain_test('a');   -- 5 : Succeeds.

I've diagrammed out how far each of these statements make it through the function. The diagram is no clearer than the code, but sometimes I find it helpful to try and make a diagram, just to see if I've got all of the pieces.

I'm not asking a specific code question, but it would be a big help if someone could confirm that my model of how the errors are being caught and handled is correct and complete. Once I've understood how Postgres "thinks" about this stuff, it will be easier for me to reason about it too.

The null and empty string cases never get to the BEGIN block, so there doesn't seem to be a way to use RAISE to customize the message, hint, etc. Is there a global error handler, or a broader catch system that I'm overlooking?

Regarding ORMs

wildplasser offered some comments about ORMs and strategies, which make it clear I didn't explain the background here. I didn't want to bog the question down with more detail, but I figure I'll add in some clarification.

We're not going with an ORM. It seems like that adds an other model/abstraction layer to help people used to some other language. For me, it's just more complexity I don't gain anything from, in this case. I'd prefer to write the queries in straight SQL without a lot of scaffolding. The idea is to push the query logic/SQL into Postgres. Then there's one place for the logic.

The plan is to make our PG query API a series of functions with defined inputs/outputs. I can capture or store those details that up using pg_proc, information_schema.parameters, and a custom table to define parameter rules (allowed/excluded values, series, or ranges.) That much scaffolding should help as it's pretty easy to mechanize. With the input/output data, I can automatically generate input/output declarations, check code (what I'm working on here), documentation, and test cases. The actual query body? I'll write that by hand. Writing a smart query builder that figures out all of my joins etc.? Postgres is better at that now that I'll ever be...huge task, I'd do a crap job. So, I can hand-write the query body, give it to the PG planner/optimizer, and tweak as needed. It's in a black box, so outside clients aren't harmed by internal modifications.

The HTTP API layer will be written in at two languages to start with, with possibly more languages, an likely more dialects to follow. Then the Node, etc. tools can handle the routing and function calls in their own idiom. The last thing I want to do is push out the query logic implementation to redundant implementations in different languages. Nightmare on so many levels. For the record, the functions will mostly RETURN TABLE, defined inline or via CREATE TYPE.

Why do you thing you need a function? The whole point about domains is that they are a kind of derived/restricted types. See my answer here: https://stackoverflow.com/questions/5484629/check-if-sudoku-solution-is-valid/10541467#10541467 — wildplasser, Feb 03 '20 at 23:42
For your date/timestamp type you could derive it from a standard timestamp and add some restrictions, like `(check value >=1901-01-01 and value < 2100-0-01')` No strings involved, that is handled by the standard serialisiation machinery. — wildplasser, Feb 03 '20 at 23:49
For columns, I don't want a function, you're 100% right. A domain seems pretty miraculous there...so easy. But what I'm also working on is a query API where the SQL for the queries are in functions on the PG side. Then Node, PHP, etc. clients can call the function, like `Get_UserActivity (user_name)`. In that case, I want to make sure that the `user_name` parameter is not `null` or empty. — Morris de Oryx, Feb 03 '20 at 23:51
It looks like I didn't make it clear, but my example above is entirely artificial...I'm trying to work out the mechanics of input validation and parameter typing/checking overall. I'm building up to writing a code generator to help out with building hundreds of queries that will be packaged into an API. So I'm trying to get the basic method template/pattern sorted out. — Morris de Oryx, Feb 03 '20 at 23:52
ORM's only complicate things. Once you use them you are stuck in the mud. I *think* your ORM should treat the date-like domain just like a normal date. Invalid values would only be forbidden. — wildplasser, Feb 03 '20 at 23:53
It looks like you try to build an application-catalog. Thats a hard goal, since it has to be kept syncronised to th DBMS catalogs. The functions+views model could be feasable, if your API is not to large. (and your data model is simple). But: it almost looks like you are inventing your own ORM... — wildplasser, Feb 04 '20 at 00:22
Agreed, keeping the extended attributes in sync is a bit of a pain for a few reasons. Such as, the `pr_proc.oid` will change on `DROP` and then `CREATE` method, so the `schema + function_name + argument types` signature will be necessary. As far as knowing when functions are "dirty", I'm considering using `event_trigger`. But the hard parts aside, the goal is to avoid replicating the SQL in the various client languages. For consistency and security reasons. I'm only in the design and experiment phase, so if there's a better solution than my sketch, I'm all ears! — Morris de Oryx, Feb 04 '20 at 00:50

score 2 · Answer 1 · answered Feb 03 '20 at 23:35

I'd say you got that right.

If the supplied value fails the tests for the domain, the function is not even called. That's a feature: it centralizes such tests in the domain definition, so that you don't have to repeat them all over the place, and it saves the expense of actually calling the function.

I find at least the second error message pretty helpful:

ERROR:  value for domain text_not_empty violates check constraint "text_not_empty_check"

ERROR:  domain text_not_empty does not allow null values

If that is not clear enough for you, and you don't mind writing C, you could write your own data type and have fancy error messages in the type input function.

*It's a feature*, good point. I'll have to decide if the error results are good enough, or if I want to tweak them a bit. I see that `CREATE DOMAIN` supports naming the constraint, so I could tweak that in cases where the default name isn't clear enough. And, no C for me. I'm not a C coder and we're on RDS. — Morris de Oryx, Feb 04 '20 at 00:11

score 0 · Accepted Answer · answered Feb 03 '20 at 23:33

0

Your insights are accurate, the error

select domain_test('');

ERROR:  value for domain text_not_empty violates check constraint "text_not_empty_check"

is raised on the stage of resolving the function argument types hence the function is never executed. If your aim is to customize the error message, the custom domain does not help you.

There is no global error handler. The only option is to call the function inside another code block

do $$
begin
    select domain_test('');
exception when others then
    raise exception 'my error message';
end $$;

ERROR:  my error message
CONTEXT:  PL/pgSQL function inline_code_block line 5 at RAISE

It seems though that your original approach without a custom domain makes more sense.

answered Feb 03 '20 at 23:33

klin

112,967
15
204
232

Thanks for the confirmation, and the point about using a function in front to trap and tweak exceptions. I was thinking of doing that for running my tests cases. As in, generate a bunch of test calls that get passed to a generalized `test_function(_statement text)`, run through there, and then reported out to a table where the expected results are already stored. Something like `EXECUTE format(_statement);` I got it working a bit, at least in a proof-of-concept trial. I'll likely ask about that another day. – Morris de Oryx Feb 04 '20 at 00:15
Note that trapping exceptions in Postgres is expensive and should be avoided whenever possible. – klin Feb 04 '20 at 00:47
I saw a mention of this point in the manuals, but didn't know how to quantify or interpret it. Is there any guidance on what "expensive" means, in relative or absolute terms? Or better, is there a good way for me to test on my own? And does a constraint, like the `domain` provides, improve the situation, versus a custom `exception`? – Morris de Oryx Feb 04 '20 at 00:52
See [example.](https://dbfiddle.uk/?rdbms=postgres_12&fiddle=47704da9b616fd34d79f7457b2ffc70f) – klin Feb 04 '20 at 01:03
Wow, thanks for the example...I'll dive into some experiments. – Morris de Oryx Feb 04 '20 at 04:27
The fiddle helped a lot. I've run some comparisons and posted a few results on another question: https://stackoverflow.com/questions/60066659/postgres-stored-function-input-checking-overhead-interpreting-timing-results – Morris de Oryx Feb 04 '20 at 22:48

Using Postgres domains to simplify function input validation

Regarding ORMs

2 Answers2

Linked