Constraint with Query

Question

There are two tables

City (Name, Country_code, Population)
Country (Name, Code, Population)

The task is: The sum of population of all cities in a country, should be less or equal to population of a country. -

Create a constraint and an assertion
Create a trigger using constraint and assertion. Or propose your own way of trigger syntax

I tried to create a constraint on table country, but get an error because of query

ALTER TABLE country 
  ADD CONSTRAINT check_pop_sum 
  CHECK (population <= ANY(SELECT SUM(POPULATION) 
                             FROM CITY 
                            GROUP BY COUNTRY_CODE));

You can't embed a select statement in a constraint. – OldProgrammer Dec 15 '14 at 20:13 — OldProgrammer, Dec 15 '14 at 20:13

Aramillo · Answer 1 · 2014-12-15T20:44:31.073

1

You can do this using a trigger. Check this:

CREATE TRIGGER check_population
  BEFORE INSERT
   ON CITY
    FOR EACH ROW 

DECLARE
   POPULATION_AMOUNT_CITY NUMBER;
   POPULATION_AMOUNT_COUNTRY NUMBER;

BEGIN
   SELECT SUM(POPULATION) INTO POPULATION_AMOUNT_CITY FROM CITY WHERE CODE = :NEW.Country_code;
   SELECT Population INTO POPULATION_AMOUNT_COUNTRY FROM COUNTRY WHERE CODE = :NEW.Country_code;

   IF (POPULATION_AMOUNT_CITY + :NEW.POPULATION) > POPULATION_AMOUNT_COUNTRY THEN
       RAISE_APPLICATION_ERROR(-20000, 'Population exceeded');
   END IF;

END;

edited Dec 15 '14 at 20:44

answered Dec 15 '14 at 20:36

Aramillo

3,176
3
24
49

1

This is probably over punctilious, but I would add an update trigger too. Both on the `City` table (increase of city population beyond `COUNTRY.POPULATION`) and on the `country` table (lower countries population below `SUM(CITY.POPULATION)`). And a foreign constraint too, just to prevent dropping a country when there are still some cities... – Sylvain Leroux Dec 15 '14 at 21:39
1

A simple trigger like this will fail to detect a constraint violation if there are concurrent sessions inserting/updating the table. – Jeffrey Kemp Dec 16 '14 at 06:33
Yes, @JeffreyKemp, you are right when say `will fail to detect a constraint violation if there are concurrent sessions inserting/updating`, but how can i solve this problem? – Aramillo Dec 16 '14 at 12:47
You need to introduce some form of serialization (e.g. a database lock) so that if more than one session tries to insert/update a country, or any city in that country, the session must first take a lock on the country before it attempts any insert/update. That way, other sessions will wait until the first session issues a commit or rollback before they try their transaction. – Jeffrey Kemp Dec 16 '14 at 15:02

score -1 · Answer 2 · answered Dec 16 '14 at 18:34

The situation you describe is not a legitimate data integrity issue. so the solution is not a constraint, no matter how it is implemented. Data integrity is concerned primarily with the validity of the data, not accuracy. Accuracy is not a concern of the database at all.

Data integrity can fit into two categories: context-free integrity and contextual integrity. Context-free integrity is when you can verify the validity of the datum without referring to any other data. If you try to write an integer value into a date field for example (domain checking) or set an integer field to "3" instead of 3 or set it to 3 when the range is defined as "between 100 and 2000".

Contextual integrity is when the validity must be considered as part of a group. Any foreign key, for example. The value 3 may be perfectly valid in and of itself, but can fail validity checking if the proper row in a different table doesn't exist.

Accuracy, on the other hand, is completely different. Let's look again at the integer field constrained to a range of between 100 and 2000. Suppose the value 599 is written to that field. Is it valid? Yes, it meets all applicable constraints. Is it accurate? There is no way to tell. Accuracy of the data, as the data itself, originates from outside the database.

But doesn't the ability to add all city's population within a county and compare it to the overall county population mean that we can check for accuracy here?

Not really, or not in a significant way. Upon inserting a new city or updating a city population value, we can test to see if the total of all city populations exceeds the county population. We can alert the user to a problem but we can't tell the user where the problem is. Is the error in the insert or update? Is there a too-large population value in an existing city that was entered earlier? Are there several such too-large values for many cities? Are all city population values correct but the country population too small?

From within the database, there is no way to tell. The best we can to is verify the incorrect total and warn the user. But we can't say "The population of city XYZ is too large" because we can't tell if that is the problem. The best we can do is warn that the total of all cities within the county exceed the population defined for the county as a whole. It is up to the data owners to determine where the problem actually occurs.

This may seem like a small difference but a constraint determines that the data is invalid and doesn't allow the operation to continue ("Data Integrity: preventing bogus data from entering the database in the first place").

In the case of a city population, the data is not invalid. We can't even be sure if it is wrong, it could well be absolutely correct. There is no reason to prevent the operation from completing and the data entering the database.

It is nice if there can be some ability to verify accuracy but notice that this is not even such a case. As city data is entered into the database, the population value for most of them could be wildly erroneous. You aren't aware of a problem until the county population is finally exceeded. Some check is better than none, I suppose, but it only alerts if the inaccuracies result in a too-large value. There could just as well be inaccuracies that result in too small a value. So some sort of accuracy check must be implemented from the get-go that will test for any inaccuracies -- too large or too small.

That is what will maintain the accuracy of the data, not some misplaced operation within the database.

"not a legitimate data integrity issue"? "validity, not accuracy" Really? The OP has given a well-defined integrity constraint. Just because the current DBMS's don't have native support for this type of constraint doesn't mean it's not a valid integrity constraint. — Jeffrey Kemp, Dec 18 '14 at 00:26
It's a valid accuracy check. The reason current DBMSs don't have native support is because accuracy checking can't be done from within the database. It must be performed externally. — TommCatt, Dec 18 '14 at 05:45
You do understand the closed-world assumption, don't you? If the database knows the population of a country, it can know the upper bound for the population of the cities in that country. That's a valid constraint. — Jeffrey Kemp, Dec 18 '14 at 07:25
It is a logical constraint or limitation that may be checked from time to time. It is not anything that can fit into an `alter table t add constraint c ...` form. Sure, it is possible to implement something using triggers and/or udfs, but what do you do if you catch it...raise an error and rollback the transaction? But the problem may not be with the data row that triggers the error. That data may be correct, it could be previous entries where the problem happened. It is impossible to prevent inaccurate data from being entered. Catching that and fixing it is an app-level operation. — TommCatt, Dec 18 '14 at 20:52
Besides, this is not an example of a closed-world assumption. First off, that assumption applies to the perceived accuracy of a query -- that is, that all the data needed is already in the database. Second, the very fact that new cities are being entered shows they don't have a closed-world situation. It is very open. Third, open or closed, the assumption is that the data is complete, not that it is absolutely accurate. We've been discussing accuracy, not completeness. — TommCatt, Dec 18 '14 at 20:59
(1) just because the current syntax/implementation doesn't support multi-table constraints of arbitrary complexity doesn't mean they are not a valid constraint. And with constraints, we are only concerned with *consistency*, not *accuracy*. (2) the closed-world assumption *always* applies - without it, we cannot logically implement any constraints at all. — Jeffrey Kemp, Dec 19 '14 at 00:05

Constraint with Query

2 Answers2