32

I'm trying to make a query with postgresql. The database contains two relations: "kingdom", which includes some english kings, and "dinasty", which contains some people from the stuart dinasty

The relation "kingdom" includes the king name and when his kingdom started and ended. The relation "dinasty" includes name, gender, birth and death.

What I'm trying to query is the king that was the oldest one when he died.

With my query I'm receiving this error at LINE 3 (NOT IN): subquery has too many columns

This is the query:

SELECT kingdom.king, dinasty.birth, dinasty.death
FROM kingdom, dinasty
WHERE kingdom.king = dinasty.name AND kingdom.king NOT IN
(
    SELECT DISTINCT R1.king, R1.birth, R1.death
    FROM
    (
        SELECT DISTINCT R1.king, D1.birth, D1.death
        FROM kingdom AS R1, dinasty AS D1, dinasty AS D2
        WHERE R1.king=D1.name
    ) AS R1, 
    (
        SELECT DISTINCT R1.king, D1.birth, D1.death
        FROM kingdom AS R1, dinasty AS D1, dinasty AS D2
        WHERE R1.king=D1.name
    ) AS R2
    WHERE R1.death-R1.birth < R2.death-R2.birth
);

What is inside the NOT IN is correct.

Oben Sonne
  • 9,893
  • 2
  • 40
  • 61
Epi
  • 682
  • 3
  • 10
  • 16

2 Answers2

36

You are projecting three columns in your subquery, but comparing a single one of them in the IN clause. Select just the required column (r1.king) for the IN in the subquery:

SELECT kingdom.king, dinasty.birth, dinasty.death
FROM kingdom, dinasty
WHERE kingdom.king = dinasty.name AND kingdom.king NOT IN
(
    SELECT DISTINCT R1.king
    FROM
    (
        SELECT DISTINCT R1.king, D1.birth, D1.death
        FROM kingdom AS R1, dinasty AS D1, dinasty AS D2
        WHERE R1.king=D1.name
    ) AS R1, 
    (
        SELECT DISTINCT R1.king, D1.birth, D1.death
        FROM kingdom AS R1, dinasty AS D1, dinasty AS D2
        WHERE R1.king=D1.name
    ) AS R2
    WHERE R1.death-R1.birth < R2.death-R2.birth
);
João Silva
  • 89,303
  • 29
  • 152
  • 158
  • Also, if you can do so, try moving the subquery into the from clause (as a derived table with a suitable field to join on), since it will be considerably faster, evaluating only once, instead of once per row. – d11wtq Sep 02 '12 at 23:41
13

As was answered, your column count didn't match, but there's a much simpler way to write this.

When writing queries, it's best to think them through in stages. First, you need to know how old each king was when they died:

SELECT *, death-birth AS lived_for FROM dinasty

Now that you have that, you can use DISTINCT ON to find the longest-lived king for each kingdom

SELECT DISTINCT ON( name ) name, birth, death, lived_for
  FROM (
      SELECT *, death-birth AS lived_for FROM dinasty
    ) a
  ORDER BY name, lived_for DESC
;

The distinct on will take the first row for each distinct value, so it's important you pair it with the correct ORDER BY. First we order by the name of the dinasty, then how long the king lived for in descending order. That means the first king shown for each dinasty will be the longest-lived one, and that is the record that DISTINCT ON will keep for each dinasty.

Note that I also removed the JOIN to kindgom, but you can add that back in if needed:

SELECT k.*, oldest.*
  FROM (
    SELECT DISTINCT ON( name ) name, birth, death, lived_for
      FROM (
          SELECT *, death-birth AS lived_for FROM dinasty
        ) a
      ORDER BY name, lived_for DESC
    ) oldest
    JOIN kingdom k ON k.king = oldest.name
;

Finally, if you ever need to use multiple columns in a sub-select, you can use the ROW() construct:

SELECT ...
  FROM table_a
  WHERE ROW(f1, f2, f3) NOT IN (SELECT f1a, f2a, f3a FROM ... )
;
Jim Nasby
  • 1,218
  • 10
  • 11