Union query distinct on one column

Question

I would like results from the second query to override results from the first query:

SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."available_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
    "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON ("panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id"
                AND "panel_meals_meal"."status" = 0
                AND (
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'today' OR
                ("panel_meals_meal"."collection_from" AT TIME ZONE 'Europe/Warsaw')::date = DATE 'tomorrow'
                )
                AND "panel_meals_meal"."collection_to" > '2017-07-29 19:33:47.992075+00:00'
                AND "panel_meals_meal"."available_count" > 0)
WHERE "panel_restaurants_restaurant"."status" = 2
UNION
SELECT "panel_restaurants_restaurant"."id",
       "panel_restaurants_restaurant"."name",
       "panel_restaurants_restaurant"."logo",
       "panel_restaurants_restaurantfeatures"."currency" AS "currency",
       ST_DistanceSphere(location, ST_GeomFromText('POINT(0.0 0.0)',4326)) AS "distance",
       "panel_meals_meal"."id" AS "meal_id",
       "panel_meals_meal"."status" AS "meal_status",
       "panel_meals_meal"."initial_count" AS "available_dishes",
       "panel_meals_meal"."discount_price" AS "discount_price",
       "panel_meals_meal"."normal_price" AS "normal_price",
       "panel_meals_meal"."collection_from" AS "pickup_from",
       "panel_meals_meal"."collection_to" AS "pickup_to",
       "panel_meals_meal"."description" AS "meal_description"
FROM "panel_restaurants_restaurant"
INNER JOIN "panel_restaurants_restaurantfeatures" ON (
       "panel_restaurants_restaurantfeatures"."restaurant_id" = "panel_restaurants_restaurant"."id")
LEFT OUTER JOIN "panel_meals_meal" ON (
    "panel_restaurants_restaurant"."id" = "panel_meals_meal"."restaurant_id" AND
    "panel_meals_meal"."status" = 0)
INNER JOIN "panel_meals_mealrepeater" ON (
    "panel_meals_mealrepeater"."meal_id" = "panel_meals_meal"."id")
WHERE "panel_restaurants_restaurant"."status" = 2    AND "panel_meals_mealrepeater"."saturday" = true
ORDER BY distance ASC

For example - the first query may return nulls for what comes from the panel_meals_meal table, but the second one will return something - in that situation I will have the same values for id, name, logo, currency, distance and different values (nulls returned from the first query, and something from the other one) for all the other ones.

So the question is - how do I make this UNION distinct on a certain range of columns (actually only one would suffice - id)?

score 12 · Accepted Answer · answered Jul 29 '17 at 20:05

12

You can do what you want by means of a FULL OUTER JOIN instead of a UNION, and use COALESCE to your advantage.

I simplify your scenario to concentrate on the FULL OUTER JOIN part:

This are the tables (think of them as the result of your first SELECT before whe UNION, and the second SELECT after said UNION):

CREATE TABLE table_a
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;
CREATE TABLE table_b
(
    id INTEGER NOT NULL PRIMARY KEY,
    name TEXT,
    logo TEXT
) ;

These are the data we have in them:

INSERT INTO 
   table_a
   (id, name, logo)
VALUES
   (1, 'Name1-A', 'Logo1-A'),
   (2, NULL, NULL),
   (3, 'Name3-A', NULL),
   (4, NULL, 'Logo4-A'),
   (5, 'Name5-only-in-A', NULL);

INSERT INTO 
   table_b
   (id, name, logo)
VALUES
   (1, 'Name1-B', 'Logo1-B'),
   (2, 'Name2-B', NULL),
   (3, 'Name3-B', 'Logo3-B'),
   (4, 'Name4-B', 'Logo4-B'),
   (6, 'Name6-only-in-B', 'Logo6-B');

The query you're looking for is done by joining in such a way that you retrieve all the rows from both table_a and table_b. Then, you use:

SELECT
    id, 
    COALESCE(a.name, b.name) AS name,
    COALESCE(a.logo, b.logo) AS logo
FROM
    table_a AS a
    FULL OUTER JOIN table_b AS b USING(id) 
ORDER BY
    id ;

id | name            | logo   
-: | :-------------- | :------
 1 | Name1-A         | Logo1-A
 2 | Name2-B         | null   
 3 | Name3-A         | Logo3-B
 4 | Name4-B         | Logo4-A
 5 | Name5-only-in-A | null   
 6 | Name6-only-in-B | Logo6-B

dbfiddle here

In your case, substitute table_a AS a by your full first (SELECT ...) AS a, and the same for b. I've assumed id are your primary keys.

References:

answered Jul 29 '17 at 20:05

joanolo

6,028
1
29
37

It kind of works, but it uncovered another unexpected behavior. Now, if both selects return data then the result contains duplicates :( – Marek M. Jul 29 '17 at 20:16
No, with this scenario, they don't; that's taken care of by the `JOIN ... USING(id)`. Look at the example, you do not have any duplicates for `id`, even if some of them are in both tables. – joanolo Jul 29 '17 at 20:20
I don't know, maybe that's because you're using two separate tables in your example and I'm selecting from the same, but my result set contains duplicates. – Marek M. Jul 29 '17 at 20:22
In your original query it is expected to have *duplicate id values*, because `UNION` removes duplicates from *whole rows*. This is not the case when you `JOIN`. Take a look at *dbfiddle [here](http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=fd0f1b260acada248c50eb93fbd988ea)* and make sure you understand the difference between the two queries. – joanolo Jul 29 '17 at 20:27
I understand the difference, but my understanding doesn't affect the result set. See for yourself: https://1drv.ms/u/s!AvXY9o8myEVmhc8z8IFOdcUClj5MlQ <- I did as you advised, so I've put my first query in place of the `table_a AS` expressions and did the same for the other one. – Marek M. Jul 29 '17 at 20:38
Result when running the first query: https://1drv.ms/i/s!AvXY9o8myEVmhc80D3Av7i0b7Fnhwg and the result of the second one: https://1drv.ms/i/s!AvXY9o8myEVmhc81xiV99s1WvL38Ow – Marek M. Jul 29 '17 at 20:43
If you get more than one `id`, this means that your original `SELECT` *already* give out more than one `id`. Check just one of them. You have more than one meal per restaurant... decide *which one* you actually want to report (for instant, you can `SELECT min(meal_id) GROUP BY id`). If you want to report more than one, and still do the joins, then `JOIN USING(id, meal_id)` will do the job. – joanolo Jul 29 '17 at 20:43
You wrote your comment faster than I could write mine ;-) – joanolo Jul 29 '17 at 20:44
Maybe these are not `NULLs` but empty strings`''`, which *are not the same thing*. If that were the case, change COALESCE(a.logo, b.logo) to `COALESCE(NULLIF(a.logo, ''), NULLIF(b.logo, '')` and the same for all columns. – joanolo Jul 29 '17 at 20:51
It was my fault (forgot to use `COALESCE`), but there's another issue. Result after `JOIN USING(id, meal_id)` -> https://1drv.ms/i/s!AvXY9o8myEVmhc825KyQsMWpscgNcA <- as you can see the `niemiecka znizka` restaurant appears two times and once without `meal_id`, so that's basically the same situation I had initially :(. Just so you know what's in my topmost select statement: https://1drv.ms/i/s!AvXY9o8myEVmhc83Qsj8gv1zBB2zOQ (it's the same if I don't use `COALESCE` on meal_id column). – Marek M. Jul 29 '17 at 20:57
2

Ufff... Using `ON a.id = b.id AND (a.meal_id = b.meal_id OR a.meal_id IS NULL)` instead of `USING(id, meal_id)` solved the duplicates problem. Since you put me on the correct track, I'm accepting your answer :) – Marek M. Jul 29 '17 at 22:19
UNION is a lot "cheaper" operation than FULL OUTER JOIN – Meir Tseitlin Jul 28 '19 at 17:47

score 3 · Answer 2 · answered Dec 08 '18 at 10:52

Using DISTINCT ON, e.g.

SELECT DISTINCT ON (maintenance_task_id)
  maintenance_task_id,
  execution_count
FROM (
  SELECT
    id maintenance_task_id,
    0 execution_count
  FROM maintenance_task
  UNION
  SELECT
    mte1.maintenance_task_id,
    count(*) execution_count
  FROM maintenance_task_execution mte1
  WHERE
    mte1.ended_at IS NULL
  GROUP BY mte1.maintenance_task_id
) AS t
ORDER BY
  maintenance_task_id,
  execution_count DESC

In this query:

UNION combines results of two queries.
DISTINCT ON picks one row from the top (based on the ORDER BY) for each unique maintenance_task_id value.

score 0 · Answer 3 · answered Jun 29 '18 at 16:04

I did something similar by using a WITH query/CTE:

WITH override_query AS (SELECT * FROM blah_blah JOIN blah_blah [etc]),
     first_query AS (SELECT * FROM blah_blah JOIN blah_bluh [etc]
                    WHERE id NOT IN (SELECT id FROM override_query))
TABLE first_query UNION TABLE override_query

Union query distinct on one column

3 Answers3