how to prevent duplicates with inner join query (Postgres)

Question

I am trying to understand how to create a query to filter out some results based on an inner join.

Consider the following data:

formulation_batch
-----
id  project_id  name    
1   1           F1.1
2   1           F1.2
3   1           F1.3
4   1           F1.all

formulation_batch_component
-----
id  formulation_batch_id    component_id
1   1                       1
2   2                       2
3   3                       3
4   4                       1
5   4                       2
6   4                       3
7   4                       4

I would like to select all formulation_batch records with a project_id of 1, and has a formulation_batch_component with a component_id of 1 or 2. So I run the following query:

SELECT formulation_batch.* 
FROM formulation_batch 
INNER JOIN formulation_batch_component
ON formulation_batch.id = formulation_batch_component.formulation_batch_id
WHERE formulation_batch.project_id = 1 
    AND ((formulation_batch_component.component_id = 2 
        OR formulation_batch_component.component_id = 1 ))

However, this returns a duplicate entry:

1;"F1.1"
2;"F1.2"
4;"F1.all"
4;"F1.all"

Is there a way to modify this query so that I only get back the unique formulation_batch records which match the criteria?

EG:

1;"F1.1"
2;"F1.2"
4;"F1.all"

Thanks for your time!

Clodoaldo Neto · Answer 1 · 2013-07-31T00:45:07.767

27

In this case it is possible to apply the distinct before the join possibly making it more performant:

select fb.* 
from
    formulation_batch fb
    inner join
    (
        select distinct formulationbatch_id
        from formulation_batch_component
        where component_id in (1, 2)
    ) fbc on fb.id = fbc.formulationbatch_id 
where fb.project_id = 1

Notice how to use alias for the table names to make the query clearer. Also then in operator is very handy. The use of double quotes with those identifiers is not necessary.

edited Jul 31 '13 at 00:45

answered Jul 31 '13 at 00:15

Clodoaldo Neto

118,695
26
233
260

Thank you for the idea on the optimization. The query is actually being generated by an ORM, but I think I have a way to implement it like this instead (in much the same way I modified it to select distinct on the outer query). – drkstr101 Jul 31 '13 at 00:30
In my tests, innerjoining like above was also faster than using the formulationbatch_ids in a where clause, ie: `select fb.* from formulation_batch fb where fb.project_id = 1 AND fb.id IN ( select formulationbatch_id from formulation_batch_component where component_id in (1, 2) )` – Kristian Sandström Oct 06 '17 at 09:13

score 15 · Accepted Answer · answered Jul 30 '13 at 23:53

15

One way would be to use distinct:

SELECT distinct "formulation_batch".* 
FROM "formulation_batch" 
INNER JOIN "formulation_batch_component" 
ON "formulation_batch"."id" = "formulation_batch_component"."formulationBatch_id" 
WHERE "formulation_batch"."project_id" = 1 
    AND (("formulation_batch_component"."component_id" = 2 
        OR "formulation_batch_component"."component_id" = 1 ))

answered Jul 30 '13 at 23:53

Gordon Linoff

1,242,037
58
646
786

Thanks, I forgot about distinct. Its not the greatest performance wise, but still better than filtering it out at the software level. – drkstr101 Jul 31 '13 at 00:32

score 2 · Answer 3 · answered Jul 31 '13 at 23:53

2

I know the question asks how to prevent duplicates with inner join but could use an IN clause in the predicate.

SELECT "formulation_batch".* 
FROM "formulation_batch" fb
ON "formulation_batch"."id" = "formulation_batch_component"."formulationBatch_id" 
WHERE "formulation_batch"."project_id" = 1 
 AND fb.id IN (SELECT "formulation_batch"."id"
               FROM formulation_batch_component
               WHERE (("formulation_batch_component"."component_id" = 2 
                      OR "formulation_batch_component"."component_id" = 1 ))

answered Jul 31 '13 at 23:53

user2640118

31
1

This is the best solution to the question I was trying to ask. Cheers! – drkstr101 Dec 30 '16 at 08:55
2

I tested both this and @Clodoaldo Neto's answer, inner joining was ~50% faster in my case – Kristian Sandström Oct 06 '17 at 09:18

how to prevent duplicates with inner join query (Postgres)

3 Answers3

Linked