7

What is it wrong about this simple SQL outer join?

select count(*) from A -- 25766
select count(*) from B -- 1242
select count(*) from A left outer join B on A.b = B.b -- 310176

return 25766, 1242 and 310176 rows respectively. (This is for Microsoft SQL Server 2012.) How can A left outer join B ever return more rows than exist in A, especially given this Venn diagram? I guess I'm making a stupid mistake but what is it?

Salman A
  • 262,204
  • 82
  • 430
  • 521
Drux
  • 11,992
  • 13
  • 66
  • 116
  • 2
    Can it be that the column b is not the whole key? – Jens Oct 28 '14 at 12:04
  • +1 I'll check. How could this explain 310176? – Drux Oct 28 '14 at 12:04
  • 2
    If b is only the part of the key it can be that one line in A reference a lot more lines in B and if in your example 1 record of A reference 12,0... rows in B you get your result. – Jens Oct 28 '14 at 12:07
  • @Jens I guess it's the same argument as in hvd's answer, right? – Drux Oct 28 '14 at 12:12
  • Yes it is the same argument. – Jens Oct 28 '14 at 12:14
  • 2
    Possible duplicate of [How can a LEFT OUTER JOIN return more records than exist in the left table?](http://stackoverflow.com/questions/916414/how-can-a-left-outer-join-return-more-records-than-exist-in-the-left-table) – Jeff Mar 16 '16 at 19:32

2 Answers2

16

This can happen when column b is not unique in table B. Suppose you have this data:

  A      B
+---+  +---+---+
| b |  | b | c |
+---+  +---+---+
| 1 |  | 2 | 1 |
| 2 |  | 2 | 2 |
+---+  +---+---+

When you left-join from A to B on column b, you get

+-----+------+------+
| A.b | B.b  | B.c  |
+-----+------+------+
|   1 | NULL | NULL |
|   2 |    2 |    1 |
|   2 |    2 |    2 |
+-----+------+------+

which gives three rows in total, even though both A and B only have two rows each.

3

There is nothing strange about it (and this situation applies to inner joins too). A left outer join:

  • Returns all rows from A cross B where the join condition match
  • And returns all rows from A where the join condition does not match

So at minimum the query will return 25766 rows but there could be more. It is possible to have one row in table A that matches many rows in table B. Example:

A             B             Result
+----+----+  +-----+----+   +------+-----+-----+------+
|  id|   b|  |   id|   b|   |  a.id|  a.b|  b.b|  b.id|
+----+----+  +-----+----+   +------+-----+-----+------+
|   1|  10|  |  123|  10|   |     1|   10|   10|   123|
+----+----+  |  456|  10|   |     1|   10|   10|   456|
             +-----+----+   +------+-----+-----+------+

This returns two rows even though there is one row in table A.

Salman A
  • 262,204
  • 82
  • 430
  • 521