0

I'm working with two sets of data that I am trying to combine via a Join command (not a Union).

I don't think I understand the basics of joining. I used a Full Outer Join as follows:

    Select
      Table1.col1,
      Table1.col2, 
      Table1.date1,
      Table2.col1,
      Table2.col2,
      Table2.date2
   From Table1 full outer join 
        Table2 On Table1.date1 = Table2.date2

The final combined dataset had a total number of rows greater than the sum of rows in Table1 and Table2.

I'm trying to understand why this would happen.

I was under the impression that (# rows in CombinedTable) = (# rows in Table1) + (# rows in Table2).

Why does this occur? How can I fix this?

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
K.Doe
  • 39
  • 1
  • 9
  • 1
    There is no FULL JOIN in mysql are you using sqlserver? – P.Salmon Feb 23 '20 at 07:29
  • 2
    If you're really using MySQL, then none of this occurred, because MySQL does not support full outer join. What is your real database? – Tim Biegeleisen Feb 23 '20 at 07:32
  • 2
    What you are seeing is not limited to full outer joins. _All_ joins can result in a greater number of rows, because a given row in one table can always match to mutiple rows in the other table. – Tim Biegeleisen Feb 23 '20 at 07:40
  • Every indication here is that you _do_ want to use `UNION`. The columns match and you htink you're going to get X+Y rows. Why don't you want to use a UNION? – Nick.Mc Feb 23 '20 at 10:37

3 Answers3

1

Check this out:

TblJob
Name, Surname, Job
John, White, Developer
John, Black, Tester
John, Grey, Manager

TblDrinksPref
Name, Surname, Drink
John, White, Coffee
John, Black, Tea
John, Grey, Orange Juice

SELECT * FROM tbljob j JOIN tbldrinkspref p ON j.name = p.name

John, White, Developer, John, White, Coffee
John, White, Developer, John, Black, Tea
John, White, Developer, John, Grey, Orange Juice
John, Black, Tester, John, White, Coffee
John, Black, Tester, John, Black, Tea
John, Black, Tester, John, Grey, Orange Juice
John, Grey, Manager, John, White, Coffee
John, Grey, Manager, John, Black, Tea
John, Grey, Manager, John, Grey, Orange Juice

By joining in just first name every row in each table matches the other. 3 rows in each table result in 3x3 tables output; more than the sum of the rows. The most rows you'll get out of a join is the multiplication of the numbers of rows heading into the join. We call this a Cartesian product and it's usually an indication that there is a bug in your SQL joins. Any join can do this, not just outer ones. There exists a join (called a CROSS JOIN) whose sole purpose is to produce an output that is a perfect Cartesian product because sometimes we do want to do it but mostly it indicates a problem

What can you do about it? Don't join rows to rows that are unrelated, by making your join conditions better/more accurate:

SELECT * 
FROM tbljob j JOIN tbldrinkspref p 
ON j.name = p.name 
  --the last name is vital to associate rows correctly in this case
  AND j.surname = p.surname

If you have written a large SQL and are getting certain rows unexpectedly duplicated it means one of your joins is faulty. Comment them all out back to just the first table, and comment the select block, then keep rerunning the sql as you add joins back in. When you see the rowcount increase unexpectedly that's possibly the fail but be aware that a join may cause rows to disappear too, and you might be in a situation where adding a join may cause half the rows to disappear because they don't match the join predicate but the other half of the rows to double because the join is faulty. You must keep in mind the data you're joining when assessing how the row count should change as a result of adding a table in, versus how it does actually change

Caius Jard
  • 72,509
  • 5
  • 49
  • 80
1

You will get N number of examples and diagram, you should have some understanding on joins before looking at those examples and diagrams, I assume you're using MS Sql.

Full outer join returns a result set that includes rows from both left and right table,so if you have a 3 rows in first table and 5 rows in second table, it need not be only 8 rows. It also depends on how the foreign key value is used between those two tables.

if the value from second table is not mapped with the first table column values then those return as null.

as mentioned by @Caius Jard, the return values increases based on the value mapped. Hope it helps a bit for you.

PS Full join and Full Outer join are same!

Manjuboyz
  • 6,978
  • 3
  • 21
  • 43
0

Consider two tables A with m rows and B with n rows and a query like this:

select count(*)
from a full join
     b
     on <some condition>;

This row can return (almost) any number between greatest(n, m) and n * m.

It would return greatest(n, m) if the condition were always 1-to-1 (such as on ids).

It would return n + m if the condition always evaluated to FALSE.

It would return n * m if the condition always evaluated to TRUE.

It can return almost any number in-between, with a few exceptions (for instance, in many cases, it would be hard to get n * m - 1 rows).

By contrast, an INNER JOIN could return between 0 and n * m rows.

On the other hand, UNION ALL always returns exactly the sum of the rows in the two tables, so you might be confusing UNION ALL and FULL JOIN.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786