0

I'm experiencing some very bizarre behaviors with a left outer join query.

Edit: I've provided an Access database with the tables and data needed to reproduce this problem. Note that this is the best way to reproduce the problem, because if the data size is reduced to a small sample, then the problem doesn't appear. This can be downloaded at: https://joek.com/etc/left_join_odd_behavior.accdb

Edit: My core problem with this is that I've been using left outer joins in this manner for years without experiencing a problem like this before. Additionally, I probably have hundreds of other left outer joins like this across a dozen applications, and haven't experienced a problem like this, or worse, if this problem is occurring in some of those other queries but haven't been noticed yet.

Edit: The left outer join is providing different results based on factors that should not affect the results (example, inserting different number of columns into another table, or including a WHERE clause for a specific set of results, or reducing the data to a small sample). My underlying problem isn't with fixing this query (I can do that a few different ways that give the correct results). My problem is understanding why this query that should work doesn't work in certain cases so that I can understand if it's: a data problem (so that I can fix it in these tables and check that it isn't happening in any other tables), an indexing problem (so that I can apply the correct indexes to these tables and make sure other tables are correct), or an issue inherent in MS Access (so that I can code around it, change all my other similar left outer joins, and use different methods going forward [as I've had to do with some other differences between Access & SQL Server that I've accustomed myself to since having to start working with Access over 6 years ago]).

I've simplified it to the following query:

SELECT
*x*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)

I have indexes on all these columns being used. Edit: (Note that in my sample database, I've renamed the columns from the production code to match my example query here.)

If x is:

TableA.IntA INTO TempTableX

then the results I get are missing some records which do match the left join criteria.

If x is:

TableA.* INTO TempTableX

then I get all the records expected.

If I simply return all the records without inserting them into a table, then it doesn't matter if I return TableA.IntA or TableA.* or even TableA.IntA, TableA.IntB; every which way excludes some matching records.

Each table has thousands of records (TableA 50K+, TableB 8,000+), but for the records that match on both IntA and IntB, there are only about 200 matches. And of those, only 17 have DateA set to Null.

Here's an example of data which is not being returned correctly is the following:

TableA
-----------
IntA   IntB
1      10
2      22
3      33
4      44

TableB
-------
IntA   IntB   DateA
2      20     1/1/2020
3      31     2/1/2020
4      44     3/1/2020

As you can see, this:

SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)

should return the following (edit: because this is a left outer join with specific criteria in the ON, and with no WHERE clause to constrain the results, it should return ALL the records from TableA):

TableA.IntA   TableA.IntB
-------------------------
1             10
2             22
3             33
4             44

but instead I get the following (edit: it is excluding the one record that matches the left join criteria despite no WHERE clause specifying to exclude records in TableA that do not match to TableB):

TableA.IntA   TableA.IntB
-------------------------
1             10
2             22
3             33

But, if I insert TableA.* into a table, then I get the expected 4 records (edit: note that this occurs when using a SELECT TableA.* INTO TempTableX, which causes Access to create the new table based on the data structure of the columns from TableA, and I'm not returning any columns from TableB, so there would not be any problem of it not populating data because of table field requirements).

Additionally, if I add a WHERE criteria to the query to specifically get the expected records, then I get them in the result. Or, if I empty TableA and TableB of all other records except these example records, then I get them correctly.

Edit:


Using the provided database, you can test these situations out in the following manner:

First:

SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)

you'll get 58,160 records, and when you take these results to a spreadsheet and filter on the CUSTID column for "5616", you'll find 25 records (this is an incomplete result).

Second:

SELECT
TableA.*
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)
WHERE
TableA.CustId = "5616"

you'll find now the correct 26 records.

If you compare these results, you'll find that the one record which gets excluded in the incomplete results (IntA = 25093 & IntB = 59797) is the only record in TableB for CustId 5616 that matches both IntA and IntB. But while it doesn't match the criteria of DateA IS NULL, it should still be returned from TableA even in the first query because there is no WHERE clause excluding TableA records that do not match the left join criteria.

Alternately, if you change the first query to:

SELECT
TableA.*
INTO TempTableX
FROM TableA
LEFT OUTER JOIN TableB
ON (TableB.IntA = TableA.IntA
AND TableB.IntB = TableA.IntB
AND TableB.DateA IS NULL)

then you'll get all 58,348 records that exist in TableA, and filtering for CUSTID "5616" you'll find all 26 records.

Alternately, if you make no changes to the first query, but empty TableA of all records except where CustId = "5616", then you'll get the complete 26 records instead of 25.

So again, my problem is why is Access providing different results based on changes which should not typically affect the results (number of records in the database, returning data vs inserting into a table, etc)?

Additionally for more detail on why I'm using a left join like this:

  1. This query is not the end goal, it is simply the smallest reduction of the production code where I found the problem occurring.
  2. The production use is to collect all records from TableA that DO NOT have a matching record in TableB where the DateA is null (aka TableB is a log of the records in TableA that have been reported and DateA is populated when the issue is resolved and thus fall off the report, so all DateA nulls are those still open on the report, and thus the production query includes a WHERE TableB.LogId IS NULL in order to exclude all records from TableA that are still unresolved in TableB [so again for more detail the record for CustId "5616" should return all 26 if I add WHERE TableB.LogId IS NULL because 0 records in TableB for CustId "5616" have a null in DateA and thus 0 should join and thus all 26 records for CustId in TableA will have TableB.LogId = null, but this correct behavior does not occur unless I limit the results with more criteria, insert all columns into a table, or reduced the data in the tables, etc]).
  3. The left join is on two columns IntA and IntB because TableB is a log of occurrences of the records in TableA. In otherwords, IntA might have occurred 5 times in TableB (5 different values of IntB) but I only want to match the specific occurrence of IntB per IntA as recorded in TableA.

And, as I mentioned above, these columns are indexed in the tables, and I've even tried compact & repair of the database.

Any thoughts?

JoeK
  • 11
  • 2
  • What you are seeing is a bug. Outer joining in MS Access looks a lot like gambling to me. Whatever one puts in the `ON` clause of a `LEFT OUTER JOIN` must of course keep all the rows from the left-hand table. But add `AND TableA.IntA = -1` to your `ON` clause and there are no rows returned anymore! My advice is simple: Stay away from MS Access when you want a good, reliable DBMS. – Thorsten Kettner Dec 10 '20 at 20:22
  • @ThorstenKettner Thanks, a bug explains it better and reduces the stress that it's something wrong with the code or the data or something I can control. – JoeK Dec 14 '20 at 13:10
  • Googling 'Outer join MS Access site:stackoverflow.com left table missing rows no where' my 1st hit: https://stackoverflow.com/q/12573561/3404097 – philipxy Jan 03 '21 at 05:54

2 Answers2

0

It isn't an error in the script. The last table is the correct logical expected outcome of the LEFT JOIN you provided.

By your own query:

SELECT
TableA.IntA, TableA.IntB
FROM TableA
LEFT OUTER JOIN TableB
  ON (TableB.IntA = TableA.IntA
  AND TableB.IntB = TableA.IntB
  AND TableB.DateA IS NULL) -- THIS HERE IS YOUR TRICKY BIT

This means you want

  • 1: all the rows from TableA that have no matching IntA and IntB pairs in TableB
  • 2: and all the rows from TableA that do have matching IntA and IntB but have a null DateA. Note that no row in your sample data matches this.

The expected result is:

TableA.IntA   TableA.IntB
-------------------------
1             10
2             22
3             33
-- 4 does not qualify, because it matches IntA and IntB in TableB AND has a non-null DateA value.

If you wanted to see data where IntA and IntB were in both tables, but DateA really was a NULL value, you'd need:

SELECT
TableA.IntA, TableA.IntB
FROM TableA
INNER JOIN TableB
  ON (TableB.IntA = TableA.IntA
  AND TableB.IntB = TableA.IntB)
WHERE TableB.DateA IS NULL

As to why Access has different behavior depending on the columns selected, I got nothing. I loathe Access for a myriad of reasons. Happy to help you nail down queries if you need a specific set of results, but the quirks of Access is for someone else to answer.

CoffeeNeedCoffee
  • 1,554
  • 3
  • 13
-1

I do not know much about access. What I guess is, that due to the left join there is data with null values.

Could it be, that the temp tables are defined without allowing nulls in some columns and the data is then swallowed during the insert silently somehow? This would explain the difference between selecting just some or all columns into the temp table.

If Access allows to not define the temp table you could try to define it anyway and define nullable columns. I could not tell you how to do that in Access.

Dharman
  • 30,962
  • 25
  • 85
  • 135
rominator007
  • 1,553
  • 1
  • 11
  • 22