4

I have two tables in an MS Access 2010 database: TBLIndividuals and TblIndividualsUpdates. They have a lot of the same data, but the primary key may not be the same for a given person's record in both tables. So I'm doing a join between the two tables on names and birthdates to see which records correspond. I'm using a left join so that I also get rows for the people who are in TblIndividualsUpdates but not in TBLIndividuals. That way I know which records need to be added to TBLIndividuals to get it up to date.

SELECT TblIndividuals.PersonID AS OldID, 
TblIndividualsUpdates.PersonID AS UpdateID
FROM TblIndividualsUpdates LEFT JOIN TblIndividuals 
ON ( (TblIndividuals.FirstName = TblIndividualsUpdates.FirstName) 
and (TblIndividuals.LastName = TblIndividualsUpdates.LastName) 
AND (TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn 
    or (TblIndividuals.DateBorn is null 
        and (TblIndividuals.MidName is null and TblIndividualsUpdates.MidName is null 
            or TblIndividuals.MidName = TblIndividualsUpdates.MidName))));

TblIndividualsUpdates has 4149 rows, but the query returns only 4103 rows. There are about 50 new records in TblIndividualsUpdates, but only 4 rows in the query result where OldID is null.

If I export the data from Access to PostgreSQL and run the same query there, I get all 4149 rows.

Is this a bug in Access? Is there a difference between Access's left join semantics and PostgreSQL's? Is my database corrupted (Compact and Repair doesn't help)?

andybalholm
  • 15,395
  • 3
  • 37
  • 41

4 Answers4

4
ON ( 

        TblIndividuals.FirstName = TblIndividualsUpdates.FirstName

        and 

        TblIndividuals.LastName = TblIndividualsUpdates.LastName

        AND (
                 TblIndividuals.DateBorn = TblIndividualsUpdates.DateBorn      
                 or 
                 (
                     TblIndividuals.DateBorn is null          
                     and 
                     (
                     TblIndividuals.MidName is null 
                     and TblIndividualsUpdates.MidName is null              
                     or TblIndividuals.MidName = TblIndividualsUpdates.MidName
                     )
                 )
             )
    );

What I would do is systematically remove all the join conditions except the first two until you find the records drop off. Then you will know where your problem is.

HLGEM
  • 94,695
  • 15
  • 113
  • 186
  • If I remove conditions I get false matches, so I get more rows than in the original table, but some rows are still omitted. Really, no matter what my join condition is, I should _never_ get fewer rows than the left table has, as ypercube pointed out in his answer. – andybalholm Apr 26 '12 at 00:03
3

This should never happen. Unless rows are being inserted/deleted in the meantime,

the query:

SELECT *
FROM a LEFT JOIN b
         ON whatever ;

should never return less rows than:

SELECT *
FROM a ;

If it happens, it's a bug. Are you sure the queries are exactly like this (and you have't omitted some detail, like a WHERE clause)? Are you sure that the first returns 4149 rows and the second one 4103 rows? You could make another check by changing the * above to COUNT(*).

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Check out his join criteria. He has `WHERE` criteria in his `JOIN` conditions. – JNK Apr 24 '12 at 17:56
  • @JNK: I see `WHERE` nowhere in the query. – ypercubeᵀᴹ Apr 24 '12 at 17:57
  • @JNK: Thnx. I'm now waitign for the OP to tell us that you were right (because he mistakenly did not include the WHERE clause in the question) or that the `TblIndividuals` has the 4149 rows (and the Updates is only 4103) or that Access frontend is applying a filter or that (for some other peculiar reason) how the issue is not at the Left Join. I somehow doubt that Access has a bug :) – ypercubeᵀᴹ Apr 24 '12 at 18:17
  • @JNK: Well, if it is actually a bug, it could well be in the query execution planner code that is rewriting Joins to Wheres and vice versa. Since this kind of complex conditions are not used often in `ON` clauses, it wouldn't be too strange that it hadn't been found. – ypercubeᵀᴹ Apr 24 '12 at 18:27
  • What I posted is my exact query (except for whitespace changes). There is no WHERE clause. At the bottom of the window it says "No Filter". – andybalholm Apr 25 '12 at 23:46
  • As far as the possibility of an undiscovered error: I think it has been discovered by other users too, and they finally gave up trying to track it down. [This earlier question](http://stackoverflow.com/questions/3524447/access-left-join-not-working-properly) looks like the same problem to me, but there was never a satisfactory answer, so the OP gave up and added columns to his table and used an UPDATE query to fill them in instead of using a join. – andybalholm Apr 25 '12 at 23:49
1

Drop any indexes from both tables which include those JOIN fields (FirstName, LastName, and DateBorn). Then see whether you get the expected 4,149 rows with this simplified query.

SELECT
    i.PersonID AS OldID, 
    u.PersonID AS UpdateID
FROM
    TblIndividualsUpdates AS u
    LEFT JOIN TblIndividuals AS i
    ON
        (
            (i.FirstName = u.FirstName) 
        AND (i.LastName = u.LastName) 
        AND (i.DateBorn = u.DateBorn)
        );
HansUp
  • 95,961
  • 11
  • 77
  • 135
  • The query as you gave it gives 4149 rows, both before and after dropping indexes. But when I replace `(i.DateBorn = u.DateBorn)` with `(i.DateBorn = u.DateBorn or i.DateBorn is null)` it gives only 4136 rows. So it seems to be the check for null that triggers the bug. – andybalholm Apr 26 '12 at 18:11
  • Not everyone has a birthdate in the database. If the birthdate is null in both the old table and the update table, `i.DateBorn = u.DateBorn` will be null and the records will not be joined. `or i.DateBorn is null` catches that possibility. – andybalholm Apr 30 '12 at 22:22
0

For whatever it is worth, since this seems to be a deceitful bug and any additional information could help resolving it, I have had the same problem.

The query is too big to post here and I don't have the time to reduce it now to something suitable, but I can report what I found. In the below, all joins are left joins.

I was gradually refining and changing my query. It had a derived table in it (D). And the whole thing was made into a derived table (T) and then joined to a last table (L). In any case, at one point in its development, no field in T that originated in D participated in the join to L. It was then the problem occurred, the total number of rows mysteriously became less than the main table, which should be impossible. As soon as I again let a field from D participate (via T) in the join to L, the number increased to normal again.

It was as if the join condition to D was moved to a WHERE clause when no field in it was participating (via T) in the join to L. But I don't really know what the explanation is.

Magnus
  • 1,584
  • 19
  • 14