0

There are two postgres tables with roughly the following structures:

products
+----+-----------+
| id | rawDataId |
+----+-----------+
| 1  |     a     |
+----+-----------+
| 2  |     null  |
+----+-----------+
| 3  |     b     |
+----+-----------+
| 4  |     null  |
+----+-----------+
| 5  |     c     |
+----+-----------+

rawData
+----+-----------+
| id | productId |
+----+-----------+
| a  |     1     |
+----+-----------+
| b  |     3     |
+----+-----------+
| c  |     5     |
+----+-----------+
| d  |     6     |
+----+-----------+
| e  |     7     |
+----+-----------+
| f  |     8     |
+----+-----------+
| g  |     9     |
+----+-----------+

While FK constraints were not properly defined when these were set up, in application logic products.rawDataId maps to rawData.id and rawData.productId maps to products.id. Each record in products can have 1 or 0 records in rawData, meanwhile, there are some orphaned records in rawData whose associated products records have been deleted.

I'm trying to find all orphaned rawData records by doing a left join. However, I get different number of rows depending on my join condition:

psql> SELECT count(*)
       FROM "rawData" "r"
       LEFT JOIN "products" "p"
       ON "r"."productId" = "p"."id"
       WHERE "p"."id" IS NULL;

psql> 7000


psql> SELECT count(*)
       FROM "rawData" "r"
       LEFT JOIN "products" "p"
       ON "r"."id" = "p"."rawDataId"
       WHERE "p"."id" IS NULL;

psql> 9239

These tables have ~100k rows, so it's not easy to traverse them manually, but I was wondering under what scenarios I would be getting different results based on the join condition. Should they not return the same number of rows?

Are there any other queries I can execute to find the diff, or get the number of orphaned rows more accurately, given these table structures and constraints?

The output I would like to get, based on the sample tables is:

+----+-----------+------+-----------+
| id | productId |  id  | rawDataId |
+----+-----------+------+-----------+
| d  |     6     | null |   null    |
+----+-----------+------+-----------+
| e  |     7     | null |   null    |
+----+-----------+------+-----------+
| f  |     8     | null |   null    |
+----+-----------+------+-----------+
| g  |     9     | null |   null    |
+----+-----------+------+-----------+
GothamCityRises
  • 2,072
  • 2
  • 27
  • 43
  • What is the exact output you want to see based on these two sample tables? – Tim Biegeleisen Feb 08 '22 at 02:29
  • @TimBiegeleisen Added the desired output to the question – GothamCityRises Feb 08 '22 at 02:36
  • Not necessarily that exact structure, but I guess I want to remove the records from `rawData` corresponding to the orphaned rows, which are ids d, e, f, and g. – GothamCityRises Feb 08 '22 at 02:37
  • Your first query, returning 7000 records, is already correct. Use that one. – Tim Biegeleisen Feb 08 '22 at 02:41
  • Why is the second query wrong? – GothamCityRises Feb 08 '22 at 02:55
  • Maybe it's not wrong, but the PK-FK appears to be `rawData(productId)` to `products(id)`. If so, then you want to use the first query. – Tim Biegeleisen Feb 08 '22 at 02:56
  • "...I get different number of rows depending on my join condition..." -- Of course you'll get different matches. You described that there are **two relationships** between these tables. Depending on which one you use as a predicate of the outer join, you'll get different results. – The Impaler Feb 08 '22 at 03:17
  • join condition is different for these two query. that's why result change. – Rahul Biswas Feb 08 '22 at 05:23
  • But they're referring to each other. In the example tables, would the two statements return different results? – GothamCityRises Feb 09 '22 at 02:13
  • @TheImpaler I didn't explicitly mention **two relationships** anywhere. I might have not been clear though. There's two ways of defining the same relationship, yes. – GothamCityRises Feb 09 '22 at 02:15
  • @GothamCityRises You say "...in application logic products.rawDataId maps to rawData.id and rawData.productId maps to products.id...". Those are two different relationships according to relational algebra. Joining by one of them produces a different product than joining by the other one. – The Impaler Feb 09 '22 at 02:31

0 Answers0