1

I am running following query inside one of my classes which takes ORM classes in so that it can work with several similar tables.

(
            self.db.query(self.orm_contact_class)
            .options(
                load_only(
                    self.orm_contact_class.id,
                    self.orm_contact_class.name,
                    self.orm_contact_class.email_attempts_dict,
                ),
                joinedload(
                    self.orm_contact_class.__dict__[self.access_from_contact_to_company]
                ).load_only(self.orm_company_class.domain)
            )
            .where(
                self.orm_contact_class.email == None,
                self.orm_contact_class.name != None,
                self.orm_company_class.domain != None,
                catch_all_conditions
            )
        )

This results in this monstrous query:

SELECT test.crunchbase_people.id AS test_crunchbase_people_id, 
       test.crunchbase_people.name AS test_crunchbase_people_name, 
       test.crunchbase_people.email_attempts_dict AS test_crunchbase_people_email_attempts_dict, 
       crunchbase_companies_1.id AS crunchbase_companies_1_id, 
       crunchbase_companies_1.domain AS crunchbase_companies_1_domain 
FROM test.crunchbase_companies, 
     test.crunchbase_people 
         LEFT OUTER JOIN (
             test.crunchbase_people_crunchbase_companies AS crunchbase_people_crunchbase_companies_1 
                 JOIN test.crunchbase_companies AS crunchbase_companies_1 
                 ON crunchbase_companies_1.id = crunchbase_people_crunchbase_companies_1.crunchbase_companies_id) 
             ON test.crunchbase_people.id = crunchbase_people_crunchbase_companies_1.crunchbase_people_id 
WHERE test.crunchbase_people.email IS NULL AND test.crunchbase_people.name IS NOT NULL AND test.crunchbase_companies.domain IS NOT NULL AND (test.crunchbase_companies.is_domain_catch_all = false OR test.crunchbase_companies.is_domain_catch_all IS NULL)

Which ends up never completing, and if I run it inside Postgres console it just returns duplicates of the exact same row, over and over again!

So it never gets to mapping the objects because the query simply takes forever. There is an easy way of doing this without ORM, and the query runs in 0.5 seconds (and no duplicates like above), but then my objects are not mapped which would cause me to refactor a lot of my code.

Does anyone know what could be the problem with such a query?

khelwood
  • 55,782
  • 14
  • 81
  • 108
Taras
  • 181
  • 6
  • No join condition for the implicit join? – jarlh Jan 04 '22 at 18:59
  • You have misunderstood what `joinedload` is for, please see https://stackoverflow.com/questions/7793842/how-to-filter-by-joinloaded-table-in-sqlalchemy and https://stackoverflow.com/questions/47243397/sqlalchemy-joinedload-filter-column. Using "company" in `where()` without having defined an explicit join adds it to the FROM clause as is, resulting in a cross join. – Ilja Everilä Jan 04 '22 at 19:08
  • @IljaEverilä self.orm_contact_class.__dict__[self.access_from_contact_to_company] is where there is an explicit join, access_from_contact_to_company is a string – Taras Jan 04 '22 at 19:56
  • @jarlh self.orm_contact_class.__dict__[self.access_from_contact_to_company] it's from this line – Taras Jan 04 '22 at 19:56
  • 2
    By explicit join I mean using `Query.join()`. Using `joinedload` you are telling the query to eagerly fill the relationship attribute using a join. SQLAlchemy makes sure that those joins are as transparent as possible, i.e. have no other effect than allowing the attributes to be filled. It will not for example be referencable in the `where()`. Instead when you have `self.orm_company_class.domain != None` you end up with an additional cross join of all companies with a non-null domain. – Ilja Everilä Jan 04 '22 at 20:04
  • got it, thank you. i have ended up jost using sql alchemy core to solve the problem. – Taras Jan 04 '22 at 21:32

1 Answers1

2

As suggested by the commentators on the original post, I was missing an explicit join as joinedload doesn't replace it, but merely tells which fields should be pre-loaded.

(
            self.db.query(self.orm_contact_class)
            .join(self.orm_contact_class.__dict__[self.access_from_contact_to_company]) # added this line
            .options(
                load_only(
                    self.orm_contact_class.id,
                    self.orm_contact_class.name,
                    self.orm_contact_class.email_attempts_dict,
                ),
                joinedload(
                    self.orm_contact_class.__dict__[self.access_from_contact_to_company]
                ).load_only(self.orm_company_class.domain)
            )
            .where(
                self.orm_contact_class.email == None,
                self.orm_contact_class.name != None,
                self.orm_company_class.domain != None,
                catch_all_conditions
            )
        )
Taras
  • 181
  • 6
  • One more note, if your intent was indeed to limit the preloaded companies to those with non-null domain etc., instead of using companies to limit the set of people, then you'll want to use an explicit join combined with the `contains_eager` option (instead of `joinedload`). – Ilja Everilä Jan 05 '22 at 07:34
  • 1
    Yes, I did end up using contains_eager :) – Taras Jan 06 '22 at 22:24