7

I have a complex business logic that requires me to perform a 2-levels nested query. The queries are generated by Django's ORM. At the bottom of the question I'll provide the queries as-is as well as a full EXPLAIN suitable to be viewed with PEV2, but in order to help readers understand better the question, I'll start with a more conceptual explanation.

This is how a very naive description of what we're doing looks like:

some_ids = get_id_based_on_some_conditions(*conditions*)
some_other_ids = get_some_other_ids_based_on_some_conditions_and_filtering_by_some_ids(*other_conditions*, some_ids)
results = get_results_based_on_even_more_conditions_and_filtering_by_some_other_ids(*another_set_of_conditions*, some_other_ids)

Translating the following pseudo-code to actual SQL using subqueries is quite easy. A straightforward translation becomes into the following pseudo-query:

select 
    foo,
    bar
from
    t1,
    t2
where
    condition1 = something and
    condition2 in (   <---- first level subquery
        select 
            id
        from
            t3
        where
            condition3 = another_something and
            condition4 in (    <---- second level subquery
                select
                    another_id
                from
                    t4
                where
                    condition5 = something_something and
                    condition6 = another_something_something
            )
    )

Since the query is taking a considerable amount of time (~0.6s) given the number of rows that it returns (a little bit over 9.000), I thought that it might help replacing the second level subquery with an inner join.

That, in fact, made the query even slower (now at ~1.7s). So I thought that maybe the planner didn't correctly understood what would happen with a subquery with an inner join inside and made some serious miscalculations / overestimations / underestimations, so I replaced the first level subquery with more inner joins, which led to even poorer results (now at ~10s).

I have been analysing the EXPLAINS of the queries for hours, and I can't figure out why using inner joins makes everything slower. I also don't know how to tell if my (currently) best query is actually the best I can get or if there are things that I'm not doing and that might speed it up.

So, the questions that I have are:

  • why are the inner joins slower than subqueries?
  • how can I tell if I'm doing everything possible in order to squeeze the maximum performance out of my database or if I'm missing something?

Actual queries and EXPLAINS as-is:

Query with 2-levels subqueries:

SELECT DISTINCT
    "phdrug_phdrug"."id",
    "phdrug_phdrug"."uuid",
    "phdrug_phdrug"."default_description",
    "phdrug_phdrug"."alternative_description",
    "phdrug_phdrug"."ean",
    "phdrug_phdrug"."mirror_ean",
    "phdrug_phdrug"."parent_ean",
    "phdrug_phdrug"."reg_num",
    "phdrug_phdrug"."medika_code",
    "phdrug_phdrug"."atc_iv",
    "phdrug_phdrug"."product_type",
    "phdrug_phdrug"."fraction",
    "phdrug_phdrug"."active",
    "phdrug_phdrug"."loyal",
    "phdrug_phdrug"."patent",
    "phdrug_phdrug"."chronics",
    "phdrug_phdrug"."recipe",
    "phdrug_phdrug"."deal",
    "phdrug_phdrug"."specialized",
    "phdrug_phdrug"."armored",
    "phdrug_phdrug"."top_hight_speciality",
    "phdrug_phdrug"."top_generic",
    "phdrug_phdrug"."hight_speciality",
    "phdrug_phdrug"."temp_8_15",
    "phdrug_phdrug"."temp_15_25",
    "phdrug_phdrug"."temp_2_8",
    "phdrug_phdrug"."temp_less_15",
    "phdrug_phdrug"."new",
    "phdrug_phdrug"."mdk_internal_code",
    "phdrug_phdrug"."mdk_single_id",
    "phdrug_phdrug"."mdk_object_id",
    "phdrug_phdrug"."is_from_mdk_db",
    "phdrug_phdrug"."top",
    "phdrug_phdrug"."laboratory_name",
    "phdrug_phdrug"."laboratory_alternative_name",
    "phdrug_phdrug"."imported",
    "phdrug_phdrug"."imported_country",
    "phdrug_phdrug"."laboratory_id",
    "phdrug_phdrug"."specialty",
    "phdrug_phdrug"."dimension_id",
    "phdrug_phdrug"."featured",
    "phdrug_phdrug"."top_ae_rank",
    "phdrug_phdrug"."top_farma_rank" 
FROM
    "phdrug_phdrug"
    INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
    INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" ) 
WHERE
    (
        "monetary_drugprice"."id" IN (
        SELECT
            V0."id" 
        FROM
            "monetary_drugprice" V0 
        WHERE
            (
                V0."pricelist_id" IN (
                SELECT DISTINCT ON
                    ( U0."id" ) U0."id" 
                FROM
                    "monetary_pricelist" U0
                    INNER JOIN "monetary_pricelistdestinations" U1 ON ( U0."id" = U1."pricelist_id" )
                    INNER JOIN "organization_organization" U2 ON ( U0."manager_id" = U2."id" )
                    INNER JOIN "courier_carrier_pricelists" U3 ON ( U0."id" = U3."pricelist_id" )
                    INNER JOIN "courier_carrier" U4 ON ( U3."carrier_id" = U4."id" )
                    INNER JOIN "courier_carrierdelivery" U5 ON ( U4."id" = U5."carrier_id" )
                    INNER JOIN "monetary_pricelistcountry" U6 ON ( U0."id" = U6."pricelist_id" ) 
                WHERE
                    (
                        (
                            U0."expires" = FALSE 
                            OR (
                                U0."expires" = TRUE 
                                AND ( U0."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01' 
                                AND ( U0."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01' 
                            ) 
                        ) 
                        AND U0."active" = TRUE 
                        AND U1."to_public" = TRUE 
                        AND U2."organization_type" = 2 
                        AND (
                            U5."dst_country" = 'MX' 
                            OR U5."ignore_country_filter" = TRUE 
                        ) 
                        AND U6."country" = 'MX' 
                        AND U2."active" = TRUE 
                    ) 
                ) 
                AND V0."stock" > 0 
            ) 
        ) 
        AND "phdrug_phdrug"."active" = TRUE 
        AND "phdrug_phdrugpicture"."is_main" = TRUE 
    ) 
ORDER BY
    "phdrug_phdrug"."id" ASC,
    "phdrug_phdrug"."default_description" ASC

Full explain: https://pastebin.com/jDy3FyKp

Query with 1-level subquery:

SELECT DISTINCT
    "phdrug_phdrug"."id",
    "phdrug_phdrug"."uuid",
    "phdrug_phdrug"."default_description",
    "phdrug_phdrug"."alternative_description",
    "phdrug_phdrug"."ean",
    "phdrug_phdrug"."mirror_ean",
    "phdrug_phdrug"."parent_ean",
    "phdrug_phdrug"."reg_num",
    "phdrug_phdrug"."medika_code",
    "phdrug_phdrug"."atc_iv",
    "phdrug_phdrug"."product_type",
    "phdrug_phdrug"."fraction",
    "phdrug_phdrug"."active",
    "phdrug_phdrug"."loyal",
    "phdrug_phdrug"."patent",
    "phdrug_phdrug"."chronics",
    "phdrug_phdrug"."recipe",
    "phdrug_phdrug"."deal",
    "phdrug_phdrug"."specialized",
    "phdrug_phdrug"."armored",
    "phdrug_phdrug"."top_hight_speciality",
    "phdrug_phdrug"."top_generic",
    "phdrug_phdrug"."hight_speciality",
    "phdrug_phdrug"."temp_8_15",
    "phdrug_phdrug"."temp_15_25",
    "phdrug_phdrug"."temp_2_8",
    "phdrug_phdrug"."temp_less_15",
    "phdrug_phdrug"."new",
    "phdrug_phdrug"."mdk_internal_code",
    "phdrug_phdrug"."mdk_single_id",
    "phdrug_phdrug"."mdk_object_id",
    "phdrug_phdrug"."is_from_mdk_db",
    "phdrug_phdrug"."top",
    "phdrug_phdrug"."laboratory_name",
    "phdrug_phdrug"."laboratory_alternative_name",
    "phdrug_phdrug"."imported",
    "phdrug_phdrug"."imported_country",
    "phdrug_phdrug"."laboratory_id",
    "phdrug_phdrug"."specialty",
    "phdrug_phdrug"."dimension_id",
    "phdrug_phdrug"."featured",
    "phdrug_phdrug"."top_ae_rank",
    "phdrug_phdrug"."top_farma_rank" 
FROM
    "phdrug_phdrug"
    INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
    INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" ) 
WHERE
    (
        "monetary_drugprice"."id" IN (
        SELECT
            U0."id" 
        FROM
            "monetary_drugprice" U0
            INNER JOIN "monetary_pricelist" U1 ON ( U0."pricelist_id" = U1."id" )
            INNER JOIN "monetary_pricelistdestinations" U2 ON ( U1."id" = U2."pricelist_id" )
            INNER JOIN "organization_organization" U3 ON ( U1."manager_id" = U3."id" )
            INNER JOIN "courier_carrier_pricelists" U4 ON ( U1."id" = U4."pricelist_id" )
            INNER JOIN "courier_carrier" U5 ON ( U4."carrier_id" = U5."id" )
            INNER JOIN "courier_carrierdelivery" U6 ON ( U5."id" = U6."carrier_id" )
            INNER JOIN "monetary_pricelistcountry" U7 ON ( U1."id" = U7."pricelist_id" ) 
        WHERE
            (
                (
                    U1."expires" = FALSE 
                    OR (
                        U1."expires" = TRUE 
                        AND ( U1."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01' 
                        AND ( U1."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01' 
                    ) 
                ) 
                AND U1."active" = TRUE 
                AND U2."to_public" = TRUE 
                AND U3."organization_type" = 2 
                AND (
                    U6."dst_country" = 'MX' 
                    OR U6."ignore_country_filter" = TRUE 
                ) 
                AND U7."country" = 'MX' 
                AND U3."active" = TRUE 
                AND U0."stock" > 0 
            ) 
        ) 
        AND "phdrug_phdrug"."active" = TRUE 
        AND "phdrug_phdrugpicture"."is_main" = TRUE 
    ) 
ORDER BY
    "phdrug_phdrug"."id" ASC,
    "phdrug_phdrug"."default_description" ASC

Full explain: https://pastebin.com/NidTZMxY

Query with only inner joins:

SELECT DISTINCT
    "phdrug_phdrug"."id",
    "phdrug_phdrug"."uuid",
    "phdrug_phdrug"."default_description",
    "phdrug_phdrug"."alternative_description",
    "phdrug_phdrug"."ean",
    "phdrug_phdrug"."mirror_ean",
    "phdrug_phdrug"."parent_ean",
    "phdrug_phdrug"."reg_num",
    "phdrug_phdrug"."medika_code",
    "phdrug_phdrug"."atc_iv",
    "phdrug_phdrug"."product_type",
    "phdrug_phdrug"."fraction",
    "phdrug_phdrug"."active",
    "phdrug_phdrug"."loyal",
    "phdrug_phdrug"."patent",
    "phdrug_phdrug"."chronics",
    "phdrug_phdrug"."recipe",
    "phdrug_phdrug"."deal",
    "phdrug_phdrug"."specialized",
    "phdrug_phdrug"."armored",
    "phdrug_phdrug"."top_hight_speciality",
    "phdrug_phdrug"."top_generic",
    "phdrug_phdrug"."hight_speciality",
    "phdrug_phdrug"."temp_8_15",
    "phdrug_phdrug"."temp_15_25",
    "phdrug_phdrug"."temp_2_8",
    "phdrug_phdrug"."temp_less_15",
    "phdrug_phdrug"."new",
    "phdrug_phdrug"."mdk_internal_code",
    "phdrug_phdrug"."mdk_single_id",
    "phdrug_phdrug"."mdk_object_id",
    "phdrug_phdrug"."is_from_mdk_db",
    "phdrug_phdrug"."top",
    "phdrug_phdrug"."laboratory_name",
    "phdrug_phdrug"."laboratory_alternative_name",
    "phdrug_phdrug"."imported",
    "phdrug_phdrug"."imported_country",
    "phdrug_phdrug"."laboratory_id",
    "phdrug_phdrug"."specialty",
    "phdrug_phdrug"."dimension_id",
    "phdrug_phdrug"."featured",
    "phdrug_phdrug"."top_ae_rank",
    "phdrug_phdrug"."top_farma_rank" 
FROM
    "phdrug_phdrug"
    INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
    INNER JOIN "monetary_pricelist" ON ( "monetary_drugprice"."pricelist_id" = "monetary_pricelist"."id" )
    INNER JOIN "monetary_pricelistdestinations" ON ( "monetary_pricelist"."id" = "monetary_pricelistdestinations"."pricelist_id" )
    INNER JOIN "organization_organization" ON ( "monetary_pricelist"."manager_id" = "organization_organization"."id" )
    INNER JOIN "courier_carrier_pricelists" ON ( "monetary_pricelist"."id" = "courier_carrier_pricelists"."pricelist_id" )
    INNER JOIN "courier_carrier" ON ( "courier_carrier_pricelists"."carrier_id" = "courier_carrier"."id" )
    INNER JOIN "courier_carrierdelivery" ON ( "courier_carrier"."id" = "courier_carrierdelivery"."carrier_id" )
    INNER JOIN "monetary_pricelistcountry" ON ( "monetary_pricelist"."id" = "monetary_pricelistcountry"."pricelist_id" )
    INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" ) 
WHERE
    (
        (
            "monetary_pricelist"."expires" = FALSE 
            OR (
                "monetary_pricelist"."expires" = TRUE 
                AND ( "monetary_pricelist"."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01' 
                AND ( "monetary_pricelist"."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01' 
            ) 
        ) 
        AND "monetary_pricelist"."active" = TRUE 
        AND "monetary_pricelistdestinations"."to_public" = TRUE 
        AND "organization_organization"."organization_type" = 2 
        AND (
            "courier_carrierdelivery"."dst_country" = 'MX' 
            OR "courier_carrierdelivery"."ignore_country_filter" = TRUE 
        ) 
        AND "monetary_pricelistcountry"."country" = 'MX' 
        AND "organization_organization"."active" = TRUE 
        AND "monetary_drugprice"."stock" > 0 
        AND "phdrug_phdrug"."active" = TRUE 
        AND "phdrug_phdrugpicture"."is_main" = TRUE 
    ) 
ORDER BY
    "phdrug_phdrug"."id" ASC,
    "phdrug_phdrug"."default_description" ASC

Full explain: https://pastebin.com/DaVztBuV

alexandernst
  • 14,352
  • 22
  • 97
  • 197
  • 3
    `join` is slower then `in` because the server have to build joined data. Tip `exists` is more efficient then `in` in this case. – Abelisto May 01 '20 at 23:09
  • @Abelisto I just tried `exists` and it's actually making things worse (~17s). Mind to explain why `exists` would help in this case? (and maybe how to use it, as I might be using it incorrectly) – alexandernst May 01 '20 at 23:37
  • 1
    Please [use text, not images/links, for text--including tables & ERDs](https://meta.stackoverflow.com/q/285551/3404097). Paraphrase or quote from other text. Give just what you need & relate it to your problem. Use images only for what cannot be expressed as text or to augment text. Images cannot be searched for or cut & pasted. Include a legend/key & explanation with an image. Also, links die. Insert images/links using edit functions. Make your post self-contained. – philipxy May 01 '20 at 23:40
  • 1
    Please in code questions give a [mre]--including cut & paste & runnable minimal code & minimal representative data given as code. For SQL that includes DBMS & DDL, including constraints, indexes & tabular initialization. For SQL performance that includes EXPLAIN results & statistics. Please research & summarize. For SQL that includes basics of optimization/performance--immediately leading to indexes, plans, statistics & SARGability. [Tips for asking a good SQL question](https://meta.stackoverflow.com/a/271056/3404097) Ask re optimization after you have learned & applied those basics. [ask] – philipxy May 01 '20 at 23:41
  • 1
    @philipxy I haven't linked any images. I linked only the output of the full explains because these literally don't fit here (because of the text length limit). As for the second part, I really can't provide that. I'm asking a theoretical question and I'm providing the real queries just to make things easier to understand, I'm not expecting anybody to run them. Providing a minimal reproducible example would require a quite big dump of my database (look at how many tables my queries are using!), which is imho not viable. – alexandernst May 01 '20 at 23:51
  • @alexandernst Here is the short explanation: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=d12c3cd687d5d9ea06fc13d5db0be548 – Abelisto May 02 '20 at 00:00
  • I'd advise you to select only the fields you are going to need (as a filter or to be shown), before joining tables. Every table you add is bringing all possible combinations of records, constrained by join conditions, with all the available fields, which also have a memory expense. If you can't reduce the joint quantity, please try to reduce the data needed in the calculation. And add indexes, when possible, based on those required fields. – Ictus May 02 '20 at 04:05
  • @Ictus I’m already selecting only the IDs in the subqueries. Also, as I mentioned in my question, I can’t see any missing indexes , but that might be me not understanding correctly the data from EXPLAIN. Is there any way to tell if there are actually any missing indexes? – alexandernst May 02 '20 at 08:18
  • DISTINCT when selecting from a single table. -->> exists() is your friend. – wildplasser May 02 '20 at 15:45
  • If your question needs your code, give it properly, otherwise if it doesn't then don't give your code. The "theoretical" version is 1. is unresearched 2. a faq & 3. doesn't apply to specific code. Clearly ask whatever question you are trying to ask--per my comment. My comment mentions images as well as links because it is prefab, and it doesn't say you use images right now, but as long as you are inappropriately using links it's reasonable to tell you to not use either links or images inappropriately. – philipxy May 02 '20 at 23:50
  • @philipxy The real query is required (as well as the full EXPLAIN that I attached in pastebin) because that's what PEV requires in order to analyse and generate the visual output. – alexandernst May 03 '20 at 10:08

1 Answers1

0

Troubleshooting at this level is difficult without seeing the database structure. I've had to write two different versions of the same script because the environments were different at the different sites.

  1. Make sure the tables are properly indexed.
  2. Make your subscript as part of the FROM statement and not the WHERE statement, unless its part of the IN clause.

Select * from Table1 t1 left outer join (Select * from Table2) t2 on t1.field = t2.field

  1. If its a large pull and/or large heavy used tables, then using temp tables will speed it as well. But it looks like your script is smaller and this is over kill.
Duane
  • 36
  • 4