I have a complex business logic that requires me to perform a 2-levels nested query. The queries are generated by Django's ORM. At the bottom of the question I'll provide the queries as-is as well as a full EXPLAIN
suitable to be viewed with PEV2, but in order to help readers understand better the question, I'll start with a more conceptual explanation.
This is how a very naive description of what we're doing looks like:
some_ids = get_id_based_on_some_conditions(*conditions*)
some_other_ids = get_some_other_ids_based_on_some_conditions_and_filtering_by_some_ids(*other_conditions*, some_ids)
results = get_results_based_on_even_more_conditions_and_filtering_by_some_other_ids(*another_set_of_conditions*, some_other_ids)
Translating the following pseudo-code to actual SQL using subqueries is quite easy. A straightforward translation becomes into the following pseudo-query:
select
foo,
bar
from
t1,
t2
where
condition1 = something and
condition2 in ( <---- first level subquery
select
id
from
t3
where
condition3 = another_something and
condition4 in ( <---- second level subquery
select
another_id
from
t4
where
condition5 = something_something and
condition6 = another_something_something
)
)
Since the query is taking a considerable amount of time (~0.6s) given the number of rows that it returns (a little bit over 9.000), I thought that it might help replacing the second level subquery with an inner join.
That, in fact, made the query even slower (now at ~1.7s). So I thought that maybe the planner didn't correctly understood what would happen with a subquery with an inner join inside and made some serious miscalculations / overestimations / underestimations, so I replaced the first level subquery with more inner joins, which led to even poorer results (now at ~10s).
I have been analysing the EXPLAINS
of the queries for hours, and I can't figure out why using inner joins makes everything slower. I also don't know how to tell if my (currently) best query is actually the best I can get or if there are things that I'm not doing and that might speed it up.
So, the questions that I have are:
- why are the inner joins slower than subqueries?
- how can I tell if I'm doing everything possible in order to squeeze the maximum performance out of my database or if I'm missing something?
Actual queries and EXPLAINS
as-is:
Query with 2-levels subqueries:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
"monetary_drugprice"."id" IN (
SELECT
V0."id"
FROM
"monetary_drugprice" V0
WHERE
(
V0."pricelist_id" IN (
SELECT DISTINCT ON
( U0."id" ) U0."id"
FROM
"monetary_pricelist" U0
INNER JOIN "monetary_pricelistdestinations" U1 ON ( U0."id" = U1."pricelist_id" )
INNER JOIN "organization_organization" U2 ON ( U0."manager_id" = U2."id" )
INNER JOIN "courier_carrier_pricelists" U3 ON ( U0."id" = U3."pricelist_id" )
INNER JOIN "courier_carrier" U4 ON ( U3."carrier_id" = U4."id" )
INNER JOIN "courier_carrierdelivery" U5 ON ( U4."id" = U5."carrier_id" )
INNER JOIN "monetary_pricelistcountry" U6 ON ( U0."id" = U6."pricelist_id" )
WHERE
(
(
U0."expires" = FALSE
OR (
U0."expires" = TRUE
AND ( U0."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( U0."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND U0."active" = TRUE
AND U1."to_public" = TRUE
AND U2."organization_type" = 2
AND (
U5."dst_country" = 'MX'
OR U5."ignore_country_filter" = TRUE
)
AND U6."country" = 'MX'
AND U2."active" = TRUE
)
)
AND V0."stock" > 0
)
)
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/jDy3FyKp
Query with 1-level subquery:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
"monetary_drugprice"."id" IN (
SELECT
U0."id"
FROM
"monetary_drugprice" U0
INNER JOIN "monetary_pricelist" U1 ON ( U0."pricelist_id" = U1."id" )
INNER JOIN "monetary_pricelistdestinations" U2 ON ( U1."id" = U2."pricelist_id" )
INNER JOIN "organization_organization" U3 ON ( U1."manager_id" = U3."id" )
INNER JOIN "courier_carrier_pricelists" U4 ON ( U1."id" = U4."pricelist_id" )
INNER JOIN "courier_carrier" U5 ON ( U4."carrier_id" = U5."id" )
INNER JOIN "courier_carrierdelivery" U6 ON ( U5."id" = U6."carrier_id" )
INNER JOIN "monetary_pricelistcountry" U7 ON ( U1."id" = U7."pricelist_id" )
WHERE
(
(
U1."expires" = FALSE
OR (
U1."expires" = TRUE
AND ( U1."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( U1."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND U1."active" = TRUE
AND U2."to_public" = TRUE
AND U3."organization_type" = 2
AND (
U6."dst_country" = 'MX'
OR U6."ignore_country_filter" = TRUE
)
AND U7."country" = 'MX'
AND U3."active" = TRUE
AND U0."stock" > 0
)
)
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/NidTZMxY
Query with only inner joins:
SELECT DISTINCT
"phdrug_phdrug"."id",
"phdrug_phdrug"."uuid",
"phdrug_phdrug"."default_description",
"phdrug_phdrug"."alternative_description",
"phdrug_phdrug"."ean",
"phdrug_phdrug"."mirror_ean",
"phdrug_phdrug"."parent_ean",
"phdrug_phdrug"."reg_num",
"phdrug_phdrug"."medika_code",
"phdrug_phdrug"."atc_iv",
"phdrug_phdrug"."product_type",
"phdrug_phdrug"."fraction",
"phdrug_phdrug"."active",
"phdrug_phdrug"."loyal",
"phdrug_phdrug"."patent",
"phdrug_phdrug"."chronics",
"phdrug_phdrug"."recipe",
"phdrug_phdrug"."deal",
"phdrug_phdrug"."specialized",
"phdrug_phdrug"."armored",
"phdrug_phdrug"."top_hight_speciality",
"phdrug_phdrug"."top_generic",
"phdrug_phdrug"."hight_speciality",
"phdrug_phdrug"."temp_8_15",
"phdrug_phdrug"."temp_15_25",
"phdrug_phdrug"."temp_2_8",
"phdrug_phdrug"."temp_less_15",
"phdrug_phdrug"."new",
"phdrug_phdrug"."mdk_internal_code",
"phdrug_phdrug"."mdk_single_id",
"phdrug_phdrug"."mdk_object_id",
"phdrug_phdrug"."is_from_mdk_db",
"phdrug_phdrug"."top",
"phdrug_phdrug"."laboratory_name",
"phdrug_phdrug"."laboratory_alternative_name",
"phdrug_phdrug"."imported",
"phdrug_phdrug"."imported_country",
"phdrug_phdrug"."laboratory_id",
"phdrug_phdrug"."specialty",
"phdrug_phdrug"."dimension_id",
"phdrug_phdrug"."featured",
"phdrug_phdrug"."top_ae_rank",
"phdrug_phdrug"."top_farma_rank"
FROM
"phdrug_phdrug"
INNER JOIN "monetary_drugprice" ON ( "phdrug_phdrug"."id" = "monetary_drugprice"."drug_id" )
INNER JOIN "monetary_pricelist" ON ( "monetary_drugprice"."pricelist_id" = "monetary_pricelist"."id" )
INNER JOIN "monetary_pricelistdestinations" ON ( "monetary_pricelist"."id" = "monetary_pricelistdestinations"."pricelist_id" )
INNER JOIN "organization_organization" ON ( "monetary_pricelist"."manager_id" = "organization_organization"."id" )
INNER JOIN "courier_carrier_pricelists" ON ( "monetary_pricelist"."id" = "courier_carrier_pricelists"."pricelist_id" )
INNER JOIN "courier_carrier" ON ( "courier_carrier_pricelists"."carrier_id" = "courier_carrier"."id" )
INNER JOIN "courier_carrierdelivery" ON ( "courier_carrier"."id" = "courier_carrierdelivery"."carrier_id" )
INNER JOIN "monetary_pricelistcountry" ON ( "monetary_pricelist"."id" = "monetary_pricelistcountry"."pricelist_id" )
INNER JOIN "phdrug_phdrugpicture" ON ( "phdrug_phdrug"."id" = "phdrug_phdrugpicture"."drug_id" )
WHERE
(
(
"monetary_pricelist"."expires" = FALSE
OR (
"monetary_pricelist"."expires" = TRUE
AND ( "monetary_pricelist"."datestart" AT TIME ZONE'UTC' ) :: DATE <= '2020-05-01'
AND ( "monetary_pricelist"."dateend" AT TIME ZONE'UTC' ) :: DATE >= '2020-05-01'
)
)
AND "monetary_pricelist"."active" = TRUE
AND "monetary_pricelistdestinations"."to_public" = TRUE
AND "organization_organization"."organization_type" = 2
AND (
"courier_carrierdelivery"."dst_country" = 'MX'
OR "courier_carrierdelivery"."ignore_country_filter" = TRUE
)
AND "monetary_pricelistcountry"."country" = 'MX'
AND "organization_organization"."active" = TRUE
AND "monetary_drugprice"."stock" > 0
AND "phdrug_phdrug"."active" = TRUE
AND "phdrug_phdrugpicture"."is_main" = TRUE
)
ORDER BY
"phdrug_phdrug"."id" ASC,
"phdrug_phdrug"."default_description" ASC
Full explain: https://pastebin.com/DaVztBuV