Wrong plan when inner-joining a view/subquery that has left join

Question

I'm trying to build a query that inner joins a view (which exists for reusability), but apparently the fact that this view contains a left join is somehow confusing the optimizer, and I can't understand why (index statistics are up to date).

Below is an MCVE. It's actually very simple. You can picture it as a simple customer (B) / order (C) design where the customer's (optional) address is in another table (A). Then there is a view (VW_B) that joins the customer to its address.

Metadata and example data:

```sql
create table A (
    id int not null,
    fieldA char(10) not null,

    constraint pk_A primary key (id)
);

create table B (
    id int not null,
    fieldB char(10) not null,
    idA int,

    constraint pk_B primary key (id),
    constraint fk_A foreign key (idA) references A (id)
);

create view VW_B as
    select b.*, a.fieldA from B
    left join A on a.id = b.idA;

create table C (
    id int not null,
    mydate date not null,
    idB int not null,

    constraint pk_C primary key (id),
    constraint fk_B foreign key (idB) references B (id)
);
create index ix_C on C (mydate);

insert into A (id, fieldA)
with recursive n as (
    select 1 as n from rdb$database
    union all
    select n.n + 1 from n
    where n < 10
)
select n.n, 'A' from n;
SET STATISTICS INDEX PK_A;

insert into B (id, fieldB, idA)
with recursive n as (
    select 1 as n from rdb$database
    union all
    select n.n + 1 from n
    where n < 100
)
select n.n, 'B', IIF(MOD(n.n, 5) = 0, null, MOD(n.n, 10)+1) from n;
SET STATISTICS INDEX PK_B;
SET STATISTICS INDEX FK_A;

insert into C (id, mydate, idB)
with recursive n as (
    select 1 as n from rdb$database
    union all
    select n.n + 1 from n
    where n < 1000
)
select n.n, cast('01.01.2020' as date) + 100*rand(), mod(n.n, 100)+1 from n;
SET STATISTICS INDEX PK_C;
SET STATISTICS INDEX FK_B;
SET STATISTICS INDEX IX_C;
```

With this design, I want a query that joins all tables in such a way that I can efficiently search orders by date (c.mydate) or by any indexed customer information (table B). The obvious choice is an inner join between B and C, and it works fine. But if I want to add the customer's address to the result by using VW_B instead of B, the optimizer no longer selects the best plan.

Here are some queries to show this:

Joining all tables manually and filtering by date; the optimizer works fine:

```sql
select c.*, b.fieldB, a.fieldA from C
inner join B on b.id = c.idB
left join A on a.id = b.idA
where c.mydate = '01.01.2020'
```

```
PLAN JOIN (JOIN (C INDEX (IX_C), B INDEX (PK_B)), A INDEX (PK_A))
```

Reusing VW_B so that table A is joined automatically; the optimizer now reads the view's table B in NATURAL order (`B B NATURAL`, i.e. a full scan):

```sql
select c.*, b.fieldB, b.fieldA from C
inner join VW_B b on b.id = c.idB
where c.mydate = '01.01.2020'
```

```
PLAN JOIN (JOIN (B B NATURAL, B A INDEX (PK_A)), C INDEX (FK_B, IX_C))
```

Why does that happen? I thought these two queries would result in exactly the same operations in the engine. This is a very simple MCVE; I have much more complex views that are heavily reused, and with larger tables, joining with those views is causing performance issues.

Do you have any suggestions to improve performance/PLAN selection while preserving the reusability that views provide?

Server version is WI-V3.0.4.33054.

The queries aren't exactly equivalent; the view version is more like (shortened) `select * from C inner join (select * from B left join A on a.id = b.idA) x on x.id = c.idB`, and this also changes how the optimizer handles it. BTW: which Firebird version are you using? — Mark Rotteveel, Apr 24 '20 at 07:36
@MarkRotteveel oops, sorry. I edited the question to add the version: `WI-V3.0.4.33054`. Yes, I'd imagine it is more like a subquery. In fact, the subquery has the same wrong-plan problem. But I don't understand why the subquery/view version can't be optimized to the same plan as the "direct joins" version. — GabrielF, Apr 24 '20 at 18:52
One of the problems is that the Firebird optimizer just isn't that good. — Mark Rotteveel, Apr 25 '20 at 09:55
@MarkRotteveel _optimizer just isn't that good_ This is the correct answer to this question :). To be more specific: outer joins are not handled very well in Firebird (a great example: [CORE-1239](http://tracker.firebirdsql.org/browse/CORE-1239)). In my experience, outer joins in views are always evaluated separately, so the optimizer cannot change the order of the inner joins in the OP's query. If there are no outer joins in the view, both queries should have the same plan. — BrakNicku, Apr 25 '20 at 14:25
Is there a general solution? I don't think so. Performance of the query in the question can be improved by changing the inner join to a left outer join (adding a not-null predicate if the join type matters). I also often use subqueries instead of left joins in views when only one field is needed from the joined table. — BrakNicku, Apr 25 '20 at 14:31
@BrakNicku I'll probably end up repeating the view's joins in each query, which hurts reusability. The performance difference for average-sized tables is too large to ignore. Maybe I should only create views that can be inner joined if they are meant to be reusable. — GabrielF, Apr 25 '20 at 15:41
@GabrielF Firebird is a free engine and it handles huge databases pretty well, but getting the best results often requires extra work to tune the most important queries in your system, and there is no single simple method for that. — BrakNicku, Apr 25 '20 at 16:00
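The left-outer-join workaround from BrakNicku's comment can be sketched as follows, assuming the schema from the question on Firebird 3.0. Because VW_B is on the right side of a LEFT JOIN, C is expected to be the driving stream (read through IX_C), with the view looked up for each row of C; the `b.id is not null` condition is the "not-null predicate" the comment refers to. This is an untested sketch, so check the resulting plan against your own data.

```sql
-- Sketch of the comment's workaround (assumes the question's schema, Firebird 3.0).
-- Putting VW_B on the right side of a LEFT JOIN makes C the outer (driving)
-- stream, so the date filter on C can use IX_C before the view is joined.
select c.*, b.fieldB, b.fieldA
from C
left join VW_B b on b.id = c.idB
where c.mydate = '01.01.2020'
  -- Restores inner-join semantics: discards C rows that found no B row.
  and b.id is not null;
```

In this particular schema the extra predicate is redundant: `C.idB` is `NOT NULL` and `fk_B` guarantees a matching row in B, so the LEFT JOIN already returns the same rows as the INNER JOIN. It only matters when the join column can be NULL or unconstrained.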

score 1 · Accepted Answer · answered Apr 26 '20 at 10:39

The Firebird optimizer is not intelligent enough to consider the queries equivalent.

Your query with view is equivalent to:

```sql
select c.*, b.fieldB, a.fieldA from C
inner join (B left join A on a.id = b.idA)
on b.id = c.idB
where c.mydate = '01.01.2020'
```

This will produce (almost) the same plan. So the problem is not the use of views as such, but how the table expressions are nested. The nesting changes how the engine evaluates them and which join orders it considers possible.

As BrakNicku indicated in the comments, there is no general solution for this.
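For the case where a view needs only one column from the optional table, the comments also suggest replacing the LEFT JOIN inside the view with a subquery. A minimal sketch, assuming the question's schema; the view name `VW_B_SUBQ` is hypothetical. Since the view then contains no outer join, the optimizer should be able to treat it like plain table B (per BrakNicku's comment), but verify the plan before relying on it.

```sql
-- Hypothetical alternative to VW_B: fetch fieldA with a correlated scalar
-- subquery instead of a LEFT JOIN, so the view has no outer join at all.
create view VW_B_SUBQ as
    select b.id, b.fieldB, b.idA,
           (select a.fieldA from A a where a.id = b.idA) as fieldA
    from B b;

-- The original query, now joining the subquery-based view.
select c.*, b.fieldB, b.fieldA
from C
inner join VW_B_SUBQ b on b.id = c.idB
where c.mydate = '01.01.2020';
```

The scalar subquery yields NULL when `idA` is NULL or has no match, just like the LEFT JOIN, and it returns at most one row because `A.id` is the primary key. Each additional column from A needs its own subquery, so this approach becomes unwieldy beyond one or two columns.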
