I'm trying to build a query that inner joins a view (which exists for reusability), but apparently the fact that this view has an internal left join is somehow messing up the optimizer, and I can't really understand why (indices statistics are updated).
Below is an MCVE. It's actually very simple. You can picture it as a simple customer (B) - order (C) design where customer's address (optional) is in another table (A). And then we have a view to join the customer to it's address (vw_B).
Metadata and example data:
create table A (
id int not null,
fieldA char(10) not null,
constraint pk_A primary key (id)
);
create table B (
id int not null,
fieldB char(10) not null,
idA int,
constraint pk_B primary key (id),
constraint fk_A foreign key (idA) references A (id)
);
create view VW_B as
select b.*, a.fieldA from B
left join A on a.id = b.idA;
create table C (
id int not null,
mydate date not null,
idB int not null,
constraint pk_C primary key (id),
constraint fk_B foreign key (idB) references B (id)
);
create index ix_C on C (mydate);
insert into A (id, fieldA)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 10
)
select n.n, 'A' from n;
SET STATISTICS INDEX PK_A;
insert into B (id, fieldB, idA)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 100
)
select n.n, 'B', IIF(MOD(n.n, 5) = 0, null, MOD(n.n, 10)+1) from n;
SET STATISTICS INDEX PK_B;
SET STATISTICS INDEX FK_A;
insert into C (id, mydate, idB)
with recursive n as (
select 1 as n from rdb$database
union all
select n.n + 1 from n
where n < 1000
)
select n.n, cast('01.01.2020' as date) + 100*rand(), mod(n.n, 100)+1 from n;
SET STATISTICS INDEX PK_C;
SET STATISTICS INDEX FK_B;
SET STATISTICS INDEX IX_C;
With this design, I want to have a query that can join all tables in such a way that I can efficiently search orders by date (c.mydate) or any indexed customer information (table B). The obvious choice is an inner join between B and C, and it works fine. But if I want to add customer's address to the result, by using vw_B instead of B, the optimizer no longer selects the best plan.
Here are some queries to show this:
Manually joining all tables and filtering by date. Optimizer works fine.
select c.*, b.fieldB, a.fieldA from C
inner join B on b.id = c.idB
left join A on a.id = b.idA
where c.mydate = '01.01.2020'
PLAN JOIN (JOIN (C INDEX (IX_C), B INDEX (PK_B)), A INDEX (PK_A))
Reusing vw_B to have A table joined automatically. Optimizer selects a NATURAL plan on (VW_B B).
select c.*, b.fieldB, b.fieldA from C
inner join VW_B b on b.id = c.idB
where c.mydate = '01.01.2020'
PLAN JOIN (JOIN (B B NATURAL, B A INDEX (PK_A)), C INDEX (FK_B, IX_C))
Why does that happen? I thought these two queries should produce the exact same operation in the engine. Now, this is a very simple MVCE, and I have much more complex views that are very reusable, and with larger tables joining with those views is causing performance issues.
Do you have any suggestions to improve performance/PLAN selection, but preserving the convenience of reusability that views provide?
Server version is WI-V3.0.4.33054
.