Tim asked me to demonstrate that, in this scenario, "in+union" and "Exists" are more efficient than a "left join". I made several tests within this scenario. In my tests In/Exists were always better, but in one case where they are the same. The explanation is that you don't want to actually collect data from SubM1 or SubM2 but only to execute a test.
With "exists", Sql Server will scan SubM1 for a match, if it finds it, it stops scanning. With a left join, it will continue to search other matches unless you define a unique index on SubM1 and SubM2. The more you are far from this scenario the more "exists" has a greater advantage.
I prepared this script:
Union and exists have exactly the same execution plan so I omitted one.
I generate about 923.521 in Material and about 850k rows in the other tables.
USE [Database1]
SET NOCOUNT ON;
IF object_ID('[dbo].[Material1]') is not null drop table [dbo].[Material1];
CREATE TABLE [dbo].[Material1]
(
[MaterialCode] [varchar](20) NOT NULL PRIMARY KEY CLUSTERED
)
GO
insert into [Database1].[dbo].[Material1]( [MaterialCode] )
SELECT
(A+b+c+d) Code
from
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) A(A)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) b(b)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) c(c)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) d(d)
GO
use DATABASE2;
IF object_ID('[dbo].[Submaterial1]') is not null drop table [dbo].[Submaterial1];
IF object_ID('[dbo].[Submaterial2]') is not null drop table [dbo].[Submaterial2];
CREATE TABLE [dbo].[Submaterial1](
[MaterialCode] [varchar](20) NULL
)
CREATE index IX ON [dbo].[Submaterial1]([MaterialCode]);
GO
insert into DATABASE2.[dbo].[Submaterial1]( [MaterialCode] )
SELECT
(A+b+c+d) Code
from
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) A(A)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) b(b)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) c(c)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('U')) d(d)
GO
CREATE TABLE [dbo].[Submaterial2](
[MaterialCode] [varchar](20) NULL
)
CREATE index IX ON [dbo].[Submaterial2]([MaterialCode]);
insert into DATABASE2.[dbo].[Submaterial2]( [MaterialCode] )
SELECT
(A+b+c+d) Code
from
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) A(A)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) b(b)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U')) c(c)
cross join
(values ('0'),('1'),('2'),('3'),('4'),('5'),('6'),('7'),('8'),('9'),('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('J'),('K'),('I'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S')) d(d)
GO
SET NOCOUNT ON;
GO
SET statistics IO ON;
SET statistics time ON;
SELECT distinct M.[MaterialCode]
FROM Database1.[dbo].[Material1] M
LEFT JOIN Database2.dbo.[Submaterial1] SM1
ON M.MaterialCode = SM1.MaterialCode
LEFT JOIN Database2.dbo.[Submaterial2] SM2
ON M.MaterialCode = SM2.MaterialCode
WHERE
SM1.MaterialCode is not null OR SM2.MaterialCode is not null;
;
SELECT [MaterialCode]
FROM Database1.[dbo].[Material1] M
WHERE
exists (SELECT * from Database2.dbo.[Submaterial1] SM1 where SM1.MaterialCode = M.MaterialCode)
OR
exists (SELECT * from Database2.dbo.[Submaterial2] SM2 where SM2.MaterialCode = M.MaterialCode)
You can adjust the script to simulate multiple scenarios.
This script, executed as it is, uses this execution plan and you can see "exists" has a big advantage:
Increasing the number of duplicated rows in SubM1 and SubM2 the advantage is bigger and bigger:
If you create a unique index on SubM1 and SubM2, Left join and Exists have the same performances. This result is expected and matches my explanation on why "exists" is better then a "left join":
