So is it right to say any record in result of this query is in
"Intersect" set between "Customers & Employees" "AND" any record that
exist in "Intersect" set between "Customers & Employees" is in result
of this query too?
YES.
... Yes, but it won't be as efficient because you are filtering out duplicates three times instead of once. In your query you're
- Using DISTINCT to pull unique records from employees
- Using DISTINCT to pull unique records from customers
- Combining both queries using UNION ALL
- Using GROUP BY in your outer query to to filter the records you retrieved in steps 1,2 and 3.
Using INTERSECT will return identical results but more efficiently. To see for yourself you can create the sample data below and run both queries:
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results:
Country Region City
----------- ---------- ----------
us Midwest Chicago
Country Region City
----------- ---------- ----------
us Midwest Chicago
If using INTERSECT is not an option OR you want a faster query you could improve the query you posted a couple different ways, such as:
Option 1: let GROUP BY handle ALL the de-duplication like this:
This is the same as what you posted but without the DISTINCTS
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
Option 2: Use ROW_NUMBER
This would be my preference and will likely be most efficient
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;