I have a sql view, which I'm using to retrieve data. Lets say its a large list of products, which are linked to the customers who have bought them. The view should return only one row per product, no matter how many customers it is linked to. I'm using the row_number function to achieve this. (This example is simplified, the generic situation would be a query where there should only be one row returned for each unique value of some column X. Which row is returned is not important)
CREATE VIEW productView AS
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
//various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = prodcut.Id
//various other joins
) as temp
WHERE temp.prodcut_numbering = 1
Now lets say that the total number of rows in this view is ~1 million, and running select * from productView takes 10 seconds. Performing a query such as select * from productView where productID = 10 takes the same amount of time. I believe this is because the query gets evaluated to this
SELECT * FROM
(SELECT
Row_number() OVER(PARTITION BY products.Id ORDER BY products.Id) AS product_numbering,
customer.Id
//various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = prodcut.Id
//various other joins
) as temp
WHERE prodcut_numbering = 1 and prodcut.Id = 10
I think this is causing the inner subquery to be evaluated in full each time. Ideally I'd like to use something along the following lines
SELECT
Row_number() OVER(PARTITION BY products.productID ORDER BY products.productID) AS product_numbering,
customer.id
//various other columns
FROM products
LEFT OUTER JOIN customer ON customer.productId = prodcut.Id
//various other joins
WHERE prodcut_numbering = 1
But this doesn't seem to be allowed. Is there any way to do something similar?
EDIT -
After much experimentation, the actual problem I believe I am having is how to force a join to return exactly 1 row. I tried to use outer apply, as suggested below. Some sample code.
CREATE TABLE Products (id int not null PRIMARY KEY)
CREATE TABLE Customers (
id int not null PRIMARY KEY,
productId int not null,
value varchar(20) NOT NULL)
declare @count int = 1
while @count <= 150000
begin
insert into Customers (id, productID, value)
values (@count,@count/2, 'Value ' + cast(@count/2 as varchar))
insert into Products (id)
values (@count)
SET @count = @count + 1
end
CREATE NONCLUSTERED INDEX productId ON Customers (productID ASC)
With the above sample set, the 'get everything' query below
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
takes ~1000ms to run. Adding an explicit condition:
select * from Products
outer apply (select top 1 *
from Customers
where Products.id = Customers.productID) Customers
where Customers.value = 'Value 45872'
Takes some identical amount of time. This 1000ms for a fairly simple query is already too much, and scales the wrong way (upwards) when adding additional similar joins.