Im a begginer. I know indexes are necessary for performance boosts, but i want to know how they actually work behind the scenes. Beforehand, I used to think that we should make indexes on those columns which are included in where clause (which I realized is wrong)
For example, SELECT * from MARKS where marks_obtained > 50
Consider that there's a clustered index on primary key of this table and I created a non-clustered index on marks_obtained column as its there in my where clause.
My perception: So the leaf nodes will be containing pointers to clustered index and as clustered index points to actual rows, it will select entire rows (due to asteric in my query)
Scenario
I came across following query (from AdventureWorks DB on which a non-clustered index was created) which works fine and took less than a second to execute 3200000 rows until a new column was inserted into it:
Query
SELECT x.*
INTO#X
FROM dbo.bigProduct AS p
CROSS APPLY
(
SELECT TOP 1000 *
FROM dbo.bigTransactionHistory AS bth
WHERE
bth.ProductId = p.bth.ProductId
ORDER BY
TransactionDate DESC
) AS x
WHERE
p.ProductId BETWEEN 1000 AND 7500
GO
NEW INSERTED COLUMN
ALTER TABLE dbo.bigTransactionHistory
ADD CustomerId INT NULL
After insertion of above column it took 17 seconds! means 17 times slower. A non-clusered index was now missing CustomerId column in the index. Just after including CustomerId, problem was gone.
Question CustomerId seemed to be the culprit until it was added to the index. BUT HOW???