Cannot create CLUSTERED INDEX on a View due to LEFT JOIN or subquery

Question

I have created two options for a query used in a view which return the results I require. I need to rewrite either option so that it can be used in an Indexed View. Both fail when creating a unique clustered index on the view. The first one fails due to the LEFT OUTER JOIN, and the second fails due to a subquery. I believe both will also fail due to a self join.

After finding Creating Indexed Views, there is a large list of TSQL syntax elements that cannot be used. Among them: a derived table, UNION, EXCEPT, INTERSECT, subqueries, Outer or self joins, TOP, ORDER BY, DISTINCT, MAX...

The query should get the largest CompanyID for each unique Company. The StatusName in the Statuses table also needs to be displayed, and I'm only adding that in case it affects the solution. It's currently an INNER JOIN, so it's not causing a problem with creating the index.

Example for the Companies table, with all 3 columns being INT:

CompanyID Company Revision
1         1       1
2         1       2
3         2       1
4         2       2

The query should return:

CompanyID Company Revision
2         1       2
4         2       2

Here are the two options I've created:

SELECT t1.CompanyID, t1.Company, t1.Revision, Statuses.StatusName
FROM dbo.Companies AS t1

LEFT OUTER JOIN dbo.Companies AS t2
ON t1.Company = t2.Company AND t1.CompanyID < t2.CompanyID

INNER JOIN dbo.Statuses
ON dbo.Statuses.StatusID = t1.StatusID

WHERE t2.Company IS NULL

And the other:

SELECT t1.CompanyID, t1.Company, t1.Revision, Statuses.StatusName
FROM dbo.Companies AS t1

INNER JOIN dbo.Statuses
ON dbo.Statuses.StatusID = t1.StatusID

WHERE t1.Company NOT IN (SELECT t2.Company from dbo.Companies AS t2 WHERE t1.CompanyID < t2.CompanyID)

So, my question is, can either query be rewritten to be used in an Indexed View?

I'm using MS SQL Server 2008 R2, and 2005.

Why do you think that the only solution for whatever problem you're having (but that you haven't stated) is an indexed view? Could you state your actual problem instead of telling us why you can't use an indexed view to solve it? — Aaron Bertrand, Mar 12 '12 at 15:05
Sorry, let me clarify. We have limited hardware performance, and are looking for a way to increase query performance, with as little change as possible to the database schema. Non-indexed Views are already in place. There will be 100,000's of records in the base tables (Companies in this example), but typically less than 100 will be returned from any View. It looked like Indexed Views would be a good candidate for a solution if the rather simple queries could be rewritten. — njb, Mar 12 '12 at 15:25
If you've found one way to construct the result set and it's on the "banned" list, it's likely that any other way of constructing the same result set will also fail. The list of restrictions for indexed views weren't just selected at random - they relate to how the indexing is implemented behind the scenes. — Damien_The_Unbeliever, Mar 12 '12 at 15:47
Alternate: Redesign - having seen what you're aiming for. If you arrange that the latest revision of the row always has a NULL Revision value, then you can create an indexed view based on *that*. Further, you can enforce the constraint that there's only one non-NULL revision for each company. Of course, inserts become problematic (you have to assign a revision in an existing row when inserting new) so you'd have to switch to `MERGE` (and accept that revision will be assigned later than the insert) — Damien_The_Unbeliever, Mar 12 '12 at 19:04
(I meant NULL when I said non-NULL: You can enforce only one NULL revision per company) — Damien_The_Unbeliever, Mar 12 '12 at 19:10

Aaron Bertrand · Accepted Answer · 2012-03-12T15:55:32.260

5

Instead of creating an exclusionary view, why not try the other way:

CREATE VIEW dbo.HighestCompany
AS
  SELECT t1.CompanyID, t1.Company, t1.Revision, s.StatusName
    FROM dbo.Companies AS t1
    INNER JOIN (
      SELECT Company, HighestCompany = MAX(CompanyID) 
      FROM dbo.Companies GROUP BY Company
    ) AS t2
    ON t1.Company = t2.Company
    AND t1.CompanyID = t2.HighestCompany -- not sure if CompanyID is unique
    INNER JOIN dbo.Statuses AS s
    ON s.StatusID = t1.StatusID;

You still won't be able to create an indexed view on this, but it may work a little better than the versions you currently have (depending on several factors, of course, including index on Company and selectivity).

Aside from that, I think to improve performance you'll need to take a look at the indexing strategy on the base tables. Why does your Companies table allow multiple companies with the same name and a different ID? Maybe that is part of the problem, and you should just store the currently relevant company in a separate table.

You could do this as follows (keep in mind I am guessing at data types and optimal indexes here):

CREATE SCHEMA hold AUTHORIZATION dbo;
GO
CREATE SCHEMA cache AUTHORIZATION dbo;
GO
CREATE TABLE dbo.HighestCompany
(
  CompanyID INT, 
  Company NVARCHAR(255) PRIMARY KEY,
  Revision INT,
  StatusName NVARCHAR(64)
);
GO
CREATE TABLE cache.HighestCompany
(
  CompanyID INT, 
  Company NVARCHAR(255) PRIMARY KEY,
  Revision INT,
  StatusName NVARCHAR(64)
);
GO

Now however often you think this data needs to be refreshed, you can run a job that does the following:

TRUNCATE TABLE cache.HighestCompany;

INSERT cache.HighestCompany(CompanyID, Company, Revision, StatusName)
SELECT t1.CompanyID, t1.Company, t1.Revision, s.StatusName
        FROM dbo.Companies AS t1
        INNER JOIN (
          SELECT Company, HighestCompany = MAX(CompanyID) 
          FROM dbo.Companies GROUP BY Company
        ) AS t2
        ON t1.Company = t2.Company
        AND t1.CompanyID = t2.HighestCompany
        INNER JOIN dbo.Statuses AS s
        ON s.StatusID = t1.StatusID;

-- this is a fast, metadata operation that should result
-- in minimal blocking and disruption to end users:
BEGIN TRANSACTION;
  ALTER SCHEMA hold TRANSFER dbo.HighestCompany;
  ALTER SCHEMA dbo TRANSFER cache.HighestCompany;
  ALTER SCHEME cache TRANSFER hold.HighestCompany;
COMMIT TRANSACTION;

If you find the companies change so often, or the data really needs to be up-to-the-second, that this isn't practical, you could do something similar with a trigger as @Dems suggested.

edited Mar 12 '12 at 15:55

answered Mar 12 '12 at 15:31

Aaron Bertrand

272,866
37
466
490

@Dems my reading of the query says "out of all the companies that match by company (which I assume is name), exclude all the ones that don't have the highest companyID" ... now I am not sure exactly what data is in company vs. companyID, as they are pretty ambiguous names, but I believe my view will return the same rows as the OP's view. The optimizer may not treat it any differently, but it could be worth a try. (It's also something that you could run periodically and cache the results somewhere.) – Aaron Bertrand Mar 12 '12 at 15:40
'CompanyID' is just the Identity column. 'Company' is a number unique per company, which will never change even if other details of the company changes (in columns originally not stated). We have a requirement that there be no 'DELETE' or 'UPDATE' privileges on the base tables. That's where the 'Revision' column comes into play. For every change, 'Revision' increments. There are a few extra columns in Companies (CompanyName, Address, etc). Other base tables have more columns and are updated more frequently, this was just the simplest table/view for the example. – njb Mar 12 '12 at 16:07
`CompanyID`, `Company`, and `Revision` are all `INT`. `CompanyName`, etc are `NVARCHAR(255)`. – njb Mar 12 '12 at 16:19
@njb - unfortunately, there's no way to make SQL Server aware that you're operating under this no DELETE/UPDATE "restriction" (which of course cannot be enforced since the `sa` *could* perform such operations). A lot of the indexed view restrictions wouldn't exist if the system only had to support INSERT on the base tables. – Damien_The_Unbeliever Mar 12 '12 at 17:37
@Damien - It would be nice, but I figure it's not a common request... Yes, there is always _that_, but our application which interfaces with the database uses a different user than `sa`. – njb Mar 12 '12 at 18:32
1

@njb - I'd probably go for a trigger then, if updates must be immediate, or Aaron's solution involving the cache if not. One things certain, you won't be able to force/fool the DB Engine into creating a genuine indexed view for this. – Damien_The_Unbeliever Mar 12 '12 at 18:40
@Damien - Yes there's some good suggestions to try. I was hoping a simple rewrite of the View would work, but once that had been nixed, a redesign brought in a flood of extra requirements. I will also try your redesign, as it's a neat workaround for the OP, but we still have to bring up "old" records for occasional reporting. Those can be slower queries, but they need to have a "proper" `Revision` returned. It can't be `NULL`. Also, in this case, would `MERGE` technically be updating the `Companies` table? – njb Mar 12 '12 at 19:34
After profiling, this is the fastest query; about 20 times faster than my first option. It's also about 33% faster than @Dems and my second option. I never tested my second option for performance, only to see if I could create an Indexed View. Performance appears good enough for a large test database with this option, but if further optimizations are required, I'll use the trigger option. Thanks for all the suggestions. – njb Mar 15 '12 at 14:14

MatBailie · Answer 2 · 2012-03-12T15:58:13.590

1

Unfortunately, you can't.

Not only does your query necessitate a LEFT JOIN, you're LEFT JOINing the same table on to itself. And to quote BooksOnline and your question...

The SELECT statement in the view cannot contain the following Transact-SQL syntax elements: 
 - Outer or self joins.

An alternative option may be to create a real mapping table that you keep updated via a trigger. Records being created/deleted as Companies changes, and records being updated as Statuses changes.

Equally, views are expanded in-line into the queries in which they are used (unless you specifically say otherwise with the NOEXPAND hint). Have you checked the execution plans of your queries to see if you could create more appropriate indexes on the base tables?

EDIT

An alternative query layout, just as an option...

;WITH
  sequenced_companies
AS
(
  SELECT
    ROW_NUMBER() OVER (PARTITION BY Company ORDER BY CompanyID DESC) AS sequence_id,
    *
  FROM
    dbo.companies
)
SELECT
  *
FROM
  sequenced_companies
INNER JOIN
  dbo.statuses
    ON statuses.StatusID = sequenced_companies.StatusID
WHERE
  sequenced_companies.sequence_id = 1

With an index on (Company, CompanyID DESC), this should be quite quick. (Although still not suitable for an indexable view.)

edited Mar 12 '12 at 15:58

answered Mar 12 '12 at 15:30

MatBailie

83,401
18
103
137

Didn't think it was possible, but it's worth asking to learn something new. Interesting approach. It looks like I'll have to have either extra tables or extra views, so I'll try both suggested answers so far. Haven't checked execution plans yet, but I believe indexes are ok. For `Companies`, `CompanyID` is Identity, PK, indexed. Will look into that to see if anything stands out. – njb Mar 12 '12 at 15:52
Can you explain why you have the same Company multiple times with different CompanyID values? Seems to me the Company rather than the surrogate identity value should be the key. – Aaron Bertrand Mar 12 '12 at 15:57
@njb - An alternative query, and suggested index for `companies`, added in case it helps. – MatBailie Mar 12 '12 at 15:59
@AaronBertrand - We have some harsh restrictions on our base tables. See my comment to your suggested answer. – njb Mar 12 '12 at 16:13
Yes, I read about the restrictions, but I don't understand *WHY*. If the older versions of a row or not (or less) relevant, move them to a different table. Not sure why you would want a requirement like "let's keep every version of a company around forever, in this table that we need for real-time OLTP activities." – Aaron Bertrand Mar 12 '12 at 16:20
@AaronBertrand - For auditing purposes of our customers. We have used a similar "archiving" strategy before, but they've never been happy with that. It's been increasingly more difficult to convince them and new customers. Taking away those privileges makes everyone happy. Except me. =) – njb Mar 12 '12 at 16:36
I understand a view cannot have a self join but does anyone know the basis for that restriction? – paparazzo Mar 12 '12 at 16:44

Cannot create CLUSTERED INDEX on a View due to LEFT JOIN or subquery

2 Answers2