11

I have a non-clustered columnstore index on all columns a 40m record non-memory optimized table on SQL Server 2016 Enterprise Edition.

A query forcing the use of the columnstore index will perform significantly faster but the optimizer continues to choose to use the clustered index and other non-clustered indexes. I have lots of available RAM and am using appropriate queries against a dimensional model.

Why won't the optimizer choose the columnstoreindex? And how can I encourage its use (without using a hint)?

Here is a sample query not using columnstore:

SELECT
  COUNT(*),
  SUM(TradeTurnover),
  SUM(TradeVolume)
FROM DWH.FactEquityTrade e
--with (INDEX(FactEquityTradeNonClusteredColumnStoreIndex))
JOIN DWH.DimDate d
  ON e.TradeDateId = d.DateId
 JOIN DWH.DimInstrument i
  ON i.instrumentid = e.instrumentid
WHERE d.DateId >= 20160201
AND i.instrumentid = 2

It takes 7 seconds without hint and a fraction of a second with the hint. The query plan without the hint is here. The query plan with the hint is here.

The create statement for the columnstore index is:

CREATE NONCLUSTERED COLUMNSTORE INDEX [FactEquityTradeNonClusteredColumnStoreIndex] ON [DWH].[FactEquityTrade]
(
    [EquityTradeID],
    [InstrumentID],
    [TradingSysTransNo],
    [TradeDateID],
    [TradeTimeID],
    [TradeTimestamp],
    [UTCTradeTimeStamp],
    [PublishDateID],
    [PublishTimeID],
    [PublishedDateTime],
    [UTCPublishedDateTime],
    [DelayedTradeYN],
    [EquityTradeJunkID],
    [BrokerID],
    [TraderID],
    [CurrencyID],
    [TradePrice],
    [BidPrice],
    [OfferPrice],
    [TradeVolume],
    [TradeTurnover],
    [TradeModificationTypeID],
    [InColumnStore],
    [TradeFileID],
    [BatchID],
    [CancelBatchID]
)
WHERE ([InColumnStore]=(1))
WITH (DROP_EXISTING = OFF, COMPRESSION_DELAY = 0) ON [PRIMARY]
GO

Update. Plan using Count(EquityTradeID) instead of Count(*) and with hint included

Rory
  • 959
  • 10
  • 22
  • 3
    You should provide the actual execution plan and/or DDL to get a meaningful answer. – Dan Guzman May 11 '17 at 12:22
  • add a query + plan – Rory May 11 '17 at 12:56
  • Try using paste the plan or something similar in the future. Makes it faster/easier for people to see it. https://www.brentozar.com/pastetheplan/?id=S1cM-JfxW – Chris Albert May 11 '17 at 13:04
  • Thanks. Very nice. I've updated the link. – Rory May 11 '17 at 13:25
  • I would suggest looking into Query Store, https://learn.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store – Jason May 14 '17 at 16:21
  • 1
    Please, provide the index DDL statement... – Sandr May 14 '17 at 23:01
  • @Rory:SQLServer is a cost based optimizer, if you look out plans, you will see both have different costs and sql will choose the one with less cost.At this point of time,this question is unclear,please add what Dan Guzman said – TheGameiswar May 15 '17 at 08:25
  • @Sandr: ddl statement included – Rory May 15 '17 at 13:14
  • @TheGameiswar: both plans now included – Rory May 15 '17 at 13:14
  • 1
    @Rory *Filtered* index? Then the optimizer has no way of knowing whether it can use the index or not, it has to query the underlying data *first*. What happens if you add `InColumnStore =1` to your query? – Panagiotis Kanavos May 15 '17 at 13:16
  • What are the table definitions? It's not possible to test your queries without the schema, one would have to guess at the column names and types – Panagiotis Kanavos May 15 '17 at 13:18
  • Instead of count(*--STAR) can you make count of rowid? Why you require count(*)? – Kannan Kandasamy May 15 '17 at 13:20
  • @KannanKandasamy a count is required, but a count with unique field name doesn't have a performance impact – Rory May 15 '17 at 13:37
  • I think it has, because your NonClustered index scan tries to convert all the columns to row-store, looks like due to that Index Scan is going to more cost can you check converting that to the id and show case the execution plan? – Kannan Kandasamy May 15 '17 at 13:40
  • @PanagiotisKanavos Adding InColumnstore will mean the columnstore index gets used but it won't always return the correct results. Re schema: 2 aggregated fields are [TradeVolume] [int] NOT NULL, [TradeTurnover] [numeric](19, 6) NOT NULL. All other referenced fields are int – Rory May 15 '17 at 13:44
  • @KannanKandasamy updated above – Rory May 15 '17 at 13:50
  • By any chance do you have IX_FactEquityTradeInColumnStore index also a non-clustered column store index? – Kannan Kandasamy May 15 '17 at 14:29
  • @KannanKandasamy that is just a regular nonclustered index – Rory May 15 '17 at 15:22
  • @Rory, if InColumnstore doesn't cover your needs, why is it in the filter for the columnstore index? – Steven Hibble May 16 '17 at 16:45
  • @StevenHibble the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows – Rory May 16 '17 at 22:33

3 Answers3

5

You're asking SQL Server to choose a complicated query plan over a simple one. Note that when using the hint, SQL Server has to concatenate the columnstore index with a rowstore non-clustered index (IX_FactEquiteTradeInColumnStore). When using just the rowstore index, it can do a seek (I assume TradeDateId is the leading column on that index). It does still have to do a key lookup, but it's simpler.

I can see two options to get this behavior without a hint:

First, remove InColumnStore from the columnstore index definition and cover the entire table. That's what you're asking from the columnstore - to cover everything.

If that's not possible, you can use a UNION ALL to explicitly split the data:

WITH workaround
     AS (
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 1
         UNION ALL
         SELECT TradeDateId
              , instrumentid
              , TradeTurnover
              , TradeVolume
         FROM DWH.FactEquityTrade
         WHERE InColumnStore = 0 -- Assuming this is a non-nullable BIT
        )
     SELECT COUNT(*)
          , SUM(TradeTurnover)
          , SUM(TradeVolume)
     FROM workaround e
          JOIN DWH.DimDate d
            ON e.TradeDateId = d.DateId
          JOIN DWH.DimInstrument i
            ON i.instrumentid = e.instrumentid
     WHERE d.DateId >= 20160201
           AND i.instrumentid = 2;
Steven Hibble
  • 533
  • 3
  • 9
  • 1
    Your union statement has answered the second part of the question. But for the first part isn't the optimizer cost-based as a commenter says above? – Rory May 20 '17 at 07:42
  • It *is* a cost-based model. But, it's just a model. It thinks that using the columnstore index AND rowstore non-clustered index is way too much work when it can just read the non-clustered index. In SSMS, put both versions of the query (i.e. the original query and the original + index hint) in the same window and get an estimated execution plan. You'll see that the estimated cost is higher for the one with the hint. – Steven Hibble May 22 '17 at 20:19
3

Your index is a filtered index (it has a WHERE predicate).

Optimizer would use such index only when the query's WHERE matches the index's WHERE. This is true for classic indexes and most likely true for columnstore indexes. There can be other limitations when optimizer would not use filtered index.

So, either add WHERE ([InColumnStore]=(1)) to your query, or remove it from the index definition.

You said in the comments: "the InColumnStore filter is for efficiency when loading data. For all tests so far the filter covers 100% of all rows". Does "all rows" here mean "all rows of the whole table" or just "all rows of the result set"? Anyway, most likely optimizer doesn't know that (even though it could have derived that from statistics), which means that the plan which uses such index has to explicitly do extra checks/lookups, which optimizer considers too expensive.

Here are few articles on this topic:

Why isn’t my filtered index being used? by Rob Farley

Optimizer Limitations with Filtered Indexes by Paul White.

An Unexpected Side-Effect of Adding a Filtered Index by Paul White.

How filtered indexes could be a more powerful feature by Aaron Bertrand, see the section Optimizer Limitations.

Vladimir Baranov
  • 31,799
  • 5
  • 53
  • 90
  • 2
    "Optimizer would use such index only when the query's WHERE matches the index's WHERE". This is not true with filtered columnstore indexes. I know because certain formulations of my query will trigger columnstore index usage without mentioning the InColumnStore field. (eg. using a cte) – Rory May 20 '17 at 07:33
-1

Try this one: Bridge your query

Select * 
Into #DimDate
From DWH.DimDate
WHERE DateId >= 20160201

Select  COUNT(1), SUM(TradeTurnover), SUM(TradeVolume)
From DWH.FactEquityTrade e
Inner Join DWH.DimInstrument i ON i.instrumentid = e.instrumentid 
     And i.instrumentid = 2
Left Join #DimDate d ON e.TradeDateId = d.DateId

How fast this query running ?

Asromi rOmi
  • 197
  • 1
  • 7