6

Reading about clustered column store index in SQL Server 2014, I'm wondering if having a table with a huge number of columns is still an anti-pattern. Currently to alleviate the problem of having a single table with lots of columns I'm using vertical partitioning but having clustered column store index available this shouldn't be needed. Is this correct or am I missing something?

Example: Lets take for example the log of performance counters, the raw data might have the following structure:

╔══════════════════╦═══════╦═══════╦═════╦═════╦═════╦══════════╗
║       Time       ║ Perf1 ║ Perf2 ║ ... ║ ... ║ ... ║ Perf1000 ║
╠══════════════════╬═══════╬═══════╬═════╬═════╬═════╬══════════╣
║ 2013-11-05 00:01 ║     1 ║     5 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     2 ║     9 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     3 ║     2 ║     ║     ║     ║        9 ║
║ 2013-11-05 00:01 ║     4 ║     3 ║     ║     ║     ║        9 ║
╚══════════════════╩═══════╩═══════╩═════╩═════╩═════╩══════════╝

Having such a table with 1000 columns is evil, because one row will most likely span more than one page, because usually it is unlikely that one will be interested in all measures but the query will always incur in the IO cost, etc.. etc.. To solve this vertical partitioning usually helps, for example one could partition performance counters in different tables by category (CPU, RAM, etc.).

Conversely having such a table as a clustered column store index should not be such a problem because data will be stored column-wise and the IO involved for every query will be about only the requested columns, nothing more regardless of the total number of columns in the table.

marcob
  • 1,003
  • 2
  • 11
  • 23
  • Sure, that sounds reasonable based on [this](http://msdn.microsoft.com/en-us/library/gg492088(v=sql.120).aspx), but it's probably one of those questions that can only be answered by direct experiment. I'm more concerned that we appear to lose anything resembling a PK or unique index since the clustered columnstore index `[i]s the only index on the table. It cannot be combined with any other indexes` – criticalfix Nov 04 '13 at 15:57
  • 1
    One (perhaps minor) downside is that building it may require more memory [How much memory is needed to create a columnstore index?](http://social.technet.microsoft.com/wiki/contents/articles/3540.sql-server-columnstore-index-faq.aspx#CreateColumnstore) – Martin Smith Feb 23 '14 at 17:09

2 Answers2

1

It's certainly less "bad" than horizontal store but 1000 is pushing the limit a bit too far. Our data warehouse usually have tables with 100 - 200 columns and they zippy enough with column store index. Assuming you have perfect column store index, each query should only look at specific vertical index and hence very efficient. But if your column store indexes are not optimal for the query, SQL Server has to do some jumping between the indices and those are not good.

There's no rule of thumb about this. You will have to benchmark to answer this question in your specific environment.

Code Different
  • 90,614
  • 16
  • 144
  • 163
  • why is 1000 too much compared to 100-200? considering the storage structure it shouldn't matter.Anyway I do not actually have 1000 columns, my question was in general about the technology, I just want to understand if I am missing something. – marcob Nov 04 '13 at 21:25
  • First of all, the maximum row size is limited to 8096 bytes per row for fixed-length data types. If your data is variable length (varchar, blob, etc.) it can be broken into separate rows (see [this topic](http://technet.microsoft.com/en-us/library/ms143432.aspx) on MSDN). Second If you have any kind of row-based index, it become extremely time consuming to maintain. Think about find a need in a haystack. Third, you need to think really carefully about your column store indexes. If you query two columns in two different indexes, the performance will be slow. – Code Different Nov 04 '13 at 23:22
  • I don't know your exact environment setup so can't offer any specific here. Why don't you benchmark the 1000-column table vs. 2 tables of 500 each? – Code Different Nov 04 '13 at 23:25
-1

The type of queries in your workload and the type of data in your table are factors that determine whether rowstore or columnstore would give you better benefits. If the queries are looking up a small set of rows, rowstore may provide better performance. If the queries are data warehouse type of queries, example - scanning of large amount of data, columnstore would provide better performance. Also, you could create nonclustered columnstore index on your table. The query optimizer will decide when to use the columnstore index and when to use other indexes.

I recommend reading the TechNet article containing list of FAQ for columnstore index here.

Jamal
  • 763
  • 7
  • 22
  • 32