4

One of the benefits of a columnstore-index is that the data of a single column is stored "next to each other on disk". This stands for better compression and faster read times.

However; when using a B-tree (a regular non-clustered index), aren't the leafs of that tree the data itself? So when I make an index on column A, won't all the fields of column A appear sorted on disk in parallel to the table? This would mean the data could be compressed and could be read fast.

So in short: when exactly is a columnstore-index more performant than a B-tree on a single column?

Thanks in advance!

James A Mohler
  • 11,060
  • 15
  • 46
  • 72
Gurdt
  • 148
  • 9

1 Answers1

1

So when I make an index on column A, won't all the fields of column A appear sorted on disk in parallel to the table?

Yes, column A will be sorted. But all other columns won't. A column store index stores all columns in sorted order (in chunks of 2^20 rows).

usr
  • 168,620
  • 35
  • 240
  • 369
  • Ok. But when you select something from column A, in either case, all data of A is sorted. So can I assume that a columnstore-index is more efficient when querying other columns too? But what when I simply put a B-tree on these columns too? – Gurdt Mar 29 '12 at 07:16
  • When you need the data sorted, a B-Tree will not need a sort operation at all (if the columns you request are a prefix of the index). A column store cannot provide them sorted because it stores them stored per 2^20 rows. If you have multiple such segments the sort order cannot be provided. – usr Mar 29 '12 at 08:28
  • I think there might be some miscomprehension here. The purpose of a columnstore index is (or so I think), that there is less overhead in reading data from disk, because only relevant columns can be loaded. When loading from a row-store, one fetches rows in a whole, hence loading columns which may not be needed. The case now is the following: if I am interested in only one column, say A, how can a columnstore index be more performant, in contrast to a simple B-tree (nonclustered index). The leafs of a non-clustered index are in fact stored as a column, field after field, compact as possible. – Gurdt Mar 29 '12 at 08:43
  • On a single column, either one could be more efficient. If you retrieve everything at once, the column store will be more efficient because of compression and batch mode. If you are filtering on the column the B-tree becomes more efficient. – usr Mar 29 '12 at 11:20
  • This I can mark as an answer :) However, in my opinion there aren't that much distinct situations in which a columnstore index would be stunning. As you said, a B-tree is more efficient in the described situation. Only index-overhead issues would keep me from constructing a B-tree on the needed columns, or including them in another index (dependant of the situation and the query). – Gurdt Mar 29 '12 at 12:54
  • Don't underestimate the batching mode. I tried it myself, it is *fast*. Column stores really start to shine with multiple columns and many different queries. You can't replicate their performance with B-trees in this case. But this question is about single-column of course. – usr Mar 29 '12 at 18:18