21

In the Snowflake documentation, I could not find a reference to using Indexes.

Does Snowflake support Indexes and, if not, what is the alternative approach to performance tuning when using Snowflake?

Nathan Griffiths
  • 12,277
  • 2
  • 34
  • 51
Koustav Ponda
  • 219
  • 1
  • 2
  • 3

12 Answers12

34

Snowflake does not use indexes. This is one of the things that makes Snowflake scale so well for arbitrary queries. Instead, Snowflake calculates statistics about columns and records in files that you load, and uses those statistics to figure out what parts of what tables/records to actually load to execute a query. It also uses a columnar store file format, that lets it only read the parts of the table that contain the fields (columns) you actually use, and thus cut down on I/O on columns that you don't use in the query.

Snowflake slices big tables (gigabyte, terabyte or larger) into smaller "micro partitions." For each micro partition, it collects statistics about what value ranges each column contains. Then, it only loads micro partitions that contain values in the range needed by your query. As an example, let's say you have a column of time stamps. If your query asks for data between June 1 and July 1, then partitions that do not contain any data in this range, will not be loaded or processed, based on the statistics stored for dates in the micropartition files.

Indexes are often used for online transaction processing, because they accelerate workflows when you work with one or a few records, but when you run analytics queries on large datasets, you almost always work with large subsets of each table in your joins and aggregates. The storage mechanism, with automatic statistics, automatically accelerates such large queries, with no need for you to specify an index, or tune any kind of parameters.

Jon Watte
  • 6,579
  • 4
  • 53
  • 63
  • 9
    Every database i know collects and uses statistics. – ynux Jan 09 '20 at 16:44
  • 2
    yes, that is true -- snowflake just takes it to the extreme that they get better throughput (and thus analytic query performance) using all their statistics, than regular databases get with indexes + statistics. – Jon Watte Jan 10 '20 at 22:50
  • 4
    Say you have a table with 2 indices, one optimized for one type of access, say, select column A, the other for column B. This is possible because the indices are physically stored to do exactly this. And micro partitions - magically do everything for every way the data is accessed? You know, when everything is an advantage, even features that aren't there, it doesn't help me to understand what the product really does. – ynux Jan 15 '20 at 16:26
  • 2
    I don't understand the concern. Indexes have significant storage and update cost, but they work great for point queries in online systems. Point queries aren't that great in Snowflake -- they're not BAD, but they're not the most-optimized case. If you want to do point queries with low latency, Snowflake is going to be much worse than, say, MySQL or DB/2. But still much better than, say, Hive/Hadoop. – Jon Watte Jan 16 '20 at 18:08
  • @JonWatte Having run point queries on TB sized datasets with over 100 billion rows in Snowflake, they are actually quite decent – Radagast Feb 10 '21 at 15:47
  • For analytics, their point queries are fine! My point was more along the lines of "Snowflake won't currently compete with your typical online transactional processing database." The start-up cost for a query, even if you have a warehouse already spun up, is somewhat high, both because of how their compiler works, and because the data at rest lives in S3, which has its own access cost. – Jon Watte Feb 11 '21 at 18:11
  • 1
    So, it does not have indexes...it is an index. Got it. – Sean Anderson Jan 27 '22 at 15:24
24

Snowflake does not support indexes, though it does support "clustering" for performance improvements of I/O.

I recommend reading these links to get familiar with this:

https://docs.snowflake.net/manuals/user-guide/tables-clustering-keys.html

https://docs.snowflake.net/manuals/user-guide/tables-auto-reclustering.html

Here's a really good blog post on the topic as well: https://www.snowflake.com/blog/automatic-query-optimization-no-tuning/

Hope this helps...Rich

Rich Murnane
  • 2,697
  • 1
  • 11
  • 23
7

No Snowflake does not have indexes. Its performance boosts come through by eliminating unnecessary scanning which it achieves my maintaining rich metadata in each of its micro partitions. For instance if you have a time filter in your query and your table is more or less sorted by time, then Snowflake can "prune" away the parts of the table that are not relevant to the query.

Having said this, Snowflake is constantly releasing new features and one such feature is its Search Optimisation Service which allows you to perform "needle in a hay stack" queries on selected columns that you enable. Not quite indexes that you can create, but something like that being used behind the scenes perhaps.

Krish
  • 71
  • 1
  • 1
4

No, Snowflake doesn't support indexes. And don't let them tell you that this is an advantage. Performance tuning can be done as described above, but is often is done with money: Pay for bigger warehouses.

ynux
  • 1,280
  • 14
  • 21
2

Snowflake doesn't support indexes, it keeps data in micro partition or in another sense it breaks data sets in small files and format rows to column and compress them. Snowflake metadata manager in service layer will have all the information about each micro partition like which partition have which data. Each partition will have information about itself in header like max value, min value, cardinality etc. this is much better then indexes as compare to conventional databases.

Tyagi.Am
  • 31
  • 1
1

** Updated Fall 2022 - thanks to Hobo's comment: Yes, via Unistore's Hybrid Tables. **

Original Response:

Neither Snowflake nor any high-performance big data / OLAP system will support [unique] indexes because these systems are MPP (Massively Parallel Processing). MPP systems load data with thousands of concurrent inserts into the same table. [Unique] Indexes are a concept from much smaller / OLTP systems. Even then many data engineers intentionally disable the [unique] indexes on OLTP systems when they approach big data scale especially as the data is inserted or frequently updated and deleted.

If you want a "non-unique index" then you can use a slew of features such as: micro-partitions, clustered tables, auto-clustering, Search Optimization Service, etc.

This Medium can give you some workarounds. How can we enforce [Unique, Primary Key, Foreign Key (UPF)] column constraints in Snowflake?

Allen
  • 406
  • 2
  • 8
  • 1
    That medium link has been updated to point out there's a new (as of June 2022) `hybrid` table type that does enforce uniqueness in primary keys: https://www.snowflake.com/blog/introducing-unistore/ – Hobo Oct 03 '22 at 00:31
1

Snowflake is a columnar database with automatic micro-partitioning. Note that in SQL Server, Microsoft call their columnar storage option a column store index.

The performance gain from columnar storage on data warehouse/mart type queries is spectacular compared with their row store brethren. By storing data by column the columns can be greatly compressed allowing a huge amount of data can be held in memory.

If your predominant queries are on a naturally ordered column, such as OrderDate then it makes sense to cluster on OrderDate. You will gain a performance benefit from doing that.

Clustering isn't a catch-all performance boost. Choose your clustering unwisely and you can degrade performance for your queries.

In terms of performance tuning there are techniques you can use.

When using a dimensional model look at the most commonly used aspects of those dimensions and look to denormalise those aspects into your fact tables to reduce the number of joins.

For example, if the queries use Week, Month and Quarter then denormalise those aspects into the fact table giving you performance concerns. The affect on storage in a column store DB is far less than in a row store DB so the cost/benefit balance is much better.

Materialised views are another way of performance tuning however these come with caveats.

  • The range of SQL statements available to you for materialised views is far less than for other views
  • Not all aggregates are supported
  • Can only be on a single table

They work well when data doesn't change often.

If your underlying table is clustered on OrderDate then a materialised view of last months orders might not give you the desired performance benefit because partition pruning might already be doing what is needed.

If your query performance is as a result of contention with other users then spinning up another warehouse might be the answer. 2 warehouses dedicated to their tasks might be more cost effective than scaling up a single warehouse.

Primary/unique key constraints can be defined but are metadata only despite the constraint documentation describing the enforced/not enforced syntax.

Some distributed column stores do support PK and FK constraints, Vertica being an example, but most do not because the performance impact of enforcing them is too high.

1

Snowflake's Search Optimization Service will create indexes over all the pertinent columns in a table "out of the box" as well as other advances search features (e.g. substring and regex matching).

If you'd like optimize for specific expressions used in your queries, you can customize SOS, as well.

Anyone that has tried to scale an OLTP database knows that traditional b-tree indexes used in SQL databases do not scale. Massive contention and so they are unsuitable for MPP systems. Snowflake SOS is a non-traditional index structure based on bloom filter maps that ultimately provide partition pruning. More like how Google used to "index" web pages years ago. So yes, technically Snowflake supports indexes, but not in the way most database folks would understand or appreciate.

1

Yes, Snowflake does support indexes. It just doesn't call them indexes.

You can use both:

Limonka
  • 656
  • 3
  • 18
0

Snowflake does not support indexing natively, but it has other ways to tune performance:

  1. Reduce queuing by setting a time-out and/or adjusting the max concurrency
  2. Use result caching
  3. Tackle disk spilling
  4. Rectify row expansion by using the distinct clause, using temporary tables and checking your join order
  5. Fix inadequate pruning by setting up data clustering

Reference: https://rockset.com/blog/what-do-i-do-when-my-snowflake-query-is-slow-part-2-solutions/ (Disclosure: I work for Rockset).

cigien
  • 57,834
  • 11
  • 73
  • 112
  • When linking to your own site or content (or content that you are affiliated with), you [must disclose your affiliation _in the answer_](/help/promotion) in order for it not to be considered spam. Having the same text in your username as the URL or mentioning it in your profile is not considered sufficient disclosure under Stack Exchange policy. – cigien May 19 '22 at 02:37
0

In short, snowflake does not support indexes but a single clustering key on a each table.

Nikhil S
  • 3,786
  • 4
  • 18
  • 32
0

Snowflake does not support indexes but if you are looking for optimization you can use search optimization service of Snowflake. Please refer below snowflake documentation. https://docs.snowflake.com/en/user-guide/search-optimization-service.html