3

Profile of the foreign keys

FK      Distinct Values      %
----    ---------------  ------
Id1     1                 0.1%
,Id2    4                 0.3%
,Id3    5                 0.3%
,Id4    6                 0.4%
,Id5    6                 0.4%
,Id6    95                6.1%
,Id7    97                6.2%
,Id8    1423             90.7%

All foreign keys already make up the clustered Primary Key. This fact table is part of a star schema that includes 6 dimensions (Id's 6,7, and 8 reference the same date dimension).

Fact table currently has approx 1800 rows (incredibly small), and is expected to grow by that amount each month.

Should each foreign key have its own non-clustered non-unique single column index for facilitating joins? If so, why?

Each foreign key will be part of a clustered index (primary key) in its dimension table.

If indexes should be put on the foreign keys, then what should the fill factor and padding index be set to given the low cardinality of the columns?

Community
  • 1
  • 1
Adrian Torrie
  • 2,795
  • 3
  • 40
  • 61

2 Answers2

2

First of all, I think that you should not create a clustered primary key based on foreign keys. A clustered index is organizes the data on disk and it is better if it is

  • narrow
  • numeric
  • increasing (strictly monotonic)

So I think it is better to create e.g. a unique constraint on foreign keys to make row unique. Or create a non clustered primary key on those columns and then create a clustered index (but not primary key) on e.g. date foreign key (YYYYMMDD).

Usually foreign keys are indexed (non clustered, non unique) on Fact table to make faster searches. But some people does not enforce cardinality at all on dimensional model (ETL takes care of the referential integrity) because primary key - foreign key constraints makes ETL loads slow.

From Vincent Rainardi

  1. Question: How do you index a fact table? And explain why. {H}

Answer: Index all the dim key columns, individually, non clustered (SQL Server) or bitmap (Oracle). The dim key columns are used to join to the dimension tables, so if they are indexed the join will be faster. An exceptional candidate will suggest 3 additional things: a) index the fact key separately, b) consider creating a covering index in the right order on the combination of dim keys, and c) if the fact table is partitioned the partitioning key must be included in all indexes.

jrara
  • 16,239
  • 33
  • 89
  • 120
  • What do you mean by "the ETL takes care of cardinality?". I've never heard of "enforcing cardinality", you don't mean referential integrity do you?? Using a date column as a single column primary key in a large table makes sense due to the advantages of partitioning and the ability to switch partitions in/out during an ETL, however this is a small table so I'm not sure there would be any advantages to doing it solely on a single column in this instance? – Adrian Torrie Nov 27 '12 at 23:10
  • Yes, I meant referential integrity, I updated the answer. Your fact table is so small that you don't have to care about the indexing yet, as @N West wrote. – jrara Nov 28 '12 at 06:31
2

Your profile doesn't really make sense with the "%" column - why are you finding the "percentage" of distinct values across fields? You need stats on the distribution of the distinct values - are 99% of the keys on Id8 the same ? are they evenly distributed? etc.

Note that everything I'm saying here applies to larger tables. With 1800 rows / month, indexes are probably a waste of space and time for you to worry about.

@jrara's "rule" about indexing all the dims is an easy rule to apply, but you can easily make mistakes if that's all you do. I don't want to use an oracle bitmap index on my 100mil row customer dimension, for example.

Indexing depends on what the queries look like against your data. Indexes won't help if you are doing a full scan of the fact table to perform aggregation and grouping for "summary" reports. They will help when a user is trying to filter on an attribute of a dimension, and that filter results in you only having to look up a small percentage of the records from the fact table. Is there a main entry point to your table? Do people typically filter on an attribute of the "Id8" dimension, then want grouping on an attribute from the other dimensions?

Essentially the answers to your questions are:

Should each foreign key have its own non-clustered non-unique single column index for facilitating joins?

In general, yes, so long as the dimension tables are small and the dim keys are relatively evenly distributed in the fact table. Usually it is worse to use a index access to get 99% of the fact table rows.

what should the fill factor and padding index be set to given the low cardinality of the columns?

Lowering the FILLFACTOR below 100% will cause slower index reads, since there are more (empty) pages in the index for the DB to read. Since a data warehouse is designed for fast selects, I don't really ever recommend that you adjust the fillfactor down.

That being said, in a few cases adjusting your FILLFACTOR may make sense. If the the fact table is very large (hundreds of GB / TB), and index rebuilds take hours, and you might only rebuild indexes once a month or even less. In these cases you need to figure out how much data (as a percentage) that you'll be adding to the table each day, and set the fillfactor accordingly.

N West
  • 6,768
  • 25
  • 40
  • Also - IOT (tables with a clustered index) doesn't really make much sense for a fact table. I can see a few cases where you might want it if you put surrogate keys on your fact (or keep the business keys around if you "update" your fact) – N West Nov 26 '12 at 14:12
  • Querying will carry out filtering on the basis of dimension attributes, so this would suggest that an index on the foreign keys may be a good idea in this instance. **Id8** is the date column, **Id7** is a contract number (to an account dimension), **Id6** is related to the contract number, all other Ids are business keys. All or none of the Ids may be used for querying. Given that I'm suggesting a composite `PRIMARY KEY`, wouldn't a fill-factor be necessary to prevent fragmentation of the keys, as the key fields will not be incremental in nature (much like a single column date key would be)? – Adrian Torrie Nov 27 '12 at 23:16
  • 1
    Ah, I wasn't clear that your question was about a fillfactor on your clustered key, I thought it was about your non-clustered indexes. This is why I don't recommend a clustered index on a fact table. If it's on *all* of your keys, then you'll need a pretty low fill factor since any key can go anywhere in the tree. It's really not a good idea to put a bunch of keys into a clustered index... – N West Nov 28 '12 at 13:48