3

In Teradata, I create table with unique primary key out of two varchar columns A and B. I will write queries that need to filter on one or both of these columns.

For best performance, should I submit a create index statement for each of the two columns (the table would have 3 indexes: the unique primary key(column A, B), non-unique column A, and non-unique column B)?

On this table, I only care about read performance and not insert/update performance.

Insu Q
  • 403
  • 6
  • 13
  • Yes, for best performance you will need three keys. A composite key will not help for queries on the individual columns, but it will still be needed if forms the primary key for the table. – daShier Sep 05 '19 at 22:50
  • Thanks. I saw posts where people said the first column (A) serves as a covering index and that it’s not necessary to create a separate index for it since the primary key uses column A. But for column B I would have to create a separate index. The posts I saw were about MySQL though. When you say I need 3 indexes, is this because Teradata handles indexes differently than MySQL? – Insu Q Sep 05 '19 at 22:55
  • I am not aware of a feature of MySQL that would eliminate the need for three indexes. However, even if it's the case that the index for column `a` is created implicitly, it's still created, so doing so explicitly would not seem to have any negative impact. – daShier Sep 05 '19 at 23:04
  • 2
    Not sure about Teradata but in most other RDBMS, a composite key will help for searches on the leading columns but not the trailing columns. An index on A,B,C helps for searches on A, A,B and A,B,C. It doesn't help for searches on C or B alone. So consider your leading columns carefully – Nick.Mc Sep 05 '19 at 23:13
  • 1
    @Nick.McDermaid: Teradata's *Primary Index* is hash-based, *all* columns must be specified (in fact it's for data distribution in a massive parallel system, similar to `partition by hash`) – dnoeth Sep 06 '19 at 07:09
  • 1
    @InsuQ: How is this table accessed? A PI can only be used for *equality*, not for `between`, etc. Consider a PI on a single column (depending on access and distribution) and a Secondary Index or Join Index on the other. – dnoeth Sep 06 '19 at 07:12

2 Answers2

1

In Teradata, if you specify a PRIMARY KEY clause when you create the table, then the table will automatically be created with a UNIQUE PRIMARY INDEX (UPI) on those PK columns. Although Teradata supports keys, it is more of an index-based DBMS.

In your case, you will have very, very fast reads (i.e. UPI access - single AMP, single row) only when you specify all of the fields in your PK. This applies to equality access as mentioned in the previous comments (thanks Dieter).

If you access the table on some but not ALL of the PK / UPI columns, then your query won't use the UPI access path. You'd need to define separate indexes or other optimization strategies, depending on your queries.

If you only care about read performance, then it makes sense to create secondary indexes on the separate columns. Just run the EXPLAIN on your query to make sure the indexes are actually being used by the Optimizer.

Another option is to ditch the PK specification altogether, especially if you never access the table on that group of columns. If there is one column you access more than the other, specify that one as your PRIMARY INDEX (non-unique) and create a secondary index on the other one. Something like:

CREATE TABLE mytable (
  A INTEGER,
  B INTEGER,
  C VARCHAR(10)
)
PRIMARY INDEX(A) -- Non-unique primary index
;

CREATE INDEX (B) ON mytable; -- Create secondary index
ravioli
  • 3,749
  • 3
  • 14
  • 28
0

You only need two indexes.

If you have a primary key on (A, B), then this also works for (A). If you want to filter on B, then you want an index on (B).

You might want to make it (B, A) so the index can handle cases such as:

where B = ? and A in (?, ?, ?)
Andre
  • 26,751
  • 7
  • 36
  • 80
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • 1
    I don't think this is quite right. In Teradata, if you specify a `PK` on `(A, B)` and you try to access the table through only column `A` or only column `B`, it won't give you an indexed access. You'd have to define a separate index on `A` and another one on `B`. Or maybe I misunderstood your answer. – ravioli Sep 06 '19 at 06:16
  • 1
    Teradata's Primary Index is hash-based, thus all columns must be specified using equality. – dnoeth Sep 06 '19 at 07:14
  • @dnoeth . . . Even a hash-based index could be used for `in` (it is just multiple index lookups). Teradata doesn't do that optimization? – Gordon Linoff Sep 06 '19 at 10:38
  • Of course, IN is just also based on equality, but you wrote *primary key on (A, B), then this also works for (A)* – dnoeth Sep 06 '19 at 11:22