Using COLLECT STATISTICS in Teradata

Question

In Teradata I can use a statement like ...

collect statistics on my_table column(col1)

This will gather stats on the table and store them in DBC views like ColumnStats, IndexStats and MultiColumnStats. I'm also under the impression that the optimizer (parsing engine) will find the statistics when they are available and use them instead of estimated table cardinality/index value counts to make better decisions on how to execute a query.

This all sounds great, but I have some questions.

are there any disadvantages to using collect stats?
When is it appropriate/inappropriate to use collect statistics in your SQL scripting?
What's the performance benefit to collect statistics on a field that's already indexed?
How long are statistics stored for (table, volatile tables)?
Any other comments concerning collect statistics would be appreciated.

Sorry but IMO this question is not a "good fit" for SO. Collecting statistics is a very important, perhaps essential part of Teradata and there are many online articles that discuss the subject. Also, you have too many different parts to this question to be answered clearly. Any one of the bullets might be worth asking again. Voting to close as "not constructive". — BellevueBob, May 22 '13 at 00:54
Hey Bob do you think it'd be better suited for migrating the question to the Database Administrators SO site rather than voting "not constructive"? I've found articles but none really address my question(s) — ChrisCamp, May 22 '13 at 01:32

score 16 · Accepted Answer · answered May 28 '13 at 19:28

1>are there any disadvantages to using collect stats?

Yes, collect stats itself is time consuming, it actually locate data from AMPS and insert the stats in dictionary tables.

Suppose you have a table definition like:

ct t1(x1 int,y1 int, z1 int);

The table contains millions of rows and z1 is never used in the ST/Join conditions, then it is not worth to collect stats on z1.

2>When is it appropriate/inappropriate to use collect statistics in your SQL scripting?

Already answered above. If a column is going to be used as ST/Join condition .i.e in where or on clause, you must collect stats, otherwise not needed.

3>What's the performance benefit to collect statistics on a field that's already indexed?

ct t1(x1 int,y1 int) primary index(x1);

for a simple query like sel * from t1 where x1 = 5;

will demonstrate the usefulness of collect stats.

How?

the optimizer can correctly estimate how many rows this query will select and if t1 is going to be joined with say t2, a efficient join will be chosen by optimizer.

4>How long are statistics stored for (table, volatile tables)?

Table : permanently.

volatile tables: till session expires.

5>Any other comments concerning collect statistics would be appreciated.

Nothing has been discussed about multicolumn stats.

Say, the query is like:

sel * from t1 join t2 on y1=y2 and x1=2;

then collecting multi-column stats on (x1,y1) would be quite helpful in optimization.

Also, if table demography has been changed (increased number of rows) you must consider re-collecting the stats

Hey there user, i appreciate the thought out response – ChrisCamp May 28 '13 at 20:21 — ChrisCamp, May 28 '13 at 20:21

Using COLLECT STATISTICS in Teradata

1 Answers1