Partition eliminations with multilevel partitions on teradata database

Question

I have a performance issue with multilevel partitioned tables on teradata database. It seems the partition elimination is not occurring if a table has a certain kind of partitioning structure.

As an example, consider the following table and query. With a single partitioning expression, the plan successfully eliminated all but one partition.

/* Table with single partitioning expression */
CREATE MULTISET TABLE table_single
     (
      id INTEGER,
      time_stamp INTEGER)
PRIMARY INDEX ( id ,time_stamp )
PARTITION BY RANGE_N(time_stamp  BETWEEN *,1651330800  AND 1656601199  EACH 3600 ,NO RANGE, UNKNOWN);

EXPLAIN SELECT count(*) FROM table_single WHERE time_stamp BETWEEN 1654041600 AND 1654041600 + 100;

// result
...
 3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from 
    a single partition of table_single with a condition of
...

In contrast, when we add another partition, the plan became a full scan, which indicates partition elimination did not occur.

/* Example with multi level partition */
CREATE MULTISET TABLE table_mult
     (
      id INTEGER,
      time_stamp INTEGER)
PRIMARY INDEX ( id ,time_stamp )
PARTITION BY (
  RANGE_N(time_stamp  BETWEEN *,1651330800  AND 1656601199  EACH 3600 ,NO RANGE, UNKNOWN),
  RANGE_N(id  BETWEEN 1  AND 200  EACH 1 ,NO RANGE, UNKNOWN)
);

EXPLAIN SELECT count(*) FROM table_mult WHERE time_stamp BETWEEN 1654041600 AND 1654041600 + 100;

// result
...
  3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from
     table_mult by way of an all-rows scan with a condition of
...

Further, when I change the partitioning expression slightly as below (removed the "*" from the first RANGE_N call), then the partition elimination is back.

/* Table with multilevel parition, v2 */
CREATE MULTISET TABLE table_mult2
     (
      id INTEGER,
      time_stamp INTEGER)
PRIMARY INDEX ( id ,time_stamp )
PARTITION BY (
  RANGE_N(time_stamp  BETWEEN 1651330800  AND 1656601199  EACH 3600 ,NO RANGE, UNKNOWN),
  RANGE_N(id  BETWEEN 1  AND 200  EACH 1 ,NO RANGE, UNKNOWN) 
);

EXPLAIN SELECT count(*) FROM table_mult2 WHERE time_stamp BETWEEN 1654041600 AND 1654041600 + 100;

// result
...
 3) We do an all-AMPs SUM step in TD_MAP1 to aggregate from 404
    partitions of table_mult2 with a condition of (
...

Not only the query plan changes, but also in practice we also observe that the query on the second table becomes slower as we have more data outside the target partitions, from which we believe the partition elimination is not functioning with that table.

We would like to understand the reasons for this behavior, in particular why the partition elimination does not occur for the second definition of the table.

I don't know why partition elimination doesn't work, but I wonder why you use `*`, there's already NO RANGE to catch the out-of-range rows? `time_stamp BETWEEN 0 AND 1651330799` instead of `*` works as expected, too. And the NO RANGE should not be used for large amounts of data (*we have more data outside the target partitions*). Instead of 2 months you can easily expand the RANGE_N to cover years/decades and then NO RANGE only stores rows before the minimum timestamp. — dnoeth, Sep 30 '22 at 08:56
@dnoeth Thanks for the advice. The intent of `*` was essentially separate out old data in one partition so we can easily delete them. I agree that this partition definition has rooms for improvement, but if the inclusion of `*` makes the partition elimination not to occur, that seems still mysterious to me and I wonder why. — Kota Mori, Sep 30 '22 at 09:05
@Andrew Yes, I have collected stats for id, time_stamp, and (id, time_stamp) but that does not change the results. The query plans for the second table conducts "all-row scan" and the runtime remains about the same. — Kota Mori, Sep 30 '22 at 21:17

Partition eliminations with multilevel partitions on teradata database

0 Answers0