0

The query runs on a huge table (1 billion records +)

Select Col1 from TableA where identcol > (select bigint(identval) from ParamTable)

(identval dataype is character and identcol is BIGINT ) This takes a long time and does a full table scan. When I replace the subquery and use a number value in the where clause for identcol, it starts to use the index on that column.

Is there a way to force DB2 to use this index, without specifying a literal value? (DB2 9.7 luw on AIX )

edit: I found BETWEEN works with the subquery and an upper value, and also uses the index. Also if I remove the explicit cast to BIGINT it starts to use the index. But still would like to know the technical explanation /pointers. Thanks!

dbza
  • 316
  • 1
  • 5
  • 19

1 Answers1

2

This behaviour could be explained by insufficient information available to the query optimizer in the absence of a literal value.

When the literal value is supplied in the predicate, the optimizer can use the value distribution statistics for tableA.identcol to estimate how many rows might match the condition. Apparently, distribution is such that, given the literal value supplied, it considers index-based access better.

When instead you supply a subquery, the optimizer has no way of knowing what identval will be returned, so it makes a guess, which in your case turns out to be suboptimal.

To achive a more consistent optimization outcome you'll need to supply the optimizer with additional information. There are a few options you might try.

  1. Create a statistical view, something like create view sv1 as select Col1 from TableA, ParamTable where TableA.identcol > ParamTable.identval (and drop the bigint() from your query -- no need to confuse the optimizer even more). You will then need to enable statistical view optimization (alter view sv1 enable query optimization) and collect distribution statistics (runstats on table sv1 with distribution). However, seeing that ParamTable has only one row in it, one might assume that it changes quite frequently, so you'll need to make sure that statistics are updated each time ParamTable.identval changes.
  2. Provide explicit selectivity estimate to the optimizer: Select Col1 from TableA where identcol > (select bigint(identval) from ParamTable) selectivity 0.001. Before you can do that, you will need to set the DB2 registry variable enabling this behaviour (db2set DB2_SELECTIVITY=ALL) and restart the instance. 0.001 indicates the proportion of the table, in this case 0.1%, that satisfies the condition. Replace 0.001 with the realistic estimate of the proportion of matching rows.

  3. Create an index on TableA (identcol, Col1) if it does not already exist (and collect statistics for it) -- this might enable index-only access.

mustaccio
  • 18,234
  • 16
  • 48
  • 57
  • Thank you so much! I will try the selectivity estimate and see if works. #3 is not possible in my specific case there are other considerations. – dbza Sep 10 '15 at 16:53
  • The explicit selectivity is not working with the subquery, it issues a SQL20046N needs a valid user-defined predicate. – dbza Sep 10 '15 at 20:18
  • 1
    Looks like you didn't execute `db2set DB2_SELECTIVITY=YES`, or didn't restart the instance after that, or both. – mustaccio Sep 10 '15 at 20:25
  • 1
    Try `db2set DB2_SELECTIVITY=ALL` instead. – mustaccio Sep 10 '15 at 20:34
  • ok thanks! I thought it was a syntax error from the message. Unfortunately I have no access for modifying the database parameters. – dbza Sep 10 '15 at 20:37