0

I know this question has probably been asked many times, but bear with me; I couldn't find the answer on my specific situation on stackoverflow. I'm also not a wiz at SQL and specifically not DB2 SQL. We're using DB2 12 running on z/OS.

I need to speed up a select from a table of persons, with more than 5 million rows. It's an old DB2 table, it's not intelligently structured, and it has no primary keys or anything. I can, however, create keys or indexes, if it helps. So my question is: Will it help? and if yes, what would be the best way to create the key / index? This is the situation:

Apart from information such as name and address, the table contains a PERSONTYPE (char 1), a PERSONCODE (char 8) and a DNUM (char 16), where DNUM is a kind of a case-number that the person is tied to. (If a row has a DNUM, it does not have a PERSONCODE.) I need to find all rows that have a specific subset of DNUM. Several rows can have the same DNUM, as several persons can be tied to a case. So this is the SQL:

SELECT NAME, ADRESS, SHOESIZE FROM PERSONTABLE WHERE DNUM IN ([a set of 1000 DNUMS])

This query takes about 55 seconds to produce a result. As a comparison, this query:

SELECT NAME, ADRESS, SHOESIZE FROM PERSONTABLE P JOIN OTHERTABLE O ON (P.PERSONCODE = O.PERSONCODE) WHERE O.DNUM IN ([same set of 1000 DNUMS])

takes about 2 seconds. (Yes, this OTHERTABLE also has the DNUM column.)

This is part of a data migration project, and 95% of the entire execution time is eaten up by the first SQL, so I'm keen on trying to speed that up.

I am able to index and insert keys for PERSONTABLE, but as DNUM isn't a unique column, I can't make it a primary key. There is, however, an ORDER_NUMBER column in the PERSONTABLE that numbers each person involved in a case, so DNUM+ORDER_NUMBER could possibly be made a primary key. But would this speed up the select? It's an old table and don't want to mess with it without knowing what it'll do.

To repeat, the system is DB2 12 running on z/OS.

Does anyone of you have a suggestion?

data_henrik
  • 16,724
  • 2
  • 28
  • 49
Zimeon
  • 9
  • 3
  • What version of Db2 on which platform (z/OS, LUW, IBM i)? There are tools and recommendations to estimate possible improvements, but they depend on more information from you. And what is the overall workload? Are inserts / updates / deletes happening? – data_henrik Mar 29 '23 at 07:51
  • Sorry for the lack of information. I've updated the question with that info, but here it is too: It's DB2 12 running on z/OS. The overall workload starts with a number of selects, and after that there are inserts in other tables. – Zimeon Mar 29 '23 at 08:28
  • Speak with your Z/OS DBA folks, ask for a review of the access-plan, and whether the operating mode of the subsystem makes a difference. Db2 v12 was supposed to specifically address previous problems with long IN list predicates, and supposedly has better plans also when the IN list has 3000 entries. Regardless of that, making a temp table and indexing it, for storing the IN list and doing a join has other benefits. – mao Mar 29 '23 at 10:30

1 Answers1

0

The answer was way simpler than I thought. I let it remain here in case someone else is as inexperienced as I and comes here to look.

The answer is YES, you can DEFINITELY speed up the selects, just create an index on the column you are most commonly using for the "where" clause.

CREATE INDEX WHATEVER ON PERSONTABLE(DNUM);
CREATE INDEX WHATEVER_AGAIN ON PERSONTABLE(PERSONCODE)

As my project is a one-run-only, and I only needed persons of a specific type, I also first copied the relevant data from PERSONTABLE to PERSONTABLE_TEMP, which reduced the number of rows from 5 million to half a million. Then I created the indexes on the reduced table.

The first query went from 50 seconds to 0,5, roughly 100 times faster.

Zimeon
  • 9
  • 3