

Db2

Scenario

I have the following code. The NODE table has about 135k rows and the NODE_LIST table has about 4k rows, and deleting every row from NODE that also appears in NODE_LIST takes about 48 seconds.

I would like to run the query such that it completes in under 5 seconds.

Additionally, I have noticed that performance appears to degrade as I add more rows to the NODE table.

Current Solution

SELECT COUNT (*) FROM OLD TABLE(
   DELETE FROM NODE n WHERE EXISTS (SELECT * FROM NODE_LIST nl WHERE nl.ID = n.ID)
);
H. Trujillo
  • Do you have an index on `node_list(id)`? – Gordon Linoff Mar 06 '18 at 20:36
  • It is a Primary Key – H. Trujillo Mar 06 '18 at 20:40
  • I'd start with posting the statement explain plan. – mustaccio Mar 06 '18 at 21:40
  • Even if it's a primary key, an index is not always there automatically. Add an index and try the delete again. – Esperento57 Mar 07 '18 at 08:16
  • But wouldn't that just postpone the inevitable, which is that the query will slow down as I add more nodes? – H. Trujillo Mar 07 '18 at 13:34
  • Have you done any monitoring to determine exactly how much time Db2 is spending on which activity during the delete? For example: finding which rows to delete, reading those rows, waiting for logs, writing to logs, waiting for writes to complete etc? You have set a target of under 5-seconds, but have you ensured that your logging-configuration and I/O bandwidth can *deliver* that (i.e. what has been provisioned)? If you need to regularly do bulk deletes, have you considered range-partitioning to deliver rapid roll out of old data? – mao Mar 07 '18 at 14:08
  • Reading the rows to be deleted and creating the NODE_LIST takes about 0.025s while deleting takes about 48s. We have tried this in environments with 4GB and 16GB of memory and although there were some improvements in time, when we increased the size of NODE to 650k rows, we saw a similar execution time. – H. Trujillo Mar 07 '18 at 15:47
  • Most likely the time is spent logging and waiting for I/O . Is the NODE table organize-by-column ? Why not provide the explain plan (db2exfmt) as requested earlier ? – mao Mar 07 '18 at 16:19
  • I'm not sure I can release the statement explain plan; but I also don't think it would help solve the issue, because the problem isn't with the common table, it's with deleting what's in it. I think the problem is that the way `delete` works, it checks whether a row in NODE is in NODE_LIST by checking every row in NODE. – H. Trujillo Mar 07 '18 at 18:17
  • @H.Trujillo If you don't want the DB to do a full table scan, then you need to add an appropriate index. – jmarkmurphy Mar 08 '18 at 13:59
  • A current possible solution seems to be to create a temp table that excludes what I don't want, and replace the entire original table with the temp table. But this solution seems very drastic. – H. Trujillo Mar 08 '18 at 17:27
  • @jmarkmurphy we are considering that, we're just not sure the improvement will be drastic enough. However, if we do, and it is a big change, I'll post it here. – H. Trujillo Mar 08 '18 at 17:28
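The indexing advice in the comments above could be sketched as follows. This is an assumption-laden illustration, not the poster's actual DDL: the index name is invented, and NODE_LIST(ID) may already be covered by its primary-key index.

```sql
-- Hypothetical index name; adjust to your naming conventions.
-- An index on NODE(ID) lets the DELETE locate matching rows
-- via index lookups instead of scanning the whole table.
CREATE INDEX NODE_ID_IX ON NODE (ID);

-- Refresh statistics so the optimizer considers the new index
-- (RUNSTATS is a command; run it from the CLP or via ADMIN_CMD):
-- RUNSTATS ON TABLE NODE WITH DISTRIBUTION AND INDEXES ALL
```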

2 Answers


Are the node IDs that you are deleting randomly spread through the NODE table? If so, you could use a clustering index to reduce the number of pages that have node IDs deleted each time: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.dbobj.doc/doc/c0061098.html. Alternatively, use an MDC table: https://www.ibm.com/support/knowledgecenter/en/SSEPGG_11.1.0/com.ibm.db2.luw.admin.partition.doc/doc/c0007201.html
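A rough sketch of the clustering-index suggestion, with an illustrative index name (whether ID is a sensible clustering key depends on how the deleted rows are distributed):

```sql
-- Illustrative only: a clustering index on NODE(ID) encourages Db2
-- to keep rows with nearby IDs on the same pages, so each bulk
-- delete touches fewer pages.
CREATE INDEX NODE_ID_CL ON NODE (ID) CLUSTER;

-- Existing rows only follow the clustering order after a reorg
-- (run from the CLP or via ADMIN_CMD):
-- REORG TABLE NODE INDEX NODE_ID_CL
```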

Paul Vernon

It seems that the best answer is the original solution I posted, with the addition of indexes (as some users suggested):

SELECT COUNT (*) FROM OLD TABLE(
   DELETE FROM NODE n WHERE EXISTS (SELECT * FROM NODE_LIST nl WHERE nl.ID = n.ID)
);

Additionally, there may be better approaches (such as creating a permanent indexed table, which we may adopt later), but we decided to use the temp-table approach for the time being.
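The temp-table approach mentioned earlier in the comments could look roughly like this. It is a sketch under assumptions (table and column names come from the question; the swap loses indexes, constraints, and permissions, which must be recreated):

```sql
-- Sketch only: build a copy of NODE without the rows to be deleted,
-- then swap it in. This avoids per-row delete logging, but it needs
-- exclusive access to NODE during the swap.
CREATE TABLE NODE_KEEP AS
  (SELECT * FROM NODE n
   WHERE NOT EXISTS (SELECT 1 FROM NODE_LIST nl WHERE nl.ID = n.ID))
  WITH DATA;

-- Replace the original table (recreate indexes and constraints after):
DROP TABLE NODE;
RENAME TABLE NODE_KEEP TO NODE;
```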

Using IBM Data Server Manager, we generated an explain plan that helped us apply the correct indexes and dramatically improve performance.

H. Trujillo