
I need to delete a bunch of records (literally millions), but I don't want to do it in a single statement because of performance issues. So I created a view:

CREATE VIEW V1 
AS 
    SELECT FIRST 500000 * 
    FROM TABLE 
    WHERE W_ID = 14

After that I do a bunch of deletes, for example:

DELETE FROM V1 WHERE TS < '2021-01-01';

What I want is to put this logic in a WHILE loop inside a stored procedure. I tried a SELECT COUNT query like this:

SELECT COUNT(*) 
FROM TABLE 
WHERE W_ID = 14 AND TS < '2021-01-01';

Can I use this count as a condition in the same procedure, and how can I manage that?

This is what I have tried, and I get an error:

ERROR: Dynamic SQL Error; SQL error code = -104; Token unknown; WHILE

Code:

CREATE PROCEDURE DeleteBatch
AS
DECLARE VARIABLE CNT INT;
BEGIN
  SELECT COUNT(*) FROM TABLE WHERE W_ID = 14 AND TS < '2021-01-01' INTO :cnt;
  WHILE cnt > 0 DO
  BEGIN
    IF (cnt > 0) THEN
      DELETE FROM V1 WHERE TS < '2021-01-01';
  END
  ELSE break;
END

I just can't wrap my head around this.

To clarify: in my previous question I wanted to know how to manage garbage collection after many deleted records, and I did what was suggested (SELECT * FROM TABLE; or gfix -sweep), and that worked very well. As mentioned in the comments, the correct statement is SELECT COUNT(*) FROM TABLE;
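For reference, a sketch of the sweep invocation from the command line; the user, password, and database path here are placeholders for whatever applies to your setup:

gfix -sweep -user SYSDBA -password masterkey /path/to/database.fdb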

After that, another even bigger database was given to me: above 50 million records. The problem was that the DB was very slow to operate with, and I managed to get the server it was on killed with a DELETE statement intended to clean the database.

That's why I wanted to try deleting in batches. The slowdown problem there was purely hardware: the HDD had gone bad, and we replaced it. After that, there was no problem with executing statements and doing a backup and restore to reclaim disk space.

Tony
  • It is unclear to me what you're really asking. I see at least two different questions being asked: 1) how to address performance issues (without clearly describing what kind of performance problems, but judging by your previous question, it is the one caused by garbage collection), and 2) the token unknown error with your solution (which is caused by syntax errors). Please make sure to focus on one and only one question, and clearly define the problem you want addressed. – Mark Rotteveel Feb 24 '21 at 10:56
  • `I did what was suggested - SELECT * FROM TABLE` - this was either wrong advice, or you repeated it incorrectly. It should be `select COUNT(*)`, not `select *`. – Arioch 'The Feb 24 '21 at 11:43
  • @Arioch'The It is not "wrong" advice because it works, but you're right, it is not very efficient compared to doing `select count(*)`. In [my answer](https://stackoverflow.com/a/66266888/466862) to the OP's previous question, I said _"execute a statement that performs a full table scan (e.g. `select * from table`)"_, but I'll update that to `select count(*)`. – Mark Rotteveel Feb 24 '21 at 17:08
  • Did you literally mean kill (as in terminate/crash the Firebird server) in _"And I managed to get the server it was on, killed with a DELETE statement"_, if so, that would be a severe bug, or did you just mean "it slowed down a lot"? – Mark Rotteveel Feb 24 '21 at 17:14
  • It slowed down severely (well, it had to be restarted to function again); as I said, it was because of a faulty drive. – Tony Feb 24 '21 at 18:02

3 Answers


Provided the data that you need to delete doesn't ever need to be rolled back once the stored procedure is kicked off, there is another way to handle massive DELETEs in a stored procedure.

The example stored procedure deletes rows 500,000 at a time and loops until there are no more rows to delete. AUTONOMOUS TRANSACTION allows you to put each delete statement in its own transaction, which commits immediately after the statement completes. This effectively issues an implicit commit inside a stored procedure, which you normally can't do.

CREATE OR ALTER PROCEDURE DELETE_TABLEXYZ_ROWS
AS
DECLARE VARIABLE RC INTEGER;
BEGIN

  /* prime the loop variable with a non-zero value */
  RC = 9999;

  WHILE (RC > 0) DO
  BEGIN

    /* each batch commits on its own, immediately after the DELETE */
    IN AUTONOMOUS TRANSACTION DO
    BEGIN
      DELETE FROM TABLEXYZ ROWS 500000;

      /* ROW_COUNT drops to 0 once nothing is left to delete */
      RC = ROW_COUNT;
    END
  END

  /* full table scan to trigger garbage collection of the deleted record versions */
  SELECT COUNT(*)
  FROM TABLEXYZ
  INTO :RC;
END
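As a usage note, the procedure can then be kicked off from isql or any other client; a minimal sketch, using the procedure name from above:

EXECUTE PROCEDURE DELETE_TABLEXYZ_ROWS;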
Ed Mendez

because of performance issues

What are those exactly? I do not think you are actually improving performance by just running deletes in loops within the same transaction, or even in different transactions but within the same timespan. You seem to be solving the wrong problem. The issue is not how you create "garbage", but how and when Firebird "collects" it.

For example, SELECT COUNT(*) in the InterBase/Firebird engines means a natural scan over the whole table, and garbage collection is often triggered by it; that scan can itself take a long time if a lot of garbage was created (and a massive delete surely creates a lot, no matter whether it is done by one million-row statement or by a million one-row statements).

How to delete large data from Firebird SQL database

If you really want to slow down deletion, you have to spread that activity around the clock and make your client application call a deleting SP, for example, once every 15 minutes. You would have to add some column to the table flagging rows as marked for deletion, and then do the job like this:

CREATE PROCEDURE DeleteBatch(CNT INT)
AS
DECLARE ROW_ID INTEGER;
BEGIN
  /* delete up to CNT flagged rows, one row at a time */
  FOR SELECT ID FROM TABLENAME WHERE MARKED_TO_DEL > 0 INTO :ROW_ID
  DO BEGIN
     CNT = CNT - 1;
     DELETE FROM TABLENAME WHERE ID = :ROW_ID;
     IF (CNT <= 0) THEN LEAVE;
  END
  SELECT COUNT(1) FROM TABLENAME INTO :ROW_ID; /* force GC now */
END

...and every 15 minutes you do EXECUTE PROCEDURE DeleteBatch(1000).

Overall this probably would only be slower, because of the single-row "precision targeting", but at least it would spread out the delays.

Arioch 'The
  • Thank you for answering. So by performance issues I mean slow delete operations and disk drives becoming very slow due to many read/write operations. With batch delete it's much faster. Because it is desired to decrease the size of the database, I will run either `gfix -sweep` or `gbak -g` after the batch deletion to activate the garbage collection. – Tony Feb 24 '21 at 08:26
  • try increasing RAM cache? see articles and perhaps even relaxed firebird conf files at ib-aid.com @T0ny1234 – Arioch 'The Feb 24 '21 at 09:51
  • `because it's wanted to decrease the size of the database` What do you mean? If you mean the logical content, the number of rows, then just do it overnight, when users are mostly disconnected/idle and slowdowns would not hurt them. Not mere GC, but the very deleting too. If you mean reducing the database file's disk space, there is no way to do it except creating the database file anew (backup/restore/check/replace, after removing those rows). – Arioch 'The Feb 24 '21 at 09:54
  • @T0ny1234 using `gbak -g` will **disable** garbage collection during a backup. – Mark Rotteveel Feb 24 '21 at 10:52

Use DELETE...ROWS.
https://firebirdsql.org/file/documentation/html/en/refdocs/fblangref25/firebird-25-language-reference.html#fblangref25-dml-delete-orderby
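For example, applied to the question's case, each execution of the following statement deletes at most 500,000 matching rows; this is a sketch using the placeholder table and column names from the question:

DELETE FROM TABLE
WHERE W_ID = 14 AND TS < '2021-01-01'
ROWS 500000;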

But as I already said in the answer to the previous question, it is better to spend time investigating the source of the slowdown instead of working around it by deleting data.

user13964273