24

I have a MySQL database with approx. 1 TB of data. The table fuelinjection_stroke has approx. 1,000,000,000 rows. DBID is the primary key and is automatically incremented by one with each insert.

I am trying to delete the first 1,000,000 rows using a very simple statement:

Delete from fuelinjection_stroke where DBID < 1000000;

This query is taking very long (> 24 h) on my dedicated 8-core Xeon server (32 GB memory, SAS storage).

Any idea whether the process can be sped up?

Sadikhasan
user1938509
  • 1 Bn rows is quite a lot. Do you have any indices on the table, besides the pkey? When you delete rows, the indices have to be updated, which is probably what takes time. I'm not sure if it'll work in this case, but could you try to do the delete within a transaction and see if it works better? – troelskn Apr 01 '14 at 11:12
  • To the OP can you update us on the outcome of this, I'm genuinely interested. :) – i-CONICA Apr 01 '14 at 18:28
  • Deleting is very slow. The approach of deleting in portions of 10,000 works - but this is not speeding up the process as a whole. I ended up doing the following: I dumped the table into a file using mysqldump. I then used sed -i '1,1000000d' file.sql to delete lines from the dump file. I truncated the table and reloaded the dump file ... that turned out to be the quickest approach ... – user1938509 Apr 27 '14 at 13:32
  • OP mentions in a comment to Uriil's answer "not having any indices or foreign keys on the table." Obviously if your WHERE condition is using a column with no index, then MySQL has to scan the ENTIRE TABLE in order to be sure it has all the rows that are `DBID < 1000000`. Whereas if `DBID` has an index on it, MySQL does not have to do that. – Buttle Butkus May 10 '15 at 20:54
  • What % of rows will get deleted in this case? If it is more than 30% of the rows, then it would be better to copy the remaining 70% into a new table, rename the new table to the original name, rename the original to old, and then drop the old table (a sketch follows after this list). – sactiw Jan 29 '17 at 00:12
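
A minimal sketch of the copy-and-swap approach from the last comment, assuming the rows to keep are those with DBID >= 1000000; the _new/_old table names are illustrative:

    -- Copy the rows to keep into a fresh table, then swap the tables.
    CREATE TABLE fuelinjection_stroke_new LIKE fuelinjection_stroke;
    INSERT INTO fuelinjection_stroke_new
        SELECT * FROM fuelinjection_stroke WHERE DBID >= 1000000;
    -- RENAME TABLE performs both renames in a single atomic operation.
    RENAME TABLE fuelinjection_stroke     TO fuelinjection_stroke_old,
                 fuelinjection_stroke_new TO fuelinjection_stroke;
    DROP TABLE fuelinjection_stroke_old;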

7 Answers

33

I believe that your table becomes locked. I've faced the same problem and found that deleting 10k records at a time is pretty fast. So you might want to write a simple script/program which deletes the records in chunks.

   DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;

And keep executing it until it deletes everything.
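
A minimal sketch of such a loop as a MySQL stored procedure; the procedure name is illustrative, and the 10,000 batch size is simply the value from the answer:

    DELIMITER //
    CREATE PROCEDURE delete_in_chunks()
    BEGIN
        DECLARE rows_deleted INT DEFAULT 1;
        WHILE rows_deleted > 0 DO
            -- With autocommit on, each small DELETE is its own transaction,
            -- keeping locks and undo short-lived.
            DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
            SET rows_deleted = ROW_COUNT();
        END WHILE;
    END //
    DELIMITER ;

    CALL delete_in_chunks();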

Nathan Parker
Uriil
  • The ongoing investigation pointed in two directions: 1) Since the table is really big, it just takes a long time. 2) The MySQL instance is the master of a master-slave replication. In spite of not having any indices or foreign keys on the table, the binlog replication between the two servers seems to slow down the process. The solution suggested here works, since the single big statement never comes to an end. – user1938509 Apr 02 '14 at 06:33
  • I think the reason why this solution works so well is because the size of the transaction doesn't become too big and therefore MySQL can fit it in memory. Otherwise it has to write everything to disk, which is way slower. – thephper Jan 27 '20 at 10:40
  • I had even better results with a limit of 750. Dropping triggers and foreign key checks also improved performance a lot (as I'm doing this for a testing database). – Nicolas BADIA Jun 08 '20 at 16:29
8

Are you space-deprived? Is downtime impossible?

If not, you could add a new INT column of length 1 and default it to 1 for "active" (or whatever your terminology is) and 0 for "inactive". Actually, you could use 0 through 9 as 10 different states if necessary.

Adding this new column will take a looooooooong time, but once it's over, your UPDATEs should be lightning fast as long as you do it off the PRIMARY (as you do with your DELETE) and you don't index this new column.

The reason why InnoDB takes so long to DELETE on such a massive table as yours is because of the cluster index. It physically orders your table based upon your PRIMARY (or first UNIQUE it finds...or whatever it feels like if it can't find PRIMARY or UNIQUE), so when you pull out one row, it now reorders your ENTIRE table physically on the disk for speed and defragmentation. So it's not the DELETE that's taking so long. It's the physical reordering after that row is removed.

When you create a new INT column with a default value, the space will be filled, so when you UPDATE it, there's no need for physical reordering across your huge table.

I'm not sure exactly what your schema is, but using a column for a row's state is much faster than DELETEing; however, it will take more space.
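
A rough sketch of that approach; the column name `active` is illustrative and not part of the original schema:

    -- One-time schema change (slow on a table this size, but done only once):
    ALTER TABLE fuelinjection_stroke ADD COLUMN active INT(1) NOT NULL DEFAULT 1;

    -- "Delete" rows by flagging them, still filtering on the clustered PRIMARY KEY:
    UPDATE fuelinjection_stroke SET active = 0 WHERE DBID < 1000000;

    -- Readers then filter on the flag:
    SELECT * FROM fuelinjection_stroke WHERE DBID = 123456 AND active = 1;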

Try setting values:

innodb_flush_log_at_trx_commit=2
innodb_flush_method=O_DIRECT (for non-windows machine)
innodb_buffer_pool_size=25GB (currently it is close to 21GB)
innodb_doublewrite=0
innodb_support_xa=0
innodb_thread_concurrency=0...1000 (try different values, beginning with 200)
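
Some of these are dynamic and can be tried at runtime, e.g. (a sketch; the values are the suggestions above, not tuned for this particular server):

    SET GLOBAL innodb_flush_log_at_trx_commit = 2;
    SET GLOBAL innodb_thread_concurrency = 200;
    -- innodb_flush_method (and innodb_buffer_pool_size on older MySQL versions)
    -- can only be changed in my.cnf and require a server restart.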

References:

MySQL docs for description of different variables.

MySQL Server Setting Tuning

MySQL Performance Optimization basics

http://bugs.mysql.com/bug.php?id=28382

jmail
  • I am trying to free up space because my SAS SAN is near 100% full - it is only 4 TB. – user1938509 Apr 01 '14 at 13:26
  • Based on this, what I don't get is why the delete is faster the fewer rows you delete. Surely it's not re-organising the table _N_ times when you delete _N_ rows (in a single `DELETE`)? Because I've just added a `MarkedForDeletion` column as you suggest, but gone further and added an async operation that will (gradually) physically delete these in lil' batches (as the accepted answer suggests, and as I've heard elsewhere)... and it works great! – Lightness Races in Orbit Feb 21 '18 at 01:34
6

What indexes do you have?

I think your issue is that the delete is rebuilding the index on every iteration.

I'd delete the indexes if any, do the delete, then re-add the indexes. It'll be far faster (I think).
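
A hedged sketch of that sequence; `idx_example` and `some_column` are placeholders, since the question's table reportedly has no secondary indexes:

    ALTER TABLE fuelinjection_stroke DROP INDEX idx_example;
    DELETE FROM fuelinjection_stroke WHERE DBID < 1000000;
    ALTER TABLE fuelinjection_stroke ADD INDEX idx_example (some_column);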

i-CONICA
5

I was having the same problem, and my table has several indices that I didn't want to have to drop and recreate. So I did the following:

CREATE TABLE keepers
    SELECT * FROM origTable WHERE {clause to retrieve rows to preserve};
TRUNCATE TABLE origTable;
INSERT INTO origTable SELECT NULL, keepers.col2, ..., keepers.col(last) FROM keepers;
DROP TABLE keepers;

About 2.2 million rows were processed in about 3 minutes.

user5883982
  • This is the preferred solution in most situations; it just needs a SELECT after origTable so it works for copy-paste SQL beginners too. It should be: "create table keepers select * from origTable where {clause to retrieve rows to preserve}; truncate table origTable; insert into origTable select * from keepers; drop table keepers;" – Piotr Farbiszewski Feb 12 '23 at 10:58
0

Your database may be checking for records that need to be modified because of a foreign key (ON DELETE CASCADE and the like).

But i-CONICA's answer makes a good point (+1). Deleting a single record and updating a lot of indexes, done 100,000 times, is inefficient. Just drop the indexes, delete all the records, and create them again.

And of course, check whether there is any kind of lock in the database. One user or application can lock a record or table, and your query will wait until the user releases the resource or it reaches a timeout. One way to check whether your database is doing real work or just waiting is to launch the query from a connection that sets the --innodb_lock_wait_timeout parameter to a few seconds. If it fails, at least you know that the query is OK and that you need to find and release that lock. Examples of locks are `SELECT * FROM XXX FOR UPDATE` and uncommitted transactions.
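
A small sketch of that check: run the query from a session with a short lock wait timeout and see whether it times out instead of doing real work:

    SET SESSION innodb_lock_wait_timeout = 5;  -- seconds
    DELETE FROM fuelinjection_stroke WHERE DBID < 1000000 LIMIT 10000;
    -- A "Lock wait timeout exceeded" error (1205) here suggests another
    -- session is holding a lock on the rows or table.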

borjab
  • I am not quite sure if I understand this answer. My table doesn't have any indices, just the one primary key DBID (autoincremented during insert). My understanding is that InnoDB does not work without at least one unique index; if none is assigned explicitly, it will create one implicitly: "If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table." – user1938509 Apr 01 '14 at 13:29
  • If you have only the primary key index, I assume that the problem is not the index. Please check that there is no foreign key relationship. (For example, if you have a fuelinjection_stroke_history, every time you delete one record you will have to check for orphans in this secondary table. That would be 1,000,000 queries.) Has the query finished? If not, I would assume that it is a lock. – borjab Apr 01 '14 at 15:00
  • The database is replicated (master/slave) - this is slowing the process (binlogs). The table does not have any foreign keys etc. ... – user1938509 Apr 02 '14 at 06:35
-1

For such long tables, I'd rather use MyISAM, especially if not a lot of transactions are needed.
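
For reference, the engine switch this implies would be the following; note that it rewrites the entire table and locks it for the duration, which is itself a long operation on 1 TB of data:

    ALTER TABLE fuelinjection_stroke ENGINE = MyISAM;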

-8

I don't know the exact answer to your question, but here is another way to write the delete for those rows; please try this.

DELETE FROM fuelinjection_stroke WHERE DBID IN
(
    SELECT DBID FROM (
        SELECT DBID FROM fuelinjection_stroke
        ORDER BY DBID ASC
        LIMIT 1000000
    ) AS first_million
);
AK47