I have a table with 33 million unique records. New records are inserted into it twice a week. When I import the data it takes about 8 to 9 hours. The file being loaded has 6 to 7 million records. How can I optimize this? The table engine is MyISAM.

1 Answer

Indexing = overhead

Any index, including a unique index, is a bit of extra work to be done when inserting a row (a.k.a. overhead). It's not necessarily a lot of work when taken individually -- especially since indexes compensate by speeding things up when reading -- but it does add up.

When inserting a new row into a table with indexes, every column that is included in an index must be written into the index file, and every unique constraint must be checked before the insert to make sure no duplicates are saved.

Imagine having to check the new value against 33 million existing unique values once for every inserted row. If you're inserting 1 million extra rows, that's on the order of 33 trillion comparisons in the naive worst case -- and even though the index turns each check into a B-tree lookup rather than a scan, that's still 1 million extra uniqueness checks against a very large index.
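To make the overhead concrete, here is a minimal sketch of what such a table might look like (the table and column names are made up for illustration; the question doesn't give the real schema):

    CREATE TABLE records (
        external_code VARCHAR(64) NOT NULL,   -- the column with the uniqueness requirement
        payload       TEXT,
        UNIQUE KEY uq_external_code (external_code)
    ) ENGINE=MyISAM;

    -- Every INSERT must probe uq_external_code for an existing value
    -- before the row is accepted, in addition to updating the index itself.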

Drop uniqueness in favor of PK:

If you drop the uniqueness constraint you will be able to insert rows much faster. Unfortunately, if you can't guarantee that the newly imported values are unique without checking, then you kind of need the uniqueness constraint.
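One common pattern, assuming you can verify beforehand that the import file is duplicate-free, is to drop the unique index, bulk load, and rebuild the index once at the end (the file path and index name below are hypothetical, continuing the schema sketched above):

    -- Sketch only: drop the constraint, load, then rebuild it in one pass
    ALTER TABLE records DROP INDEX uq_external_code;

    LOAD DATA INFILE '/path/to/import.csv'
    INTO TABLE records
    FIELDS TERMINATED BY ',';

    -- Rebuilding the index once is much cheaper than checking it per row,
    -- but this step will fail if the loaded data does contain duplicates.
    ALTER TABLE records ADD UNIQUE KEY uq_external_code (external_code);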

Also, dropping the uniqueness constraint means that when you query the table for a particular item, filtering on the column that used to be unique, the query will be slower (much slower across 33 million rows).

If you can guarantee that they are unique, or can leave the checking for a later stage (a cleanup of duplicates), then you can just set up a primary key with AUTO_INCREMENT or a UUID and do the duplicate cleanup afterwards.
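A rough sketch of that approach, again with the same hypothetical names: add a surrogate AUTO_INCREMENT primary key, import without any uniqueness check on the data column, and deduplicate later by keeping the lowest id per value:

    -- Surrogate primary key; no uniqueness check on external_code during the import
    ALTER TABLE records
        ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

    -- ... bulk import runs here ...

    -- One-off cleanup: keep the row with the lowest id for each duplicated value
    DELETE t1
    FROM records t1
    JOIN records t2
      ON t1.external_code = t2.external_code
     AND t1.id > t2.id;

Adding a plain (non-unique) index on external_code before running the cleanup makes the self-join far faster, and it can be dropped again afterwards if it isn't needed.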

If you don't need to query that table using the unique column (if you can change the application to query by primary key), things will move much faster.
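For example, assuming the same hypothetical schema, the lookup that used to rely on the unique column becomes a primary-key lookup:

    -- Lookup by the formerly-unique column (slow once its unique index is gone)
    SELECT * FROM records WHERE external_code = 'ABC123';

    -- Lookup by the surrogate primary key instead (fast, uses the PK index)
    SELECT * FROM records WHERE id = 42;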

Mihai Stancu