I have a table with 33 million unique records. New records are inserted into it twice a week. When I import the data it takes about 8 to 9 hours. The new file being loaded has 6 to 7 million records. How can I optimize this? The table engine is MyISAM.
-
Please tell us the table definition, including any indexes. – Stoleg Mar 28 '14 at 12:53
-
The table has two columns: Mob_no varchar(12) primary key, and Trndate varchar(20). – ABHISHEK_SiNGH Mar 28 '14 at 13:03
-
Use an index in the database. – Haseeb Mar 28 '14 at 13:06
-
@haseeb Au contraire, indexes slow down database "insertions". They only speed up database "reads" and "updates" (because updates need reads, obviously). – Mihai Stancu Mar 28 '14 at 13:16
-
Use LOAD DATA INFILE. http://stackoverflow.com/a/33307547 ... The last 2 I helped with: 8 hours reduced to 76 minutes, and 3 days reduced to 30 minutes. – Drew Oct 23 '15 at 19:22
1 Answer
Indexing = overhead
Any index, including a unique index, means a bit of extra work each time a row is inserted (aka overhead). It's not necessarily a lot of work taken individually -- especially since indexes compensate by speeding things up when reading information -- but it does add up.
When inserting a new row into a table with indexes, every column included in an index must be registered in the index file, and every unique constraint must be checked before the insertion to make sure no duplicates are saved.
Imagine having to check 33 million existing values once for every inserted row. If you're inserting 1 million extra rows, that's 33 trillion checks.
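Even before touching the indexes, the single biggest win for a bulk import is usually LOAD DATA INFILE (the approach Drew recommends in the comments above) instead of millions of individual INSERT statements. A minimal sketch, assuming the new file is a comma-separated dump of the two columns from the question; the table and file names here are hypothetical:

```sql
-- Bulk-load the whole file in one statement; MySQL batches the
-- parsing and index maintenance instead of paying a per-INSERT cost.
LOAD DATA INFILE '/path/to/new_records.csv'
INTO TABLE mob_records
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(Mob_no, Trndate);
```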
Drop uniqueness in favor of PK:
If you drop the uniqueness constraint, you will be able to insert rows much faster. Unfortunately, if you can't guarantee that the newly imported values are unique without checking, then you kinda need the uniqueness constraint.
Dropping the uniqueness constraint also means that when you query the table for a particular item, filtering on the column that used to be unique, the query will be slower (much slower across 33 million rows).
If you can guarantee that they are unique, or can leave the checking for a later stage (a cleanup of duplicates), then you can just set up a primary key with auto_increment or a UUID and do the duplicate cleanup afterwards.
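A sketch of that route, combined with a bulk load; this is only an outline, and every name besides Mob_no and Trndate is hypothetical. The dedup step temporarily adds a plain (non-unique) index on Mob_no so the self-join doesn't have to scan all 33 million rows for each match:

```sql
-- Swap the unique PK for a surrogate auto_increment key so
-- inserts no longer pay the per-row uniqueness check.
ALTER TABLE mob_records
    DROP PRIMARY KEY,
    ADD COLUMN id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Fast bulk load (no uniqueness checks now).
LOAD DATA INFILE '/path/to/new_records.csv'
INTO TABLE mob_records
FIELDS TERMINATED BY ','
(Mob_no, Trndate);

-- Later, at a quiet time: remove duplicates, keeping the oldest
-- (lowest-id) row for each Mob_no.
ALTER TABLE mob_records ADD INDEX idx_mob (Mob_no);
DELETE t1
FROM mob_records t1
JOIN mob_records t2
  ON t2.Mob_no = t1.Mob_no
 AND t2.id < t1.id;
ALTER TABLE mob_records DROP INDEX idx_mob;  -- optional
```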
If you don't need to query that table using the unique column (if you can change the application to query by the primary key), things will move much faster.
