0

I'm building a table that will be used for READ ONLY which is why I decided to go with MyISAM instead of InnoDB (but i'm open to suggestions). However, this table needs to be extremely large. Although each row is about 12-14 bytes, the table itself will contain hundreds of millions to billions of rows.

The table looks like this:

CREATE TABLE `tb1` ( `wid` int(11) NOT NULL DEFAULT '0',
`pid` int(11) NOT NULL DEFAULT '0',
`hit` tinyint(3) unsigned NOT NULL DEFAULT '0',
KEY `wid` (`wid`,`hit`,`pid`))
ENGINE=MyISAM DEFAULT CHARSET=latin1

I am loading all the data from text files using LOAD DATA LOCAL INFILE in chunks.

It takes me 30 seconds to load about 20M rows. Then I load a chunk with 100M rows and 10 minutes later, after canceling the query, there's only 38M rows total. (18M rows in 10 minutes?..)

It appears this type of loading is exponential rather than linear. Why is that?

What can I do to speed up LOAD DATA INFILE when I have an index that keeps increasing?

Are there benefits to using InnoDB instead of MyISAM in terms of LOAD DATA INFILE speed?

Will it be quicker to load all the data with no indexes.. and then apply one afterwards?

Thanks for your input.

nick
  • 2,743
  • 4
  • 31
  • 39

2 Answers2

1

Just run:

ALTER TABLE `name` DISABLE KEYS;

... do your query ...

ALTER TABLE `name` ENABLE KEYS;

And yes every insert also requires an insert into each index associated to the table. So it is slower with indexes.

Also note you can use INSERT DELAYED on MyISAM.

Petah
  • 45,477
  • 28
  • 157
  • 213
1

The slow down is the indexing. You could use the disable keys trick, which will import the data quicker, but the enable keys step will still take a long time. What you really want to do is max out your myisam_sort_buffer_size when importing data. The more memory you can allocate to this, the less MySQL needs to go to disk for sorting and updating the index. You can still use the disable keys trick and you can lower the value after your import is done.

The speed up with changing the myisam_sort_buffer_size value can be dramatic. For me, I've seen imports that took hours, drop to about 20 minutes.

Brent Baisley
  • 12,641
  • 2
  • 26
  • 39