For a MySQL table called mydb.mytable
just run the following:
OPTIMIZE TABLE mydb.mytable;
You could also do this in stages:
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ALTER TABLE mydb.mytable_old;
ANALYZE TABLE mydb.mytable;
In either case, the table ends up with no fragmentation.
Give it a Try !!!
UPDATE 2012-12-03 12:50 EDT
If you are concerned whether or not rows are reused upon bulk INSERTs via LOAD DATA INFILE
, please note the following:
When you created the MyISAM table, I assumed the default row format would be dynamic. You can check what it is with either
SHOW CREATE TABLE mydb.mytable\G
or
SELECT row_format FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable';
Since the row format of your table is Dynamic
, the fragmented rows are of various sizes. The MyISAM storage engine would have keep checking for the row length of each deleted to see if the next set of data being insert will fit. If the incoming data cannot fit in any of the deleted rows, then the new row data is appended.
The presence of such rows can make myisamchk
struggle.
This is why I recommended running OPTIMIZE TABLE
. That way, data would be appended quicker.
UPDATE 2012-12-03 12:58 EDT
Here is something interesting you can also do: Try setting concurrent_insert to 2. That way, you are always appending to a MyISAM table without checking for gaps in the table. This will speed up INSERTs dramatically but leave all known gaps alone.
You could still defragment your table at your earliest convenience using OPTIMIZE TABLE
.
UPDATE 2012-12-03 13:40 EDT
Why don't run the my second sugesstion
CREATE TABLE mydb.mytable_new LIKE mydb.mytable;
ALTER TABLE mydb.mytable_new DISABLE KEYS;
INSERT INTO mydb.mytable_new SELECT * FROM mydb.mytable;
ALTER TABLE mydb.mytable_new ENABLE KEYS;
ALTER TABLE mydb.mytable RENAME mydb.mytable_old;
ALTER TABLE mydb.mytable_new RENAME mydb.mytable;
ANALYZE TABLE mydb.mytable;
This will give you an idea
- How long
OPTIMIZE TABLE
would take to run
- How much smaller the
.MYD
and .MYI
would be after running OPTIMIZE TABLE
After you run my second suggestion, you can compare them with
SELECT
A.mydsize,B.mydsize,A.mydsize - B.mydsize myd_diff,
A.midsize,B.myisize,A.myisize - B.myisize myi_diff
FROM
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable'
) A,
(
SELECT data_length mydsize,index_length myisize
FROM information_schema.tables
WHERE table_schema='mydb' AND table_name='mytable_new'
) B;
UPDATE 2012-12-03 16:42 EDT
Any table whose ROW_FORMAT is set to fixed has the luxury of allocating the same length row every time. If MyISAM tables maintain a list of deleted rows, the very first row in the list should always be selected as the next row to insert data. There would be no need to traverse a whole list until a suitable row gaps with sufficient length is found. Each deleted row is quickly appended after a DELETE
. Each INSERT would pick the first row of the deleted rows.
We can assume these things because MyISAM tables can do concurrent inserts. In order for this feature to be available via the concurrent_insert option, INSERTs into a MyISAM table must be able to detect one of three(3) things:
- The presence of a list of deleted rows, thus choosing from the list
- Row_Format=Dynamic : list of deleted rows with each row with a different length
- Row_Format=Fixed : list of deleted rows with all rows the same length
- The absence of a list of deleted rows, thus appending
- Bypass checking for the presence of a list of deleted rows (set concurrent_insert to 2)
For detection #1 to be the fastest possible, a MyISAM table's row_format must be Fixed. If it is Dynamic, it is very possible that a list traversal is necessary.