0

I am trying to optimize the export process of a query. I have the following tables (I omit some irrelevant fields):

CREATE TABLE _termsofuse (
    ID int(11) NOT NULL AUTO_INCREMENT, TTC_ART_ID int(11) DEFAULT NULL,
    TTC_TYP_ID int(11) DEFAULT NULL,
    TERM_OF_USE_NAME varchar(200) DEFAULT NULL,
    TERM_OF_USE_VALUE varchar(200) DEFAULT NULL,
    PRIMARY KEY (ID)
) ENGINE=InnoDB AUTO_INCREMENT=185905671 DEFAULT CHARSET=utf8;
CREATE TABLE vehicle ( ID mediumint(8) unsigned NOT NULL AUTO_INCREMENT, TTC_TYP_ID int(11) unsigned NOT NULL, PRIMARY KEY (ID), UNIQUE KEY TTC_TYP_ID_UNIQUE (TTC_TYP_ID) ) ENGINE=InnoDB AUTO_INCREMENT=44793 DEFAULT CHARSET=utf8;
CREATE TABLE part ( ID int(11) unsigned NOT NULL AUTO_INCREMENT, TTC_ART_ID int(11) unsigned NOT NULL, PRIMARY KEY (ID), UNIQUE KEY TTC_ART_ID_UNIQUE (TTC_ART_ID) ) ENGINE=InnoDB AUTO_INCREMENT=3732260 DEFAULT CHARSET=utf8;
CREATE TABLE term_of_use_name ( ID smallint(5) unsigned NOT NULL AUTO_INCREMENT, ID_Lang tinyint(3) unsigned NOT NULL, Name varchar(200) NOT NULL, PRIMARY KEY (ID, ID_Lang), UNIQUE KEY Name_Lang_UNIQUE (Name, ID_Lang), KEY fk_term_of_use_name_lang_id_lang_idx (ID_Lang), CONSTRAINT fk_term_of_use_name_lang_id_lang FOREIGN KEY (ID_Lang) REFERENCES lang (ID) ON DELETE NO ACTION ON UPDATE NO ACTION ) ENGINE=InnoDB AUTO_INCREMENT=732 DEFAULT CHARSET=utf8;
CREATE TABLE term_of_use_value ( ID mediumint(8) unsigned NOT NULL AUTO_INCREMENT, ID_Lang tinyint(3) unsigned NOT NULL, Value varchar(200) NOT NULL, PRIMARY KEY (ID,ID_Lang), UNIQUE KEY Value_Lang_UNIQUE (Value,ID_Lang), KEY fk_term_of_use_value_lang_id_lang_idx (ID_Lang), CONSTRAINT fk_term_of_use_value_lang_id_lang FOREIGN KEY (ID_Lang) REFERENCES lang (ID) ON DELETE NO ACTION ON UPDATE NO ACTION ) ENGINE=InnoDB AUTO_INCREMENT=887502 DEFAULT CHARSET=utf8;
Now I try to select some columns to a csv file. Afterwards i will import the file to a database table, but I suspect this should not take too much time. My Select statement is the following:

SELECT DISTINCT vehicle.ID, part.ID, term_of_use_name.ID, term_of_use_value.ID FROM _termsofuse
INNER JOIN vehicle ON vehicle.TTC_TYP_ID = _termsofuse.TTC_TYP_ID
INNER JOIN part ON part.TTC_ART_ID = _termsofuse.TTC_ART_ID
INNER JOIN term_of_use_name ON term_of_use_name.Name = _termsofuse.TERM_OF_USE_NAME AND term_of_use_name.ID_Lang = 2
INNER JOIN term_of_use_value ON term_of_use_value.Value = _termsofuse.TERM_OF_USE_VALUE AND term_of_use_value.ID_Lang = 2
INTO OUTFILE 'termsofuse.csv'
CHARACTER SET utf8
FIELDS TERMINATED BY ';' OPTIONALLY ENCLOSED BY '"' 
LINES TERMINATED BY '\r\n';
This query takes longer than 8 hours on my laptop (I have 4 GB of RAM). I tried to see the explain of the SELECT part and it shows the following: Explain statement

I do not understand where exactly is the bottleneck. I have exported a similar (about 95 Million records) query in less than 1h. Also breaking the results into multiple tables using limit does not seem to help much... Please have a look and any additional info you require just tell me. Thank you in advance.
EDIT 15/01/2016
Results of Explain Select
enter image description here

pankal
  • 124
  • 13
  • Right off the bat, the biggest gain would be to change the join on varchars to ints, figure out a better way to get at the data in the terms_of_use* tables. Or at the very least add indices for the join columns in the terms_of_use* tables if you can't join on ints. – Mike Purcell Jan 14 '16 at 17:28
  • I suspected that the index on (varchar, int) is the main problem. The thing is I can't think of a way to avoid this - the ID is autogenerated. The join columns are (Name, ID_Lang) and the index is already on them, I can't improve this, can I? – pankal Jan 15 '16 at 17:53
  • Ok, it seems that `distinct` without index is quite a problem. I am not sure how to solve this. Now on varchar fields: is it better to create a partial (prefixed) index on the varchar columns or create a new column to store an md5 value to use for indexing? – pankal Jan 16 '16 at 12:44

1 Answers1

0

Why have an ID when you have a perfectly good UNIQUE INT that could be the PK?

Seriously -- Having to reach through a secondary key slows things down. If each lookup slows it down by a factor of 2, that could add up.

How much RAM do you have? What is the value of innodb_buffer_pool_size? It should be about 70% of available RAM.

Let's see the EXPLAIN SELECT ...; there may be more clues there.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • innodb_buffer_pool_size had been set to 1GB. – pankal Jan 15 '16 at 17:20
  • I will update my question with the explain select. You mean it would have been better to have the Name, ID_Lang as PK? The thing is later I would need to join on ID, ID_Lang. Take into account that the ID is *not* unique. Two entries on different languages should have the same (otherwise auto-generated) ID. – pankal Jan 15 '16 at 17:27