I have one table that looks like this:
+-------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------+------+-----+---------+-------+
| name | varchar(255) | NO | PRI | NULL | |
| timestamp1 | int | NO | | NULL | |
| timestamp2 | int | NO | | NULL | |
+-------------+--------------+------+-----+---------+-------+
This table has around 250 million rows in it. I get a csv once a day that contains around 225 million rows of just one name column. 99% of the names that are in the csv I get everyday are already in the database. So what I want to do is for all the ones that are already there I update their timestamp1
column to UNIX_TIMESTAMP(NOW())
. Then all the names that are not in the original table, but are in the csv I add to the original table. Right now this is how I am doing this:
DROP TEMPORARY TABLE IF EXISTS tmp_import;
CREATE TEMPORARY TABLE tmp_import (name VARCHAR(255), primary_key(name));
LOAD DATA LOCAL INFILE 'path.csv' INTO TABLE tmp_import LINES TERMINATED BY '\n';
UPDATE og_table tb SET timestamp1 = UNIX_TIMESTAMP(NOW()) WHERE og.name IN (SELECT tmp.name FROM tmp_import tmp);
DELETE FROM tmp_import WHERE name in (SELECT og.name FROM og_table og);
INSERT INTO og_table SELECT name, UNIX_TIMESTAMP(NOW()) AS timestamp1, UNIX_TIMESTAMP(NOW()) AS timestamp2 FROM tmp_import;
As someone might guess the update line is taking a long time, over 6 hours or throwing an error. Reading the data in is taking upwards of 40 minutes. I know this is mostly because it is creating an index for name
when I don't set it as a primary key it only takes 9 minutes to read the data in, but I thought having an index would speed up the operation. I have tried the update several different way. What I have and the following:
UPDATE og_table og SET timestamp1 = UNIX_TIMESTAMP(NOW()) WHERE EXISTS (SELECT tmp.name FROM tmp_import tmp where tmp.name = og.name);
UPDATE og_table og inner join tmp_import tmp on og.name=tmp.name SET og.timestamp1 = UNIX_TIMESTAMP(NOW());
Both of those attempts did not work. It normally takes several hours and then ends up with:
ERROR 1206 (HY000): The total number of locks exceeds the lock table size
I am using InnoDB
for these tables, but there are no necessary foreign keys and having the benefit of that engine is not necessarily needed so I would be open to trying different storage engines.
I have been looking through a lot of posts and have yet to find something to help in my situation. If I missed a post I apologize.