1

I have two tables tableA and tableB. tableA has 2 Million records and tableB has over 10 millions records. tableA has more than thirty columns whereas tableB has only two column. I need to update a column in tableA from tableB by joining both tables.

UPDATE tableA a 
INNER JOIN tableB b  ON a.colA=b.colA
 SET a.colB= b.colB 

colA in both table has been indexed.

Now when I execute the query it takes hours. Honestly I never saw it completed and max i have waited is 5 hours. Is their any way to complete this query within 20-30 minutes. What approach should I take.

EXPLAIN on SQL Query

"id" "_type" "table" "type" "possible_" "key" "key_len"   "ref"   "rows" "Extra"
"1" "SIMPLE" "a"     "ALL"  "INDX_DESC" \N    \N          \N   "2392270"  "Using where"
"1" "SIMPLE" "b"     "ref"  "indx_desc" "indx_desc" "133" "cis.a.desc" "1"  "Using where"
Rick James
  • 135,179
  • 13
  • 127
  • 222
Syash A
  • 19
  • 1
  • 4
  • you have some id in tables ?? – ScaisEdge Aug 19 '18 at 11:53
  • I have id in tableA which is a PK but don't have any id in tableB. The joined columns are varchar type and have text inside it. – Syash A Aug 19 '18 at 11:59
  • indexed? i mean is column colA indexed in both tables? Because the query runs for 5 hour+ i assume the columns aren't indexed. – Raymond Nijland Aug 19 '18 at 12:00
  • fisrt check for expalin plan and be sure your indexes are involved then you could try updating a part first 10.000 or 50.00 or 100.000 from tableA .. and see how work .. – ScaisEdge Aug 19 '18 at 12:01
  • joining on varchar !!! dont do that. if u can join using some integer, that would be better and put WHERE condition matching the varchar from the two tables. Also index the varchar field in both the tables – Madhur Bhaiya Aug 19 '18 at 12:02
  • @RaymondNijland yes, both are indexed. – Syash A Aug 19 '18 at 12:10
  • @madhur I have to match some text in both tables so other than that there is no relation between two tbales. – Syash A Aug 19 '18 at 12:11
  • 2
    Run EXPLAIN on the query, and edit your question to show the results. Also edit your question and show the schema (including indexes and triggers) of the two tables. – Sloan Thrasher Aug 19 '18 at 12:13
  • Are there a large number of values in colA? Or is there a limited number of values? By limited, I mean < 1000. – Sloan Thrasher Aug 19 '18 at 12:15
  • 1
    You have not provided enough information for us to help you. Please [read this note about asking good SQL questions](http://meta.stackoverflow.com/a/271056/), and pay attention to the section on query performance. Then please [edit] your question. – O. Jones Aug 19 '18 at 12:25
  • @O.Jones I executed explain on query but when i am pasting here it's not coming well, its not readable because formatting is removed when pasted here. – Syash A Aug 19 '18 at 13:49
  • Is `colA` in `tableB` unique or PK? If not - you probably don't understand what you try to do. – Paul Spiegel Aug 19 '18 at 14:17
  • @PaulSpiegel colA in tableB is unique. – Syash A Aug 19 '18 at 17:37
  • @MadhurBhaiya - `JOINing` on a `VARCHAR` is not as bad is people say. – Rick James Aug 27 '18 at 04:59
  • `EXPLAIN` is hard to format; it is better to show the ugly version than to show nothing. – Rick James Aug 27 '18 at 05:02
  • @RickJames I have had issues with Joining on varchar - even after indexing on the varchar fields. Instead of an explicit INNER JOIN, I enforce implicit JOIN by matching the values in WHERE condition. I may have something wrongly configured (despite being indexed) - what do you suggest that might be ? ready to learn :) – Madhur Bhaiya Aug 27 '18 at 05:51
  • @RickJames I have posted the EXPLAIN result now. pls see. – Syash A Aug 27 '18 at 10:33
  • When `JOINing` `VARCHARs`, the character set and collation must be the same. (Else the index will be ignored and performance will suffer.) – Rick James Aug 27 '18 at 13:20

5 Answers5

3

Your UPDATE operation is performing a single transaction on ten million rows of a large table. (The DBMS holds enough data to roll back the entire UPDATE query if it does not complete for any reason.) A transaction of that size is slow for your server to handle.

When you process entire tables, the operation can't use indexes as well as it can when it has highly selective WHERE clauses.

A few things to try:

1) Don't update rows unless they need it. Skip the rows that already have the correct value. If most rows already have the correct value this will make your update much faster.

    UPDATE tableA a 
INNER JOIN tableB b  ON a.colA=b.colA
       SET a.colB = b.colB
     WHERE a.colB <> b.colB 

2) Do the update in chunks of a few thousand rows, and repeat the update operation until the whole table is updated. I guess tableA contains an id column. You can use it to organize the chunks of rows to update.

    UPDATE tableA a 
INNER JOIN tableB b  ON a.colA=b.colA
       SET a.colB = b.colB
     WHERE a.id IN  (
             SELECT a.id
               FROM tableA
              INNER JOIN tableB ON a.colA = b.colA
              WHERE a.colB <> b.colB
              LIMIT 5000
      ) 

The subquery finds the id values of 5000 rows that haven't yet been updated, and the UPDATE query updates them. Repeat this query until it changes no rows, and you're done. This makes things faster because the server must only handle smaller transactions.

3) Don't do the update at all. Instead, whenever you need to retrieve your colB value, simply join to tableB in your select query.

O. Jones
  • 103,626
  • 17
  • 118
  • 172
  • For chunks you better use a primary key range: `where a.id between 1 and 10000` - then `where a.id between 10001 and 20000` and so on. – Paul Spiegel Aug 19 '18 at 14:13
  • I tried second solution, and i executed it for only 60000 records in tableA with a chunk of 5000 records. First it gives error that "this version of mariadb doesn't yet support 'limit & in/all/any/some subquery". So I changed the query to - UPDATE tableA a INNER JOIN tableB b ON a.colA=b.ColA INNER JOIN ( SELECT id FROM tableA a INNER JOIN tableB b ON a.colA = b.colA WHERE a.colB<>b.colB LIMIT 5000) AS c ON a.id=c.id SET a.colB = b.colB – Syash A Aug 19 '18 at 17:46
  • it tooks one hour seven minutes. – Syash A Aug 19 '18 at 17:47
0

Chunking is the right way to go. However, chunk on the PRIMARY KEY of tableA.

I suggest only 1000 rows at a time.

Follow the tips given here

Did you say that the PK of tableA is a varchar? No problem. See the second flavor of code in that link; it uses ORDER BY id LIMIT 1000,1 to find the end of the next chunk, regardless of the datatype of id (the PK).

Rick James
  • 135,179
  • 13
  • 127
  • 222
-1

Hi i am not sure but you can do by cron job. process: in table tableA you need to add one more field (for example) is_update set its default value is 0, set the cron job every min. when cron is working: for example it pick first time 10000 record having is_update field 0 value and update records and set is_update is1, in 2nd time its pick next 10000 have is_update 0 and so on... Hope this will help to you.

arun11june
  • 19
  • 2
  • `cron` is not as good as simply having a single problem with a "sleep" between interations. One potential issue with cron occurs when one instance accidentally runs into the next instance. – Rick James Aug 27 '18 at 05:09
-1

For updating around 70 million records of a single MySQL table, I wrote a stored procedure to update the table in chunks of 5000. Took approximately 3 hours to complete.

DELIMITER $$
DROP PROCEDURE IF EXISTS update_multiple_example_proc$$
CREATE PROCEDURE update_multiple_example_proc()
BEGIN
DECLARE x  bigint;

SET x = 1;

WHILE x  <= <MAX_PRIMARY_KEY_TO_REACH> DO
UPDATE tableA A
   JOIN tableB B
   ON A.col1 = B.col1
SET A.col2_to_be_updated = B.col2_to_be_updated where A.id between x and x+5000 ;
SET  x = x + 5000;
END WHILE;

END$$
DELIMITER ;
karthi190
  • 61
  • 4
-1

Look at oak-chunk-update tool. It is one of the best tool if you want to update billion of rows too ;)

  • This does not provide an answer to the question. Once you have sufficient [reputation](https://stackoverflow.com/help/whats-reputation) you will be able to [comment on any post](https://stackoverflow.com/help/privileges/comment); instead, [provide answers that don't require clarification from the asker](https://meta.stackexchange.com/questions/214173/why-do-i-need-50-reputation-to-comment-what-can-i-do-instead). - [From Review](/review/late-answers/33475182) – コリン Dec 25 '22 at 14:43