0

Words are hard today, and I'm trying to write this up clearly so that I can help you help me. Some overkill exists below, ymmv.

I'm migrating about 48m records out of table1 into table2, both InnoDB. I'm able to identify the records to be culled from table1 via a user_id ("any record entered by user_id=3 moves out of table1 and into table2"). I've got a first pass at working code in a single instance setup that I need to extend to be able to run the migration process via an AMQP server so that I can run the job via multiple consumers (ie I'm running the consumers/jobs in parallel in 2 instances of the same job at the same time) in a round robin setup where the consumers are all running at the same time. I'd like to be able to scale up to 10ish consumers to knock the job out quickly, else its about 15 days for a single instance to complete the job.

The code selects the oldest 2000 records that match the query criteria per run. The problem is that if I have 2 instances of the code running and selecting records then instance1 is going to grab id's 1-2000 and instance2 that is started at nearly exactly the same time is also going to grab 1-2000 since instance1 hasn't finished moving 0-2000 out of table1 and those id's will still be present in table1.

Ok, no problem. I add a control table to note which records to exclude - that allows the instance1 to say "I have records 1-2000" so that when instance2 fires up and goes to query it says "what id's should I exclude?" based on querying the control table as part of its "give me the next 2000 records" so that instance2 would grab id's 2001 to 4001 and then update the control table to let it know that instance2 has ids 2001 to 4001. So if there is an instance3 then it would know to omit id's 1 to 4001 when it fires up asking for the next 2k records.

When an instance writes an entry into the locked control table it writes it as "[first_id_found] and [last_id_found]", or "5671 and 7671". I use the ' and ' since the text goes into a sql BETWEEN clause.

select id from table1 where id not between 1565 and 1567 and id not between 1568 and 1570 and created_by=3 order by id asc limit 0, 2000

The problem:

The only table I need a lock on is the control table so that I'm assured that once an instance has access to the control table that it will be able to both find the records to omit as well as write its own omit record so that the next instance that accesses the control table once the lock is released will find up to date data within.

But I need to leave the lock in place until the query of table1 finishes and I can determine first and last id in result set so that I can add these to the control table so that the next instance fired up will know what to exclude.

Note that the control file is neither table1 nor table2, and that I don't specify table1 or table2 since I don't need a lock on either of those, nor do I want one because other processes are accessing the tables and I don't want to impede access from the other running processes to table1 or table2.

so the psuedo code looks like this:

// lock control table
// find all control table records (multiple workers = possible multiple records) indicating which id ranges to omit
// build "table1.id not between x and y" strings for each control record found
// query table1 with "not between" into array via PDO::fetchAll() to get the ids of N records that no other instance is working with  <<-- the problem
// find first and last ids in result set
// add control table record to `params` field with value = "[first_id_found] and [last_id_found]"
// unlock control table
// process array

When I query table1 I get no records returned. Not correct, I've got 48m records. Checking errorInfo() shows me an error condition with:

"Table 'table1' was not locked with LOCK TABLES"

I don't want to lock table1, yet it looks like because its been queried after a LOCK TABLES has been put into place on control and before an UNLOCK TABLES that I'm in some sort of transaction mode where all tables to be used have to have a lock level assigned. WTF, I see nothing about this in the docs.

Why am I getting this problem with table1 when it isn't within the locking scope? I'm not married to this strategy, its just what came to mind, if there is a different pattern that will yield the same functionality I'm game. Workarounds? Suggestions?

MikeD187
  • 1
  • 1
  • Perhaps a different approach: create a job controller that spawns sub jobs to do the work. The controller will assign the blocks of ids to be processed to the subs. When a sub completes its task it reports back to the controller job which assign it a new block – dlporter98 Aug 08 '15 at 15:29
  • Also, you should use the IN function instead of Between And, the former performs magnitudes better than the later because it can make use of the index – dlporter98 Aug 08 '15 at 15:37

0 Answers0