2

I'm currently using Angular on the frontend for a dashboard that displays the progress of queue workers. On the backend, I use 50 AWS EC2 instances, each running around 6 workers managed by supervisord. Each worker takes the next available job from an Aurora MySQL Serverless DB. The workers both read from and write to the DB heavily, and I've noticed that read operations from the frontend are slowed down. For example, an axios GET request for the job progress sometimes takes 50 seconds to return, and all subsequent AJAX calls are delayed.

I tried scaling the DB to a better instance, but I have a feeling this is not the right approach because there is still a lot of lag. I suspect MySQL is not the right tool for the job and Redis would be better suited. Another solution I thought about is a read replica, so the frontend dashboard reads from it and receives the information in milliseconds. The downside of this approach is that I'm using an API on the backend, so I would have to replicate the API on another instance and maintain two similar APIs, one of them read-only.

What do you guys think would be the best approach?

Mike
  • Typically "next job" type situations are handled using a queue, such as Amazon SQS / Simple Queue Service. DynamoDB seems like a more appropriate data store, if you need a database. – Tim Apr 02 '19 at 18:30
  • It is very likely that the queries can be sped up -- by suitable indexing (especially composite indexes) and/or reformulation of `SELECTs`. Show us some of the typical or slow queries. Meanwhile, "not enough info". – Rick James Apr 19 '19 at 18:44
  • How many "jobs" per second are inserted? How many seconds does it take for a "job" to be performed? Don't use queuing if the latter is not significantly bigger than the former -- "Don't queue it, just do it". A related question -- Are there bursts of jobs? If so, describe the burstiness. – Rick James Apr 19 '19 at 18:46

1 Answer

0

Did you check which queries actually cause the most stress on the DB? Are they the ones that check job progress? If that's the case, you may want to run those only every few seconds and cache the results in Redis or Memcached (e.g. AWS ElastiCache). Dashboard progress updates usually don't need to be 100% accurate or 100% immediate.
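A cache-aside sketch of that idea (the key format and `load_progress` are assumptions; `cache` can be a `redis.Redis` client, which exposes these `get`/`setex` methods):

```python
import json

# Cache job progress for a few seconds so the dashboard endpoint
# hits Aurora only on a cache miss. Names here are assumptions.
PROGRESS_TTL_SECONDS = 5  # dashboards rarely need sub-second freshness

def get_job_progress(cache, load_progress, job_id):
    key = f"job:progress:{job_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)     # served from the cache
    progress = load_progress(job_id)  # the expensive Aurora query
    cache.setex(key, PROGRESS_TTL_SECONDS, json.dumps(progress))
    return progress
```

With a 5-second TTL, any number of browsers polling the same job generate at most one DB read every 5 seconds instead of one per request.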

Aurora also supports read-only replica nodes with their own reader endpoint - perhaps you can serve the dashboard's reads from these R/O nodes and not overload the writer node with read-only traffic.
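This doesn't require a second, duplicated API: the same codebase can hold two connections and route read-only queries to the reader endpoint. A minimal sketch (the hostnames are placeholders; your cluster's actual writer and reader endpoints are shown in the RDS console):

```python
# Aurora gives every cluster a writer endpoint and a reader endpoint;
# the hostnames below are placeholders, not real endpoints.
WRITER_HOST = "mycluster.cluster-xxxx.eu-west-1.rds.amazonaws.com"
READER_HOST = "mycluster.cluster-ro-xxxx.eu-west-1.rds.amazonaws.com"

def pick_host(read_only: bool) -> str:
    """Route dashboard reads to the replica, worker writes to the writer."""
    return READER_HOST if read_only else WRITER_HOST
```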

Also make sure you've got the correct indexes on the database tables. You can use EXPLAIN SELECT ... on your most common or longest-running queries to make sure they don't scan the tables. Indexes can make a huge difference!
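A small helper to flag full table scans in EXPLAIN output (assuming rows are fetched as dicts, e.g. via a DictCursor; in MySQL's EXPLAIN, a `type` of `ALL` means the table is scanned without an index):

```python
def full_scan_tables(explain_rows):
    """Return the tables that MySQL's EXPLAIN reports as full scans.

    Each element of explain_rows is one EXPLAIN row as a dict;
    the 'type' column is 'ALL' when no index is used.
    """
    return [row["table"] for row in explain_rows if row.get("type") == "ALL"]
```

Run `EXPLAIN SELECT ...`, fetch the rows, and any table this returns is a candidate for a new (often composite) index.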

And as Tim said, managing a job queue is much better done with AWS SQS - workers pick up jobs from the queue one by one and don't have to scan the DB for the next one.
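A minimal sketch of such a worker loop (the queue URL and message body format are assumptions; in practice `sqs` would be `boto3.client("sqs")`):

```python
def process_one_batch(sqs, queue_url, handle):
    """Receive one message with long polling, process it, then delete it."""
    resp = sqs.receive_message(
        QueueUrl=queue_url,
        MaxNumberOfMessages=1,
        WaitTimeSeconds=20,  # long polling: no busy-waiting on an empty queue
    )
    for msg in resp.get("Messages", []):
        handle(msg["Body"])
        # Delete only after successful processing; otherwise the message
        # becomes visible again after the visibility timeout and another
        # worker retries it.
        sqs.delete_message(QueueUrl=queue_url,
                           ReceiptHandle=msg["ReceiptHandle"])
```

This also gives you retries for free: if a worker crashes mid-job, the undeleted message reappears on the queue.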

Hope that helps :)

MLu