
A few months ago I asked about the implementation of my API for processing files. It uses PHP, a command-line script that is invoked from PHP, and a queue. For the queue I am using beanstalkd.

The API accepts a single file or a group of files (up to 5) per request. Processing one file takes 1-3 seconds, depending on its size.

My question now is: which is better, putting every file of a request into a separate job, or putting all the files into one job? My (slow) processing function accepts either one file or multiple files. My guess is that if I put all the files of a request into one job, they will be processed by a single worker; but if I put every file into its own background job, each will probably be processed by its own worker, so 4 files would mean 4 workers. I am not sure if this is correct.

So, if my conclusion above is correct, is it better under heavy request load to process all of a request's files at once, or to give each file to a separate worker?
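To make the two options concrete, here is a rough sketch of the producer side. It assumes the Pheanstalk beanstalkd client; the tube name, payload shape, and file names are illustrative only:

    <?php
    // Sketch only: assumes the Pheanstalk beanstalkd client (pda/pheanstalk).
    // The tube name "files", payload shape, and file names are illustrative.
    require 'vendor/autoload.php';

    use Pheanstalk\Pheanstalk;

    $pheanstalk = Pheanstalk::create('127.0.0.1');
    $pheanstalk->useTube('files');

    $files = ['a.pdf', 'b.pdf', 'c.pdf', 'd.pdf']; // the files from one request

    // Option 1: one job per file. Up to four idle workers can run in
    // parallel, so the request finishes in roughly the time of the
    // slowest single file.
    foreach ($files as $file) {
        $pheanstalk->put(json_encode(['files' => [$file]]));
    }

    // Option 2: one job for the whole request. A single worker processes
    // all four files sequentially (roughly 4x the single-file time).
    // $pheanstalk->put(json_encode(['files' => $files]));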

Thank you.

– gdarko
    Please explain what you mean by "better"; this question could be considered "too broad" or "primarily opinion based" and as such be flagged. – Fred Gandt Aug 03 '18 at 11:19
  • By better I mean enabling the platform to handle more users at a time – gdarko Aug 03 '18 at 12:03

1 Answer


To handle more users, or more throughput per second, you need to ensure several things:

  • have more than one worker; as a starting point, scale the worker count up to 10
  • this way you have 10 parallel workers
  • put 10 different messages into the queue so that each worker picks up a job to tackle (see the worker sketch after this list)
  • monitor the queue, and if jobs keep accumulating, add more workers
  • monitor the machine's CPU and RAM state, and if it starts to throttle around 80% CPU, consider adding another machine that consumes jobs from the same queue
  • you could have different machines for different needs (SSD for fast I/O, high-end CPU for quick jobs, lower-spec machines for transactional workloads, etc.)
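As a rough illustration of such a worker, here is a minimal consumer loop, again assuming the Pheanstalk client; process_files() is a hypothetical stand-in for the existing slow processing function. Running ten copies of this script, e.g. under supervisord, gives the ten parallel workers described above:

    <?php
    // Minimal worker sketch: run N copies of this script (e.g. under
    // supervisord) to get N parallel workers. Assumes the Pheanstalk
    // client; process_files() is a hypothetical stand-in for the
    // existing 1-3 s per file processing function.
    require 'vendor/autoload.php';

    use Pheanstalk\Pheanstalk;

    $pheanstalk = Pheanstalk::create('127.0.0.1');
    $pheanstalk->watch('files');

    while (true) {
        $job = $pheanstalk->reserve();        // blocks until a job is available
        $payload = json_decode($job->getData(), true);

        try {
            process_files($payload['files']); // the slow processing function
            $pheanstalk->delete($job);        // success: remove the job
        } catch (\Throwable $e) {
            $pheanstalk->bury($job);          // failure: keep it for inspection
        }
    }

The queue-depth check from the list can be done with the same client via $pheanstalk->statsTube('files'), whose current-jobs-ready count shows how many jobs are waiting.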
– Pentium10