
I have a service where I need to query 40 external services (APIs) to get information from them for each user request. For example, a user searches for some information, my service asks the 40 external partners for the information, aggregates it in one DB (MySQL) and displays the result to the user.

At the moment I have a multi-curl solution where I run 10 partner requests at a time; whenever one partner finishes its request, the software adds another partner from the remaining 30 to the multi-curl queue, until all 40 requests are done and the results are in the DB.
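Roughly, the current rolling window looks like the sketch below (illustrative only; the URL list, the timeout and the saveResult() helper are placeholders):

    <?php
    // Rolling multi-curl window: keep 10 requests in flight, refill as they finish.
    $partnerUrls = [/* ...the 40 partner API endpoints... */];
    $windowSize  = 10;

    $mh       = curl_multi_init();
    $pending  = $partnerUrls;
    $inFlight = 0;

    // Start one request and register it with the multi handle.
    $add = function ($url) use ($mh, &$inFlight) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, 15);
        curl_multi_add_handle($mh, $ch);
        $inFlight++;
    };

    // Fill the initial window of 10 parallel requests.
    while ($inFlight < $windowSize && $pending) {
        $add(array_shift($pending));
    }

    do {
        curl_multi_exec($mh, $running);
        curl_multi_select($mh, 1.0);

        // For every finished transfer: store the result, then start the next partner.
        while ($done = curl_multi_info_read($mh)) {
            $ch = $done['handle'];
            // saveResult(curl_multi_getcontent($ch));  // placeholder: write into MySQL
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
            $inFlight--;

            if ($pending) {
                $add(array_shift($pending));
            }
        }
    } while ($inFlight > 0);

    curl_multi_close($mh);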

The problem with this solution is that it cannot scale across many servers. I want a solution where I can fire all 40 requests at once, for example spread over 2-3 servers, and only wait as long as the slowest partner needs to deliver its results ;-) That means if the slowest partner takes 10 seconds, I will have the results of all 40 partners after 10 seconds. With multi-curl I run into trouble when there are more than 10-12 requests at a time.

What kind of solution can you offer that uses as few resources as possible, can run many, many processes on one server and is scalable? My software is written in PHP, so I need a good way to connect to the solution from PHP via a framework or API.

I hope you understand my problem and my needs. Please ask if something is not clear.

Mutatos
  • You could write books on this, it's too broad for S.O. –  Jul 11 '12 at 21:34
  • Well, not books, but I know it's complicated :-) Maybe there are some hints on how I can scale it across many servers? – Mutatos Jul 11 '12 at 21:50
  • Why are you limited to 10-12 requests at a time with multi-curl? I've done 200 requests at a time with multi-curl on a low end machine. You should be able to do 40 without a problem, unless there is a restriction on your server. – Brent Baisley Jul 12 '12 at 01:20

1 Answer


One possible solution would be to use a message queue system like beanstalkd, Apache ActiveMQ, memcacheQ etc.

A high level example would be:

  • User makes request to your service for information
  • Your service adds the requests to the queue (presumably one for each of the 40 services you want to query)
  • One or more job servers continuously poll the queue for work
  • A job server gets a message from the queue to do some work, adds the data to the DB and deletes the item from the queue (see the sketch after this list).
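As a rough illustration, here is what the producer and worker side could look like with beanstalkd and the Pheanstalk client (pda/pheanstalk). This is only a sketch: the calls follow the older Pheanstalk style and may differ in newer releases, and $partnerIds, $userQuery, fetchPartnerResult() and saveToDb() are placeholders for your own code.

    <?php
    use Pheanstalk\Pheanstalk;

    // --- Producer: runs when a user request comes in ------------------------
    $queue     = new Pheanstalk('127.0.0.1');
    $requestId = uniqid('req_', true);          // ties the 40 jobs together

    foreach ($partnerIds as $partnerId) {       // one queue job per external partner
        $queue->useTube('partner-requests')->put(json_encode([
            'request_id' => $requestId,
            'partner_id' => $partnerId,
            'query'      => $userQuery,
        ]));
    }

    // --- Worker: runs in a loop on any number of job servers ----------------
    $worker = new Pheanstalk('127.0.0.1');
    $worker->watch('partner-requests')->ignore('default');

    while (true) {
        $job  = $worker->reserve();             // blocks until a job is available
        $data = json_decode($job->getData(), true);

        $result = fetchPartnerResult($data['partner_id'], $data['query']);
        saveToDb($data['request_id'], $data['partner_id'], $result);

        $worker->delete($job);                  // remove the finished job from the queue
    }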

In this model, since the single task of performing 40 requests is now distributed and no longer part of one "process", the next part of the puzzle is figuring out how to mark a set of work as completed. This may not be difficult, or it may introduce a new challenge (it depends on the data and your application). Perhaps you could use another cache/DB row to hold a counter set to the number of jobs a particular request needs in order to complete; as each queue worker finishes a job, it decrements the counter by 1. Once the counter reaches 0, you know the request has been completed. If you do that, you need to make sure the counter actually reaches 0 and doesn't get stuck for some reason.
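For example, the counter could live in a small MySQL table; this sketch assumes a hypothetical request_status(request_id, remaining) table and uses PDO:

    <?php
    $pdo = new PDO('mysql:host=localhost;dbname=aggregator', 'user', 'pass');

    // Front end: record how many partner jobs this request fans out to.
    $pdo->prepare('INSERT INTO request_status (request_id, remaining) VALUES (?, ?)')
        ->execute([$requestId, 40]);

    // Worker: after storing a partner result, atomically decrement the counter.
    $pdo->prepare('UPDATE request_status SET remaining = remaining - 1 WHERE request_id = ?')
        ->execute([$requestId]);

    // Front end: poll until the counter reaches zero, with a timeout so a stuck
    // worker cannot make the user wait forever.
    $check = $pdo->prepare('SELECT remaining FROM request_status WHERE request_id = ?');
    $check->execute([$requestId]);
    if ((int) $check->fetchColumn() === 0) {
        // all 40 partners have answered; read the aggregated results and render them
    }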

That's one way to do it, at least; I hope that helps a little or opens the door to more ideas.

drew010
  • Well, the problem I see with this model is that the queue could sometimes be full while no process finishes, because the workers would all be busy at the same time. To prevent that, I would always need to have many workers started and listening ... and that wastes a lot of resources, even when there aren't enough requests for all the workers. – Mutatos Jul 12 '12 at 10:21
  • I don't see that as a problem. You could have a worker grab 10 items from the queue and process them at once, or have many workers handling one job at a time. You could start the workers by running a cron job every 15 seconds, or have a number of workers on each server run in a loop, looking for new work and sleeping for a short time if there is nothing in the queue. And the queue can't really fill up; it's pretty much unlimited in size (based on your disk storage). The first items added to the queue are also the first ones to get processed. You just need to make sure you can keep up. – drew010 Jul 12 '12 at 16:48