
We have a system that, given a batch of requests, makes an equivalent number of calls to an external third-party API. Since this is an I/O-bound task, we currently use a thread pool of 20 threads to service these requests. Beyond that, is the solution to:

Use fewer machines with more cores (less context-switching, capable of supporting more concurrent threads)

or

Use more machines by leveraging commodity/cheap hardware (pizza boxes)

The number of requests we receive a day is on the order of millions.

We're using Java, so the threads here are kernel, not "green".
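For concreteness, the current setup can be sketched roughly like this (the `callApi` stub and the batch shape are hypothetical stand-ins for our real third-party call):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;

public class BatchDispatcher {

    // Stand-in for the real third-party API call (200 ms - 2 s in practice).
    static String callApi(String request) {
        return "response:" + request;
    }

    // Submit every request in the batch to the pool, then collect results
    // in order. Each worker thread blocks on network I/O, which is why the
    // pool can usefully be much larger than the core count.
    static List<String> dispatch(List<String> batch, ExecutorService pool) {
        List<Future<String>> futures = batch.stream()
                .map(r -> pool.submit(() -> callApi(r)))
                .collect(Collectors.toList());
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            try {
                results.add(f.get()); // blocks until each call completes
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(20); // current size
        System.out.println(dispatch(List.of("a", "b"), pool));
        pool.shutdown();
    }
}
```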

Other Points/Thoughts:

  • Hadoop is commonly used for problems of this nature, but ours needs to be real-time rather than the stereotypical offline data mining.
  • The API requests take anywhere from 200ms to 2 seconds on average
  • There is no shared state between requests
  • The 3rd Party in question is capable of servicing more requests than we can possibly fire (payments vendor).
smonky
  • Do you have shared state used to handle requests? If so, how frequently is it changing? What is the size of this shared state? – David Gruzman Sep 12 '11 at 10:05
  • What's the limit on the 3rd party API? There's no point scaling your stack if the API you call is still the bottleneck. Can you cache the data you receive from it, or use the data from one call to service/supply many of your clients simultaneously? – Paolo Sep 12 '11 at 20:19
  • Edited my original post to answer the questions above. The calls are completely independent, so there is no data to cache. – smonky Sep 13 '11 at 05:58

2 Answers


It's not obvious to me that you need more resources at all (larger machines or more machines). If you're talking about at most 10 million requests in a day taking at most 2 seconds each, that means:

  • ~110 requests per second. That's not so fast. Are the requests particularly large? Or are there big bursts? Are you doing heavy processing besides dispatching to the third-party API? You haven't given me any information so far that leads me to believe it's not possible to run your whole service on a single core. (Call it three of the smallest possible machines if you want to have n+2 redundancy.)
  • on average, ~220 active requests. Again, that seems like no problem for a single machine, even with a (pooled) thread-per-request model. Why don't you just expand your pool size and call it a day? Are these really bursty? (And do you have really tight latency/reliability requirements?) Do they need a huge amount of RAM while active?
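The arithmetic behind those bullets is Little's law: average concurrency = arrival rate × service time. A quick sketch with the numbers from the question (10 million requests/day, 2 s worst case; the answer rounds to ~110 req/s and ~220 in flight):

```java
public class PoolSizing {

    // Requests per second, assuming load spread evenly over the day.
    static double arrivalRate(double requestsPerDay) {
        return requestsPerDay / 86_400.0; // seconds in a day
    }

    // Little's law: L = lambda * W (in-flight requests = rate * latency).
    static double inFlight(double arrivalRate, double serviceTimeSec) {
        return arrivalRate * serviceTimeSec;
    }

    public static void main(String[] args) {
        double rate = arrivalRate(10_000_000); // ~116 req/s
        System.out.printf("%.0f req/s, ~%.0f in flight%n",
                rate, inFlight(rate, 2.0));
    }
}
```

A pool of 20 threads can only keep 20 requests in flight, so at these numbers the pool, not the hardware, is the first thing to grow.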

Could you give some more information on why you think you have to make this choice?

Scott Lamb

Rather than using a large number of threads, you might fare better with event-driven I/O using node.js, with the caveats that it may mean a large rewrite and that node.js is fairly young.

This SO article may be of interest.
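Staying in Java, the same idea, many concurrent calls without a thread per request, can be sketched with `CompletableFuture` over a small pool (a later-JDK API than the original post; `callApi` is a hypothetical stand-in for the real third-party call):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

public class AsyncDispatch {

    // Stand-in for the real third-party API call.
    static String callApi(String request) {
        return "ok:" + request;
    }

    // Fire all calls concurrently and complete once every one is done,
    // preserving the order of the input batch. With a truly non-blocking
    // HTTP client behind callApi, even fewer threads would be needed.
    static CompletableFuture<List<String>> dispatchAll(List<String> batch,
                                                       Executor ex) {
        List<CompletableFuture<String>> calls = batch.stream()
                .map(r -> CompletableFuture.supplyAsync(() -> callApi(r), ex))
                .collect(Collectors.toList());
        return CompletableFuture.allOf(calls.toArray(new CompletableFuture[0]))
                .thenApply(v -> calls.stream()
                        .map(CompletableFuture::join)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        ExecutorService ex = Executors.newFixedThreadPool(4);
        System.out.println(dispatchAll(List.of("a", "b"), ex).join());
        ex.shutdown();
    }
}
```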

beny23