We have a system that, given a batch of requests, makes an equal number of calls to an external third-party API. Since this is an I/O-bound task, we currently use a cached thread pool of size 20 to service these requests. Beyond that, is the solution to:
- Use fewer machines with more cores (less context switching, capable of supporting more concurrent threads), or
- Use more machines by leveraging cheap commodity hardware ("pizza boxes")?
The number of requests we receive a day is on the order of millions.
We're using Java, so the threads here are kernel threads, not "green" threads.
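For reference, the current setup looks roughly like the sketch below. It is a simplification, not our production code: `callVendor` is a hypothetical stand-in that only sleeps to simulate the blocking call, and the pool of 20 is modelled with `Executors.newFixedThreadPool`, which is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BatchCaller {

    // Hypothetical stand-in for the blocking third-party call.
    // It only sleeps to simulate network latency.
    static String callVendor(String request, long latencyMillis) {
        try {
            Thread.sleep(latencyMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while calling vendor", e);
        }
        return "ok:" + request;
    }

    // Submits one task per request to a pool of poolSize threads
    // and returns the results in the same order as the requests.
    static List<String> processBatch(List<String> requests, int poolSize, long latencyMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String r : requests) {
                tasks.add(() -> callVendor(r, latencyMillis));
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed.
            for (Future<String> f : pool.invokeAll(tasks)) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for batch", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("vendor call failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> batch = new ArrayList<>();
        for (int i = 0; i < 40; i++) {
            batch.add("req-" + i);
        }
        long start = System.nanoTime();
        List<String> results = processBatch(batch, 20, 200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results.size() + " calls in ~" + elapsedMs + " ms");
    }
}
```

Each thread spends nearly all of its time blocked on the remote call, so the pool size rather than the CPU is what caps the number of requests in flight on one machine.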
Other Points/Thoughts:
- Hadoop is commonly used for problems of this nature, but this needs to be real-time, unlike the stereotypical offline data-mining workload.
- Each API request typically takes anywhere from 200 ms to 2 seconds.
- There is no shared state between requests.
- The third party in question (a payments vendor) is capable of servicing more requests than we can possibly fire.
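As a rough sanity check on the figures above, here is a back-of-the-envelope ceiling for a single pool. It assumes each thread is blocked for the full duration of a call and that load is spread evenly over the day, neither of which will hold exactly in practice. The class and method names are hypothetical.

```java
public class CapacityEstimate {

    private static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000;

    // Upper bound on calls per day for a pool in which every thread
    // completes one blocking call every latencyMillis milliseconds.
    static long maxRequestsPerDay(int threads, long latencyMillis) {
        return threads * MILLIS_PER_DAY / latencyMillis;
    }

    public static void main(String[] args) {
        int threads = 20;
        System.out.println("200 ms per call: " + maxRequestsPerDay(threads, 200) + " requests/day");
        System.out.println("2 s per call:    " + maxRequestsPerDay(threads, 2000) + " requests/day");
    }
}
```

With 20 threads this gives a ceiling of 8,640,000 requests per day at 200 ms per call and 864,000 at 2 seconds, so at the slow end a single pool of this size falls short of even one million requests a day.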