I ran into an issue in my Java web application today and need a 2nd (and 3rd, and 4th) set of eyeballs to see where I've messed up and how best to fix it.
The web app collects bids each round of an auction, and then fills in missing bids for bidders at the end of the round. I run the solving portion in parallel to speed it up. There are approximately 50 bidders bidding on 30 products.
Here's the pseudo-code of the application mixed with the actual Java code...
public void generateResults(final int round) throws InterruptedException
{
// pseudo-code to retrieve all the bids, takes about 800ms
final Bids bids = DB.getBids();
final Users bidders = UserService.lookup(UserData.BIDDER);
ExecutorService exec =
Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors() + 1);
Collection<Callable<Bids>> tasks = new ArrayList<Callable<Bids>>();
for (int i=0; i<bidders.size(); i++)
{
final User bidder = bidders.get(i); // must be final to be captured by the anonymous class
tasks.add(new Callable<Bids>()
{
public Bids call()
{
Bids bidsForUser = bids.findByUser(bidder.id);
// do some manipulation of the bids with some DB calls
Bids missingBids = // result of manipulated bids above
Bids.store(missingBids);
return missingBids;
}
});
}
List<Future<Bids>> results = exec.invokeAll(tasks); // blocks until every task finishes
exec.shutdown();
}
The code runs as it should, and it writes to the DB correctly (an indexed InnoDB table). In my local tests on a 6-core machine, this algorithm takes about 3.3 seconds to complete. However, today on the server, a 4-core machine, it was taking 35 seconds to run. Not good.
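To narrow down whether the server's extra 30 seconds is CPU crunching or waiting on the DB, I'm planning to time each task individually. This is just a sketch of the instrumentation: `solveForBidder` is a hypothetical stand-in for my real per-bidder work (the DB lookup and manipulation), not actual code from the app.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class TaskTiming
{
    // Stand-in for the real per-bidder solve; here it just burns a little CPU.
    static long solveForBidder(int bidder)
    {
        long acc = 0;
        for (int i = 0; i < 100_000; i++) acc += (long) i * bidder;
        return acc;
    }

    // Runs one timed Callable per bidder and returns how many completed.
    static int timeAll(int nBidders) throws Exception
    {
        ExecutorService exec = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors() + 1);
        final AtomicLong totalNanos = new AtomicLong();
        List<Callable<Long>> tasks = new ArrayList<Callable<Long>>();
        for (int i = 0; i < nBidders; i++)
        {
            final int bidder = i;
            tasks.add(new Callable<Long>()
            {
                public Long call()
                {
                    long start = System.nanoTime();
                    long result = solveForBidder(bidder);
                    // Per-task wall time: if this climbs on the server but not
                    // locally, the tasks are waiting on something (likely the DB),
                    // not doing more computation.
                    totalNanos.addAndGet(System.nanoTime() - start);
                    return result;
                }
            });
        }
        List<Future<Long>> results = exec.invokeAll(tasks);
        exec.shutdown();
        System.out.println("tasks=" + results.size()
            + " totalTaskMillis=" + totalNanos.get() / 1_000_000);
        return results.size();
    }

    public static void main(String[] args) throws Exception
    {
        timeAll(50);
    }
}
```

If the summed per-task time roughly matches the 35-second wall time, the tasks themselves are slow on the server; if it's much smaller, the queueing or something outside the tasks is the problem.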
My questions/concerns center on a few things:
- Big question - Why did it take so much longer to run on the server than locally? It was crunching the exact same data. In fact, it was running against exact copies of the same DB.
- Locally, there are no users besides myself, so Tomcat is not fielding any requests from users looking to load pages, and all the CPU can be devoted to crunching. Would having, say, 20 users loading pages cause a huge pile-up on the server and lock everything down?
- Is the number of processors I use in my FixedThreadPool wrong? Is it too many? Am I locking out the other necessary resources on the machine (DB threads and Tomcat HTTP threads) from operating? What should I change it to?
- Is it too many concurrent writes to the DB? 50 bidders x 30 products would mean 1,500 writes to the InnoDB table in a few seconds. I do have innodb_flush_log_at_trx_commit=2 set, though.
- Could I change the expected results with different hardware? An 8-core machine, for example.
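On the thread-pool question above, one change I'm considering is capping the solver pool below the core count instead of using cores + 1, so Tomcat's request threads and the JDBC driver aren't starved while the solve runs. The `cores - 1` choice here is just my guess, not something I've benchmarked:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PoolSizing
{
    // Cap the solver pool below the core count so Tomcat's request threads
    // and the DB driver keep at least one core of headroom.
    static int workerCount(int cores)
    {
        return Math.max(1, cores - 1);
    }

    public static void main(String[] args)
    {
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService exec = Executors.newFixedThreadPool(workerCount(cores));
        System.out.println("cores=" + cores + " workers=" + workerCount(cores));
        exec.shutdown();
    }
}
```

On a 4-core server this would run 3 solver threads instead of 5, which might matter more there than on my 6-core machine.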
Based on my observation today, the CPU spiked to 100% on every core while this algorithm was running. I don't know what page-load times users saw while it crunched. Thanks.