I am working on a program that receives search requests for a topic, calls the New York Times API to fetch articles related to the topic, then calls the Twitter API to fetch tweets mentioning those articles, and finally processes the results and returns them.

I have to make this multi-threaded. I thought about using an ExecutorService with a fixed-size thread pool, so every incoming search request will be handled by a separate thread. I also use the Callable interface to submit tasks. The class that implements Callable does the API processing (making the API requests and receiving the responses). The result is then fetched through a Future and displayed as the output. This happens for every incoming request.
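For concreteness, the setup described above could look roughly like the sketch below. It is only an illustration: `SearchTask`, `searchAll`, and the placeholder result string are hypothetical names, and the real New York Times and Twitter calls are replaced by a stub.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SearchPoolSketch {

    // Hypothetical task: stands in for the NYT + Twitter calls for one topic.
    public static class SearchTask implements Callable<String> {
        private final String topic;

        public SearchTask(String topic) {
            this.topic = topic;
        }

        @Override
        public String call() {
            // Placeholder for the real API requests and result processing.
            return "results for " + topic;
        }
    }

    // Submits one task per topic and collects the results in submission order.
    public static List<String> searchAll(List<String> topics, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String topic : topics) {
                futures.add(pool.submit(new SearchTask(topic)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    results.add(future.get()); // blocks until this task has finished
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("search task failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // without this the pool's threads keep the JVM alive
        }
    }

    public static void main(String[] args) {
        for (String line : searchAll(Arrays.asList("java", "concurrency"), 4)) {
            System.out.println(line);
        }
    }
}
```

All tasks are submitted before the first `get()` is called, so the requests can overlap even though each `get()` itself blocks.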

Does this make sense? Or is there a better way to do this?

EDIT: I am running this on my local machine accepting data from the command line interface.

gofeddy

2 Answers

If this is a web application, it is multi-threaded by default. If it is not, you can still deploy it on a servlet container, which would be beneficial. The thread pool is supplied by the underlying container (Tomcat, for example), and each request is serviced by a separate thread.

The only things to care about:

  • do not use synchronized
  • clean up any ThreadLocal variables that you use
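The ThreadLocal point matters because pooled threads are reused, so a value set while serving one request would still be visible to the next request handled by the same thread. A minimal sketch of the usual try/finally cleanup follows; `CURRENT_TOPIC`, `handle`, and `currentTopic` are hypothetical names chosen for illustration.

```java
public class ThreadLocalCleanupSketch {

    // Hypothetical per-thread context holding the topic being processed.
    private static final ThreadLocal<String> CURRENT_TOPIC = new ThreadLocal<>();

    // Handles one request and always clears the thread-local afterwards.
    public static String handle(String topic) {
        CURRENT_TOPIC.set(topic);
        try {
            // Code deeper in the call stack could read CURRENT_TOPIC.get() here.
            return "handled " + CURRENT_TOPIC.get();
        } finally {
            // The thread goes back to the pool, so remove the value explicitly.
            CURRENT_TOPIC.remove();
        }
    }

    // Exposed only so the cleanup can be observed from outside.
    public static String currentTopic() {
        return CURRENT_TOPIC.get();
    }

    public static void main(String[] args) {
        System.out.println(handle("java"));
        System.out.println(currentTopic()); // null once handle() has returned
    }
}
```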
Bozho
  • This is not a web application. I am running it on my local machine accepting data from the CLI. But I still want to make it multithreaded. – gofeddy Feb 28 '12 at 15:12
  • Why don't you run it on a servlet container? – Bozho Feb 28 '12 at 15:28
  • I guess I will consider this option eventually. Before that, I was trying to see if I can get something implemented via the java.util.concurrent feature set. – gofeddy Feb 28 '12 at 15:43

I would focus on getting the workflow correct, then profile to see where the bottlenecks are, and only then look at where concurrency (threading != concurrency or asynchronous execution) might help you. Saturating your CPU, network or disk I/O with multiple threads of execution won't make things faster, and usually hurts performance, especially on hyperthreaded Intel CPUs.

Then I would worry more about making it non-blocking and asynchronous before making it multi-threaded. Blocking (serialized) tasks completely negate any benefit of using threads to make things concurrent.

Multi-threaded does not magically mean it will run faster or more efficiently if the tasks are still serialized in the workflow. On the contrary, it might even make things slower and less efficient if you don't have the message passing and async parts right beforehand.
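As one possible illustration of a non-blocking workflow, the sketch below chains the two steps with `CompletableFuture`. This assumes Java 8 or later, which is newer than this thread. `fetchArticles` and `countTweets` are hypothetical stubs standing in for the two remote calls, and the returned numbers are placeholders.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncPipelineSketch {

    // Hypothetical stand-in for the article lookup.
    static List<String> fetchArticles(String topic) {
        return Arrays.asList(topic + "-article-1", topic + "-article-2");
    }

    // Hypothetical stand-in for the tweet lookup; pretends each article has 10 tweets.
    static int countTweets(List<String> articles) {
        return articles.size() * 10;
    }

    // Builds the pipeline; nothing here waits for a result.
    public static CompletableFuture<String> search(String topic, ExecutorService pool) {
        return CompletableFuture
                .supplyAsync(() -> fetchArticles(topic), pool)
                .thenApplyAsync(AsyncPipelineSketch::countTweets, pool)
                .thenApply(count -> topic + ": " + count + " tweets");
    }

    // Convenience wrapper that does block, for callers that need the value directly.
    public static String searchBlocking(String topic) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            return search(topic, pool).join();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        // The callback runs when the result is ready; the caller is free in the meantime.
        CompletableFuture<Void> done = search("java", pool).thenAccept(System.out::println);
        done.join(); // only here so the demo does not shut the pool down early
        pool.shutdown();
    }
}
```

The stages still run on pool threads, and the stubs would still block those threads if they made real HTTP calls. What changes is that the submitting thread no longer waits on `Future.get()`.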

Also, if you are running this on a top-of-the-line Core i7 laptop, you are only going to get 4 real threads (the 4 hyperthreads usually make things worse on CPU-bound apps), and the overhead of making things happen out of serial order and then putting them back together might bring no real gains and lots of complexity. On a server with many more cores this might not be the case, but on a laptop threading isn't going to get you much.

"Doing concurrency is easy, doing concurrency correctly is very hard!" - paraphrasing my Aikido Sensei

  • Wonder whether using [Node.JS](http://nodejs.org/) would be an idea for a simple asynchronous solution... – beny23 Feb 28 '12 at 15:25
  • I guess it doesn't matter much overall. I tried sending 10-15 requests with different sized thread pools each time, but the response time never showed any improvement/changes. Also, right now, I can see that my use of Futures to get results back after submitting tasks does make this application blocking. I'll look into it to see if I can make it non-blocking. – gofeddy Feb 28 '12 at 15:31
  • @Jarrod Roberson: *the 4 hyperthreads usually make things worse on CPU bound apps* [sic]... And *"on a laptop threading isn't going to get you much"* [sic]... Regarding the first quote, links would be much welcome. Regarding the second quote, I've repeatedly witnessed the exact opposite: using a *producer/consumers* scheme to crunch data, I get better throughput when adapting the number of consumer threads to the number of CPUs (already noticeable on old Core 2 Duo Mac laptops, for example). Are you really advocating single-threaded producer/consumer in this day and age of multi-core CPUs? – TacticalCoder Feb 28 '12 at 16:04
  • Not advocating single-threaded producer/consumer; I am saying that incorrect attempts at throwing threads at problems cause more problems that are much harder to solve. Core iX processors don't degrade gracefully under high CPU load. Example: compiling software without using the hyperthreaded threads is faster than with them. On an i3, setting threads = 2, or on an i7, threads = 4, is about 25% faster than including all the "threads" when compiling. But without a proper async / non-blocking design threading is a moot point, because it isn't going to be concurrent anyway. –  Feb 28 '12 at 16:09
  • @TacticalCoder you are arguing with yourself; I have never said any of the things you are arguing about. But to your point, if I have a serialized process and workflow, using more than 1 thread will have anywhere from no effect to a negative effect, and never a positive one. Compiling on an i7 with threads = 8 is much, much slower than with threads = 4. The context switching and cache misses are very degenerate, especially on Windows! –  Feb 28 '12 at 19:11
  • @TacticalCoder *"threads = 1 usually hurts performance"* is your erroneous interpretation; you wrote that, **I didn't**. It is something I didn't say, taken out of context; read for comprehension. I said, in context, that 4 threads can be faster, by 25% or more, than 8 threads on a hyperthreaded CPU with an app that is CPU bound; saturating the CPU with 8 thread contexts can be bad, plain and simple. **You read into it whatever you want**, but I didn't say any of the things **you are interpreting**, plain and simple. –  Feb 28 '12 at 22:21
  • Just in case someone misinterprets the above (just like I did): don't go thinking that because multithreaded programming can be hard, it's not worth it, and certainly don't go thinking that using more than one thread will slow your app down (I still think the above answer implies just that, the way it is worded). For example, if you want to compile a big piece of software on a Core i7, doing it using only one thread will be terribly slow. Doing it using four threads will bring a 400% speedup. – TacticalCoder Feb 29 '12 at 11:57
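Since the comments above disagree about how many threads help on CPU-bound work, a rough way to settle it on a given machine is to run the same work with different pool sizes and time it. The sketch below is one way to set that up; `parallelSumSquares` and its placeholder workload are hypothetical. Note that `Runtime.availableProcessors()` reports logical processors, so on a hyperthreaded CPU it includes the hyperthreads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSizingSketch {

    // CPU-bound placeholder work: sum of i*i for i in [from, to).
    static long sumSquares(long from, long to) {
        long sum = 0;
        for (long i = from; i < to; i++) {
            sum += i * i;
        }
        return sum;
    }

    // Splits [0, n) into one chunk per thread and adds up the partial sums.
    public static long parallelSumSquares(long n, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> tasks = new ArrayList<>();
            long chunk = (n + threads - 1) / threads; // ceiling division
            for (int t = 0; t < threads; t++) {
                final long from = Math.min(n, t * chunk);
                final long to = Math.min(n, from + chunk);
                tasks.add(() -> sumSquares(from, to));
            }
            long total = 0;
            for (Future<Long> future : pool.invokeAll(tasks)) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int logical = Runtime.getRuntime().availableProcessors();
        // Compare a few pool sizes; timings are machine-dependent, so none are claimed here.
        for (int threads : new int[] {1, Math.max(1, logical / 2), logical}) {
            long start = System.nanoTime();
            long result = parallelSumSquares(200_000_000L, threads);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): " + result + " in " + millis + " ms");
        }
    }
}
```

This only applies to CPU-bound work. Request handling that mostly waits on remote APIs is limited by those waits rather than by core count.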