1

Can you explain this nonsense to me? I have a method that fills an array using purely mathematical operations; there's no I/O involved. The method takes about 50 seconds to run, and the code is perfectly scalable (theoretically 100%), so I split it up into 4 threads, wait for them to complete, and reassemble the 4 arrays. I run the program on a quad-core processor expecting it to take about 15 seconds, and it actually takes 58 seconds. That's right: it takes longer! I see the CPU working at 100%, I know that each thread does 1/4 of the calculations, and creating the threads and reassembling the arrays takes about 1-2 ms in total. What's causing such a loss of performance? What the hell is the CPU doing all that time? CODE: http://pastebin.com/cFUgiysw
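
For readers who cannot open the pastebin link, the setup described above can be sketched roughly as follows. This is not the original code: the class name `SplitAndReassemble`, the helper `fillChunk`, and the `compute` function are hypothetical stand-ins for the real per-element math, assumed here to depend only on the element index.

```java
import java.util.Arrays;

public class SplitAndReassemble {
    // Hypothetical stand-in for the per-element math in the question.
    static int compute(int index) {
        return (index * 31) ^ (index >>> 3);
    }

    // Fills one chunk. Each worker owns its own array, so nothing is shared.
    static int[] fillChunk(int start, int length) {
        int[] chunk = new int[length];
        for (int i = 0; i < length; i++) {
            chunk[i] = compute(start + i);
        }
        return chunk;
    }

    // Splits the work across threadCount threads, waits, then reassembles.
    static int[] renderParallel(int total, int threadCount) {
        int[][] chunks = new int[threadCount][];
        Thread[] workers = new Thread[threadCount];
        int base = total / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            final int start = t * base;
            // The last worker also takes the remainder.
            final int length = (t == threadCount - 1) ? total - start : base;
            workers[t] = new Thread(() -> chunks[id] = fillChunk(start, length));
            workers[t].start(); // start(), not run(): run() would execute on the calling thread
        }
        int[] result = new int[total];
        int offset = 0;
        try {
            for (int t = 0; t < threadCount; t++) {
                workers[t].join(); // wait for this worker before copying its chunk
                System.arraycopy(chunks[t], 0, result, offset, chunks[t].length);
                offset += chunks[t].length;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        int total = 1_000_003;
        int[] parallel = renderParallel(total, 4);
        int[] sequential = fillChunk(0, total);
        System.out.println(Arrays.equals(parallel, sequential));
    }
}
```

With a `compute` that touches no shared state, this pattern should scale close to the number of cores. A slowdown like the one described points at something inside the computation that the threads have to share.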

  • How have you defined your arrays? – Morrison Chang Aug 18 '12 at 08:18
  • What methods do you call? any synchronized method accessed by the threads can be a bottleneck. Also without some code we can only guess. – josefx Aug 18 '12 at 08:28
  • no synchronized methods. array defined this way: `int[] array=new int[w*h];` with w and h being 2 ints representing width and height (no, i can't use a 2d array because that data has to be drawn in a canvas) – user1608450 Aug 18 '12 at 08:44
  • Show some code, I'm pretty sure you called run instead of start. – Thomas Jungblut Aug 18 '12 at 08:46
  • nope, i called start. for the code, wait a second – user1608450 Aug 18 '12 at 08:48
  • http://pastebin.com/cFUgiysw here's the code for the multithreaded rendering. it's kind of messy as i said. hope you understand it – user1608450 Aug 18 '12 at 08:49
  • it's a class that does a bunch of calculations, basically a RNG for procedural terrain height map generation i'm making – user1608450 Aug 18 '12 at 08:56
  • 1
    The problem must be in this class, if I run it with some random computation it is 4x faster than the sequential computation. Can you provide the source of `Randomatic`? – Thomas Jungblut Aug 18 '12 at 09:02
  • 1
    oh crap it could be! randomatic uses class vector, it has synchronized methods. i didn't think of that :D let me try to fix it, then i'll post the code – user1608450 Aug 18 '12 at 09:05
  • 3
    jesus christ it was the vector. now it's fast as hell THANKS A LOT! – user1608450 Aug 18 '12 at 09:08
  • @ThomasJungblut I would post your comment as an answer, and then user1608450 can accept it. That way you both get points, and user1608450's acceptance rating goes up. – yshavit Aug 18 '12 at 09:18
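
The cause found in the comments above (a `Vector` inside `Randomatic`) can be illustrated with a small sketch. The source of `Randomatic` was never posted, so everything here is an assumption: the class `VectorContention`, the helpers `fill` and `sumInParallel`, and the idea that the threads only read from a shared lookup table. What is certain is that `Vector.get` and `Vector.size` are synchronized, so threads sharing one `Vector` take turns on its lock, while `ArrayList` does no locking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class VectorContention {
    // Hypothetical lookup table: fills the given list with 0..n-1.
    static List<Integer> fill(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // Every worker reads the whole shared table `rounds` times and sums it.
    static long[] sumInParallel(List<Integer> table, int threadCount, int rounds) {
        long[] sums = new long[threadCount];
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long sum = 0;
                for (int r = 0; r < rounds; r++) {
                    for (int i = 0; i < table.size(); i++) {
                        // Vector locks on every get() and size(); ArrayList does not.
                        sum += table.get(i);
                    }
                }
                sums[id] = sum;
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Integer> vector = fill(new Vector<>(), 1000);
        List<Integer> arrayList = fill(new ArrayList<>(), 1000);

        long t0 = System.nanoTime();
        sumInParallel(vector, 4, 2000);
        long t1 = System.nanoTime();
        sumInParallel(arrayList, 4, 2000);
        long t2 = System.nanoTime();

        System.out.println("shared Vector:    " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("shared ArrayList: " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

The exact timings depend on the JVM and the machine, so no numbers are claimed here. Reading an unsynchronized `ArrayList` from several threads is only safe in this sketch because the list is fully built before the threads start and is never modified afterwards.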

4 Answers

0

Threads don't work that way.

Threads are still part of the same process (depending on the OS), so in terms of the operating system - CPU time will be scheduled the same for 4 threads in 1 process as it is for 1 thread in 1 process.

Also, with such a small number of values, you won't see the scalability in the midst of the overhead. Re-assembling the arrays in java will be costly.

Check out things like "context switching overhead" - things like that always mess you up when you try to map theory to practice :P

I would stick to the single-threaded way :)

~ Dan

http://en.wikipedia.org/wiki/Context_switch

ddoor
  • if what you're saying is correct, then i'd see a 25% cpu usage, not 100%. also, this is the first time i have this problem with threads. – user1608450 Aug 18 '12 at 08:28
  • This is only relevant if a different process tries to use 100% CPU at the same time. – josefx Aug 18 '12 at 08:29
0

There is a cost associated with starting new threads. I don't think it should be as much as 8 seconds, but it depends on what kind of threads you are using. Some threads need to create a copy of the data you are handling in order to be thread safe, and that can take some time. This cost is commonly referred to as overhead. If part of the work you are doing cannot run in parallel, for instance because it reads the same file or needs access to a shared resource, the threads may have to wait on each other. That takes time, and under suboptimal conditions it can take longer than serial execution. My tip is to look for these serialized sections and move them out of the threaded part if possible. Also try using fewer threads; 4 threads for 4 CPUs is not always optimal.

Hope it helps.

Pablo Jomer
  • Okay. Well, I'm just learning Java, so that is at least good for our friends. I will remove that part of my answer. – Pablo Jomer Aug 18 '12 at 08:31
  • 1
    thanks, as josefx said, java uses native threads. creating them takes less than a ms (i measured it with timestamps), and also reassembling the 4 arrays (about 1 million elements each) takes less than 2 ms. since the calculation is so long i doubt it's a problem with overhead (15 seconds in theory to 58 is a huge difference). also, there are no variables being used in common, except for an instance of a small object, which is not modified by the threads. – user1608450 Aug 18 '12 at 08:32
  • Well then I'm pretty much out of ideas. Post some of the code, maybe we can spot something? – Pablo Jomer Aug 18 '12 at 08:34
  • I checked your code. It doesn't take long for java to launch a new thread. But it does take time for you to copy all those values across in your constructor doesn't it? – ddoor Aug 18 '12 at 09:50
0

A lot depends on what you are doing and how you are dividing the work. There are many possible causes for this problem.

  • The most likely cause is that you are using all the bandwidth of your CPU-to-main-memory bus with one thread. This can happen if your data set is larger than your CPU cache, especially if you have some random access behaviour. You could consider reusing the original array rather than taking multiple copies, to reduce cache churn.
  • Your locking overhead is greater than the performance gain. I suspect you have used very coarse locking, so this shouldn't be an issue.
  • Starting and stopping threads takes too long. As your code runs for multiple seconds, I doubt this too.
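
The suggestion in the first point, reusing the original array instead of taking multiple copies, could look roughly like the sketch below. It assumes the flat `int[w*h]` layout mentioned in the question's comments; the class `SharedArrayFill`, the helper `fillRows`, and the `compute` function are hypothetical. Each thread writes only to its own band of rows, so no locking and no reassembly step is needed.

```java
import java.util.Arrays;

public class SharedArrayFill {
    // Hypothetical stand-in for the real height-map computation.
    static int compute(int x, int y) {
        return (x * 73856093) ^ (y * 19349663);
    }

    // Fills rows [fromRow, toRow) of a flat, row-major array of width w.
    static void fillRows(int[] array, int w, int fromRow, int toRow) {
        for (int y = fromRow; y < toRow; y++) {
            for (int x = 0; x < w; x++) {
                array[y * w + x] = compute(x, y);
            }
        }
    }

    // All threads write into the same array, each into a disjoint band of rows.
    static int[] renderInPlace(int w, int h, int threadCount) {
        int[] array = new int[w * h];
        Thread[] workers = new Thread[threadCount];
        int rowsPerThread = h / threadCount;
        for (int t = 0; t < threadCount; t++) {
            final int fromRow = t * rowsPerThread;
            // The last worker also takes any leftover rows.
            final int toRow = (t == threadCount - 1) ? h : fromRow + rowsPerThread;
            workers[t] = new Thread(() -> fillRows(array, w, fromRow, toRow));
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join() makes the worker's writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return array;
    }

    public static void main(String[] args) {
        int w = 2048, h = 2048;
        int[] parallel = renderInPlace(w, h, 4);
        int[] sequential = new int[w * h];
        fillRows(sequential, w, 0, h);
        System.out.println(Arrays.equals(parallel, sequential));
    }
}
```

Splitting by whole rows keeps each thread's writes contiguous in memory, which fits the mostly sequential access pattern described in the comments.
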
Peter Lawrey
  • i'm afraid it could be the first one: the arrays are kind of big, and java is a damn resource hog. however, these are mostly sequential accesses – user1608450 Aug 18 '12 at 08:35
  • An array of doubles (`double[]`) uses no more memory in Java than in any other language. If you are using `List`, it is a resource hog and can impact performance, so I would suggest you don't use that. – Peter Lawrey Aug 20 '12 at 18:59
0

Unless you are constantly creating and killing threads, the thread overhead shouldn't be a problem. Four threads running simultaneously is no big deal for the scheduler.

As Peter Lawrey suggested the memory bandwidth could be the problem. Your 50-second code is running on a java engine and they both compete for the available memory bandwidth. The java engine needs memory bandwidth to execute your code and your code needs it to do its calculations.

You write "perfectly scalable" which would be the case if your code was compiled. Since it runs on a java engine this is not the case. So the 16% increase in overall time could be seen as the difference between the smoothness of one thread vs the chaos of four colliding over memory accesses.

Olof Forshell