
I'm having trouble with a multithreaded Java program. The program splits the sum of an array of integers across several threads and then adds up the partial sums of the slices. The problem is that the computing time does not decrease as I increase the number of threads. (I know there is a limit beyond which more threads make the computation slower than fewer threads.) I expect to see the execution time decrease before that limit is reached, since that is the benefit of parallel execution. I use the variable fake in the run method to make the time "readable".

public class MainClass {

private final int MAX_THREAD = 8;
private final int ARRAY_SIZE = 1000000;

private  int[] array;
private SimpleThread[] threads;
private int numThread = 1;
private int[] sum;
private int start = 0;
private int totalSum = 0;
long begin, end;
int fake;


MainClass() {
    fillArray();

    for(int i = 0; i < MAX_THREAD; i++) {
        threads = new SimpleThread[numThread];
        sum = new int[numThread];

        begin = (long) System.currentTimeMillis();

        for(int j = 0 ; j < numThread; j++) {
            threads[j] = new SimpleThread(start, ARRAY_SIZE/numThread, j);
            threads[j].start();
            start+= ARRAY_SIZE/numThread;
        }



        for(int k = 0; k < numThread; k++) {
            try {
                threads[k].join();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }


        end = (long) System.currentTimeMillis();


        for(int g = 0; g < numThread; g++) {
            totalSum+=sum[g];
        }


        System.out.printf("Result with %d thread-- Sum = %d Time = %d\n", numThread, totalSum, end-begin);
        numThread++;
        start = 0;
        totalSum = 0;
    }

}


public static void main(String args[]) {
    new MainClass();
}


private void fillArray() {
    array = new int[ARRAY_SIZE];
    for(int i = 0; i < ARRAY_SIZE; i++) 
        array[i] = 1;
}


private class SimpleThread extends Thread{
    int start;
    int size;
    int index;

    public SimpleThread(int start, int size, int sumIndex) {
        this.start = start;
        this.size = size;
        this.index = sumIndex;
    }

    public void run() {
        for(int i = start; i < start+size; i++) 
            sum[index]+=array[i];

        for(long i = 0; i < 1000000000; i++) {
            fake++;
        }
    }
}

[Screenshot: unexpected results]

  • `ARRAY_SIZE/numThread` may have a fractional part, which integer division discards, so the slices stop short of the end of the array and the sum may be less than `1000000` depending on the divisor. – Griffin Oct 04 '17 at 10:56
  • Not looking at the details, but consider using [ForkJoinPool](https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html) for this type of operation if you're on Java 7+. It will save you some low-level headaches. – Mena Oct 04 '17 at 10:58

4 Answers

Starting threads is expensive, and you'll only see a benefit on large workloads that don't compete for the same resources (neither condition applies here).

Andres

Why is the sum sometimes wrong?

Because ARRAY_SIZE/numThread may have a fractional part (e.g. 1000000/3 = 333333.333...), which integer division rounds down. The start variable then advances by too little on each step, the last slice ends before the end of the array, and the sum may be less than 1000000 depending on the divisor.
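One way to avoid losing elements is to hand the remainder `ARRAY_SIZE % numThread` to the first few slices, so slice sizes differ by at most one. The sketch below assumes contiguous slices; the class `Slices` and its methods are hypothetical helpers, not part of the question's code. Slice k covers `[start(k), start(k + 1))`, and `start(n)` always equals the array length, so nothing is dropped.

```java
/** Hypothetical helper: splits [0, length) into numThreads contiguous slices
 *  whose sizes differ by at most one, so no element is left out. */
class Slices {
    /** Start index of slice k; slice k covers [start(k), start(k + 1)). */
    static int start(int length, int numThreads, int k) {
        int base = length / numThreads;   // truncated slice size
        int extra = length % numThreads;  // elements that truncation would lose
        // the first `extra` slices each get one additional element
        return k * base + Math.min(k, extra);
    }

    /** Sums the array slice by slice (sequentially, just to check coverage). */
    static long sumAllSlices(int[] array, int numThreads) {
        long total = 0;
        for (int k = 0; k < numThreads; k++) {
            int from = start(array.length, numThreads, k);
            int to = start(array.length, numThreads, k + 1);
            for (int i = from; i < to; i++) {
                total += array[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[] array = new int[1_000_000];
        java.util.Arrays.fill(array, 1);
        for (int n = 1; n <= 8; n++) {
            // prints sum = 1000000 for every n
            System.out.println(n + " slices -> sum = " + sumAllSlices(array, n));
        }
    }
}
```

In the question's loop this would mean passing `start(ARRAY_SIZE, numThread, j)` as the start and `start(ARRAY_SIZE, numThread, j + 1) - start(ARRAY_SIZE, numThread, j)` as the size. For 1000000 elements and 3 threads, the slices get 333334, 333333 and 333333 elements.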

Why does the time taken increase as the number of threads increases?

Because in the run method of each thread you do this:

for(long i = 0; i < 1000000000; i++) {
    fake++;
}

I don't understand what you mean by this part of your question:

I use the variable fake in run method to make time "readable".

But every thread has to increment your fake variable 1000000000 times, so the total padding work grows linearly with the number of threads.

Griffin
  • I assume OP is using the `fake` variable to pad out the run time of the thread, because otherwise they complete too quickly for comparisons to be drawn at millis resolution. – Karl Reid Oct 04 '17 at 11:08
  • I use the fake variable to make the run method last longer, so that I can measure a human-readable time. If I remove the fake variable, the run method is too short and the measured execution time is 0. – Francesco Califano Oct 04 '17 at 11:11
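An alternative to padding each thread with a dummy loop is a finer-grained clock. `System.nanoTime()` is intended for measuring elapsed time, and its resolution is at least as good as `System.currentTimeMillis()`. The minimal sketch below (the class `NanoTiming` is a hypothetical name) times only the real summation and repeats it a few times, because the first runs also include JIT warm-up.

```java
/** Sketch: time the summation with System.nanoTime() instead of
 *  padding the work with a dummy loop. Names are illustrative only. */
class NanoTiming {
    /** Plain sequential sum; long avoids overflow on large arrays. */
    static long sum(int[] array) {
        long total = 0;
        for (int value : array) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        int[] array = new int[1_000_000];
        java.util.Arrays.fill(array, 1);

        // Early runs include JIT compilation, so later runs are usually
        // more representative of steady-state speed.
        for (int run = 0; run < 5; run++) {
            long begin = System.nanoTime();
            long result = sum(array);
            long elapsed = System.nanoTime() - begin;
            System.out.printf("run %d: sum = %d, time = %.3f ms%n",
                    run, result, elapsed / 1_000_000.0);
        }
    }
}
```

For anything beyond a quick check, a benchmark harness such as JMH handles warm-up and dead-code elimination more reliably than hand-written timing loops.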

As a general rule, you won't get a speedup from multi-threading if the "work" performed by each thread is less than the overheads of using the threads.

One of the overheads is the cost of starting a new thread. This is surprisingly high. Each time you start a thread the JVM needs to perform syscalls to allocate the thread stack memory segment and the "red zone" memory segment, and initialize them. (The default thread stack size is typically 500KB or 1MB.) Then there are further syscalls to create the native thread and schedule it.
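To see this overhead in isolation, one rough sketch is to time threads that do no work at all, then compare the result with the time to sum the array on one thread. The class `ThreadStartCost` is a hypothetical name, and the numbers it prints depend heavily on the OS, JVM and hardware, so treat them as indicative only.

```java
/** Rough sketch: measures only the cost of creating, starting and
 *  joining threads whose body is empty. */
class ThreadStartCost {
    static long startAndJoinEmptyThreads(int count) throws InterruptedException {
        Thread[] threads = new Thread[count];
        long begin = System.nanoTime();
        for (int i = 0; i < count; i++) {
            threads[i] = new Thread(() -> { });  // empty body: pure overhead
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return System.nanoTime() - begin;
    }

    public static void main(String[] args) throws InterruptedException {
        for (int n = 1; n <= 8; n++) {
            long ns = startAndJoinEmptyThreads(n);
            System.out.printf("%d empty threads: %.3f ms%n", n, ns / 1_000_000.0);
        }
    }
}
```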

In this example, you have 1,000,000 elements to sum and you divide this work among N threads. As N increases, the amount of work performed by each thread decreases.

It is not hard to see that the time taken to sum 1,000,000 elements is going to be less than the time needed to start 4 threads ... just based on counting the memory read and write operations. Then you need to take into account that the child threads are created one at a time by the parent thread.

If you do the analysis completely, it is clear that there is a point at which adding more threads actually slows down the computation, even if you have enough cores to run all the threads in parallel. And your benchmarking seems to suggest¹ that this point is around 2 threads.


By the way, there is a second reason why you may not get as much speedup as you expect from a benchmark like this one. The "work" that each thread does is basically scanning a large array. Reading and writing arrays generates requests to the memory system. Ideally, these requests are satisfied by the (fast) on-chip memory caches. However, if you try to read / write an array that is larger than the cache, then many / most of those requests turn into (slow) main memory requests. Worse still, if you have N cores all doing this, you can find that there are too many main memory requests for the memory system to keep up with, and the threads slow down.


The bottom line is that multi-threading does not automatically make an application faster, and it certainly won't if you do it the wrong way.

In your example:

  • the amount of work per thread is too small compared with the overheads of creating and starting threads, and
  • memory bandwidth effects are likely to be a problem even if you can "factor out" the thread creation overheads.

1 - I don't understand the point of the "fake" computation. It probably invalidates the benchmark, though it is possible that the JIT compiler optimizes it away.
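One way to "factor out" thread creation is to create the worker threads once, in a fixed pool, and time only the summation. The sketch below is an assumption-laden illustration, not the answer author's code: the class `PooledSum` is hypothetical, the pool size comes from `Runtime.availableProcessors()`, each task keeps a local partial sum instead of touching shared fields, and the array is made larger than in the question so the work has a chance to outweigh the remaining overheads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: threads are created once by the pool; only the summation is timed. */
class PooledSum {
    static long parallelSum(ExecutorService pool, int[] array, int parts)
            throws InterruptedException, ExecutionException {
        List<Future<Long>> futures = new ArrayList<>();
        for (int k = 0; k < parts; k++) {
            // slice bounds that cover every element (sizes differ by at most 1)
            int from = (int) ((long) array.length * k / parts);
            int to = (int) ((long) array.length * (k + 1) / parts);
            futures.add(pool.submit(() -> {
                long partial = 0;  // local accumulator: nothing shared between tasks
                for (int i = from; i < to; i++) {
                    partial += array[i];
                }
                return partial;
            }));
        }
        long total = 0;
        for (Future<Long> f : futures) {
            total += f.get();  // blocks until that slice is done
        }
        return total;
    }

    public static void main(String[] args) throws Exception {
        int[] array = new int[10_000_000];  // larger than the question's array
        java.util.Arrays.fill(array, 1);
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);  // threads created once
        try {
            for (int parts = 1; parts <= cores; parts++) {
                long begin = System.nanoTime();
                long sum = parallelSum(pool, array, parts);
                long elapsed = System.nanoTime() - begin;
                System.out.printf("%d parts: sum = %d, time = %.3f ms%n",
                        parts, sum, elapsed / 1e6);
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

The first iterations still include JIT warm-up, and memory bandwidth limits still apply, so the speedup may level off well before `parts` reaches the number of cores.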

Stephen C
  • What do you mean by "red zone" in thread creation? I learned that threads share code, files and data with the process they belong to. I tried increasing the array's size and now it works! I have a dual-core CPU with 4 hardware threads, and I see that the computation gets faster up to 3 threads (I think because the main method is itself a thread, so that is 3 threads created by main plus main itself). – Francesco Califano Oct 04 '17 at 12:29
  • Read this to understand what a red zone is: https://docs.oracle.com/cd/E19455-01/806-5257/attrib-33670/index.html – Stephen C Oct 04 '17 at 13:55

As a side note, for what you're trying to do there is the Fork/Join framework. It lets you easily split tasks recursively, and it implements a work-stealing algorithm that distributes your workload automatically.

There is a guide available here; its example is very similar to your case, which boils down to a RecursiveTask like this:

import java.util.Arrays;
import java.util.Collection;
import java.util.concurrent.RecursiveTask;

class Adder extends RecursiveTask<Integer>
{
    private final int[] toAdd;
    private final int from;
    private final int to;

    /** Add the numbers in the given array */
    public Adder(int[] toAdd)
    {
        this(toAdd, 0, toAdd.length);
    }

    /** Add the numbers in the given array between the given indices;
        internal constructor to split work */
    private Adder(int[] toAdd, int fromIndex, int upToIndex)
    {
        this.toAdd = toAdd;
        this.from = fromIndex;
        this.to = upToIndex;
    }

    /** This is the work method */
    @Override
    protected Integer compute()
    {
        int amount = to - from;
        int result = 0;
        if (amount < 500)
        {
            // base case: add ints and return the result
            for (int i = from; i < to; i++)
            {
                result += toAdd[i];
            }
        }
        else
        {
            // array too large: split it into two halves and let the pool add them
            int newEndIndex = from + (amount / 2);
            Collection<Adder> subtasks = invokeAll(Arrays.asList(
                    new Adder(toAdd, from, newEndIndex),
                    new Adder(toAdd, newEndIndex, to)));
            for (Adder a : subtasks)
            {
                // invokeAll has already run both halves; join() collects each result
                result += a.join();
            }
        }
        return result;
    }
}

To actually run this on the array built by your fillArray() method, you can use

// requires: import java.util.concurrent.ForkJoinPool;
int result = ForkJoinPool.commonPool().invoke(new Adder(array));
daniu