
The task I'm trying to implement is finding the Collatz sequence for every number in a given interval using several threads, and measuring how much improvement is gained compared to one thread.

However, one thread is always faster, no matter whether I choose 2 threads or more. (Edit: 2 threads are faster, but not by much, while 4 threads are slower than 1 thread, and I have no idea why. I could even say that the more threads, the slower it gets.) I hope someone can explain. Maybe I'm doing something wrong.

Below is the code I have written so far. I'm using a ThreadPoolExecutor to execute the tasks (one task = one Collatz sequence for one number in the interval).

The Collatz class:

    public class ParallelCollatz implements Runnable {
        private long result;
        private long inputNum;

        public long getResult() {
            return result;
        }

        public void setResult(long result) {
            this.result = result;
        }

        public long getInputNum() {
            return inputNum;
        }

        public void setInputNum(long inputNum) {
            this.inputNum = inputNum;
        }

        @Override
        public void run() {
            //System.out.println("number:" + inputNum);
            //System.out.println("Thread:" + Thread.currentThread().getId());

            long result = 1;

            // main iterative computation (note: this overwrites inputNum)
            while (inputNum > 1) {
                if (inputNum % 2 == 0) {
                    inputNum = inputNum / 2;
                } else {
                    inputNum = inputNum * 3 + 1;
                }
                ++result;
            }

            //try {
            //    Thread.sleep(10);
            //} catch (InterruptedException e) {
            //    e.printStackTrace();
            //}

            this.result = result;
        }
    }

And here is the main class where I run the threads (yes, for now I create two lists with the same numbers, since after running with one thread the initial values are lost):

    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
    ThreadPoolExecutor executor2 = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);

    List<ParallelCollatz> tasks = new ArrayList<ParallelCollatz>();
    for (int i = 1; i <= 1000000; i++) {
        ParallelCollatz task = new ParallelCollatz();
        task.setInputNum((long) (i + 1000000));
        tasks.add(task);
    }

    long startTime = System.nanoTime();
    for (int i = 0; i < 1000000; i++) {
        executor.execute(tasks.get(i));
    }

    executor.shutdown();
    boolean tempFirst = false;
    try {
        tempFirst = executor.awaitTermination(5, TimeUnit.HOURS);
    } catch (InterruptedException e1) {
        e1.printStackTrace();
    }
    System.out.println("tempFirst " + tempFirst);
    long endTime = System.nanoTime();
    long durationInNano = endTime - startTime;
    long durationInMillis = TimeUnit.NANOSECONDS.toMillis(durationInNano); // total execution time in milliseconds
    System.out.println("time " + durationInMillis);

    List<ParallelCollatz> tasks2 = new ArrayList<ParallelCollatz>();
    for (int i = 1; i <= 1000000; i++) {
        ParallelCollatz task = new ParallelCollatz();
        task.setInputNum((long) (i + 1000000));
        tasks2.add(task);
    }

    long startTime2 = System.nanoTime();
    for (int i = 0; i < 1000000; i++) {
        executor2.execute(tasks2.get(i));
    }

    executor2.shutdown();
    boolean temp = false;
    try {
        temp = executor2.awaitTermination(5, TimeUnit.HOURS);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
    System.out.println("temp " + temp);
    long endTime2 = System.nanoTime();
    long durationInNano2 = endTime2 - startTime2;
    long durationInMillis2 = TimeUnit.NANOSECONDS.toMillis(durationInNano2); // total execution time in milliseconds
    System.out.println("time2 " + durationInMillis2);

For example, with one thread it completes in 3280 ms; with two threads it takes 3437 ms. Should I be considering another concurrent structure for calculating each element?

EDIT (clarification): I'm not trying to parallelize individual sequences, but an interval of numbers, where each number has its own sequence (which is not related to the other numbers).
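Since the numbers are independent, one alternative to a hand-managed executor (not something proposed in this thread) is a parallel `LongStream`, which lets the common fork/join pool split the range into large chunks. A minimal sketch, assuming the goal mentioned in the comments (finding the longest sequence); the class name `CollatzStream` and its helper methods are hypothetical, and `collatzLength` counts steps the same way as the question's `run()` method:

```java
import java.util.stream.LongStream;

public class CollatzStream {

    // Sequence length counted like the question's run(): the start value counts as 1.
    static long collatzLength(long n) {
        long length = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Longest sequence length over every start value in [from, to].
    // parallel() lets the common ForkJoinPool split the range into big chunks,
    // so each worker handles many numbers instead of one task per number.
    static long longestLength(long from, long to) {
        return LongStream.rangeClosed(from, to)
                .parallel()
                .map(CollatzStream::collatzLength)
                .max()
                .orElse(0);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long longest = longestLength(1_000_001L, 2_000_000L);
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("longest length = " + longest + ", took " + millis + " ms");
    }
}
```

The stream decides the chunk sizes itself, which sidesteps the one-task-per-number overhead without any manual tuning.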

EDIT2

Today I ran the program on a good PC with 6 cores and 12 logical processors, and the issue persists. Does anyone have an idea where the problem might be? I also updated my code. 4 threads do worse than 2 threads for some reason (even worse than 1 thread). I also applied what was suggested in the answer, but there was no change.

Another edit: I have noticed that if I put a Thread.sleep(1) in my ParallelCollatz run() method, then performance improves gradually with the thread count. Perhaps this detail tells someone what is wrong? However, no matter how many tasks I submit, without the Thread.sleep(1) 2 threads perform fastest, 1 thread comes second, and the other thread counts hover around a similar number of milliseconds, slower than both 1 and 2 threads.

New edit: I also tried putting more work into the run() method of the Runnable class (a for loop that calculates not 1 but 10 or 100 Collatz sequences) so that each task would do more work. Unfortunately, this did not help either. Perhaps I'm launching the tasks incorrectly? Does anyone have any ideas?

EDIT: So it seems that adding more work to the run() method fixes it a bit, but for 8+ threads the issue still remains. I still wonder whether the cause is that it takes more time to create and run the threads than to execute the task. Or should I create a new post with this question?
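The direction the asker eventually reports as working (giving each task a block of numbers instead of a single one) can be sketched as follows. This is an illustration under stated assumptions, not the asker's code: `ChunkedCollatz`, `longestLength` and the chunk size of 10 000 are hypothetical choices, and it uses `invokeAll` with `Callable`s rather than `execute` with `Runnable`s so that results come back through `Future`s:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ChunkedCollatz {

    // Same step counting as the question's run() method.
    static long collatzLength(long n) {
        long length = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Splits [from, to] into blocks of chunkSize numbers. Each block is ONE task that
    // returns the longest length it found, so per-task overhead is paid once per block.
    static long longestLength(long from, long to, int threads, int chunkSize)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Long>> chunks = new ArrayList<>();
            for (long start = from; start <= to; start += chunkSize) {
                final long lo = start;
                final long hi = Math.min(to, start + chunkSize - 1);
                chunks.add(() -> {
                    long chunkBest = 0;
                    for (long n = lo; n <= hi; n++) {
                        chunkBest = Math.max(chunkBest, collatzLength(n));
                    }
                    return chunkBest;
                });
            }
            long best = 0;
            // invokeAll blocks until every chunk has completed.
            for (Future<Long> f : executor.invokeAll(chunks)) {
                best = Math.max(best, f.get());
            }
            return best;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        for (int threads : new int[] {1, 2, 4}) {
            long start = System.nanoTime();
            long longest = longestLength(1_000_001L, 2_000_000L, threads, 10_000);
            long millis = (System.nanoTime() - start) / 1_000_000;
            System.out.println(threads + " thread(s): longest = " + longest + ", " + millis + " ms");
        }
    }
}
```

With one task per number, each task runs at most a few hundred loop iterations, so queueing and hand-off between threads can dominate; blocks of thousands of numbers spread that cost over much more work.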

JensG
  • I'm calculating a sequence for, let's say, 10000 numbers. Each number has its own sequence, and I'm trying to compute them in parallel; those sequences are not related. The ultimate goal will be to find the longest one, but for now I'm just trying to run those sequences in separate threads. – Svajunas Kavaliauskas Apr 27 '19 at 20:54
  • are you using 1 core? – aran Apr 27 '19 at 20:56
  • I'm using a laptop that has Intel core i5-4202Y CPU(not a great one yes but it has 2 cores and 4 logical processors). – Svajunas Kavaliauskas Apr 27 '19 at 20:59
  • Try to scale up the test so it takes 30 seconds instead of 3 to complete, and check whether it uses 2 logical processors for the job. If not, there's no optimization whatsoever, and the CPU context switches may be delaying you – aran Apr 27 '19 at 20:59
  • This is just a theory, but it seems like it is multitasking instead of multithreading – aran Apr 27 '19 at 21:05
  • How would I check how many logical processors the running program is using? Also, if it is multitasking, any ideas what could be the cause? As far as I checked, by adding a print statement in the run() method, there are 2 different threads. (It is commented out in the posted code.) – Svajunas Kavaliauskas Apr 27 '19 at 21:08
  • For now, check whether there's still no difference, or even whether the single-threaded job finishes before the multi-threaded one. Scale it up to 1 minute or so. – aran Apr 27 '19 at 21:09
  • @aran After applying the answer below, I get 40 s for one thread and 30 s for two threads; however, 4 threads do worse with 42 s. Could it be because there are only 2 cores? – Svajunas Kavaliauskas Apr 27 '19 at 21:45
  • yep, seems correct, although it may be affected by many factors – aran Apr 27 '19 at 22:18
  • @aran what other factors could it be? – Svajunas Kavaliauskas Apr 28 '19 at 06:46
  • @aran I tried running on a better PC with more cores, but the problem remains the same. – Svajunas Kavaliauskas Apr 28 '19 at 15:13
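To see how many logical processors the JVM can use (one of the questions in these comments), `Runtime.availableProcessors()` reports the count. It does not show which core each thread runs on; for that, a profiler such as VisualVM is still needed. A small sketch; the class and method names are hypothetical, and sizing the pool to this count is a common starting point for CPU-bound work rather than a rule:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CoreCount {

    // Number of logical processors currently available to the JVM.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Hypothetical helper: one worker thread per logical processor,
    // a common starting point for CPU-bound tasks.
    static ExecutorService cpuBoundPool() {
        return Executors.newFixedThreadPool(logicalProcessors());
    }

    public static void main(String[] args) {
        System.out.println("Logical processors available to the JVM: " + logicalProcessors());
        ExecutorService pool = cpuBoundPool();
        // ... submit CPU-bound work here ...
        pool.shutdown();
    }
}
```

On a hyper-threaded CPU, two logical processors share one physical core's execution units, so a purely arithmetic loop like this one often scales closer to the number of physical cores than to the logical count.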

1 Answer


You are not waiting for your tasks to complete, only measuring the time it takes to submit them to the executor.

executor.shutdown() does not wait for all tasks to finish. You need to call executor.awaitTermination after that:

    executor.shutdown();
    executor.awaitTermination(5, TimeUnit.HOURS);

https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ExecutorService.html#shutdown()
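The gap between "all tasks submitted" and "all tasks finished" can be made visible by recording both points in time. A sketch, assuming a hypothetical `SubmitVsFinish` class and a `LongAdder` that gives each task an observable side effect; the inputs follow the question (`i + tasks` for `i` from 1 to `tasks`):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class SubmitVsFinish {

    static long collatzLength(long n) {
        long length = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Returns {ms until the last task was submitted, ms until every task had finished}.
    static long[] time(int threads, int tasks) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        LongAdder total = new LongAdder(); // each task adds its result here
        long start = System.nanoTime();
        for (int i = 1; i <= tasks; i++) {
            final long n = i + (long) tasks;
            executor.execute(() -> total.add(collatzLength(n)));
        }
        long submitted = System.nanoTime();
        executor.shutdown(); // stops accepting new tasks but does NOT wait
        if (!executor.awaitTermination(1, TimeUnit.HOURS)) { // blocks until the queue drains
            throw new IllegalStateException("tasks did not finish in time");
        }
        long finished = System.nanoTime();
        return new long[] {
            TimeUnit.NANOSECONDS.toMillis(submitted - start),
            TimeUnit.NANOSECONDS.toMillis(finished - start)
        };
    }

    public static void main(String[] args) throws InterruptedException {
        long[] t = time(4, 1_000_000);
        System.out.println("submitted after " + t[0] + " ms, finished after " + t[1] + " ms");
    }
}
```

Only the second number measures the actual computation; timing right after `shutdown()` captures little more than the submission loop.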

Update: I believe your testing methodology is flawed. I repeated your test on my machine (1 processor, 2 cores, 4 logical processors), and the time measured from run to run differed wildly.

I believe the following are the main reasons:

  • JVM startup and JIT compilation time. At the beginning, the code runs in interpreted mode.
  • The result of the calculation is ignored, so it is unclear what the JIT removes as dead code and what we are actually measuring.
  • println calls in the code.

To test this, I converted your test to JMH. In particular:

  • I converted the Runnable to a Callable, and I return the sum of the results to keep the JIT from eliminating the computation (alternatively, you can use Blackhole from JMH).
  • My tasks have no state; I moved all moving parts to local variables. No GC is needed to clean up the tasks.
  • I still create executors in each round. This is not perfect, but I decided to keep it as is.

The results below are consistent with my expectations: one core is waiting in the main thread, the work is performed on a single core, and the numbers are roughly the same.

    Benchmark                  Mode  Cnt    Score    Error  Units
    SpeedTest.multipleThreads  avgt   20  559.996 ± 20.181  ms/op
    SpeedTest.singleThread     avgt   20  562.048 ± 16.418  ms/op

Updated code:

    import java.util.concurrent.Callable;

    public class ParallelCollatz implements Callable<Long> {

        private final long inputNumInit;

        public ParallelCollatz(long inputNumInit) {
            this.inputNumInit = inputNumInit;
        }

        @Override
        public Long call() {
            long result = 1;
            long inputNum = inputNumInit;
            // main iterative computation on a local copy, so the task can be reused
            while (inputNum > 1) {
                if (inputNum % 2 == 0) {
                    inputNum = inputNum / 2;
                } else {
                    inputNum = inputNum * 3 + 1;
                }
                ++result;
            }
            return result;
        }
    }

and the benchmark itself:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Fork;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;

    @State(Scope.Benchmark)
    public class SpeedTest {
        private static final int NUM_TASKS = 1000000;

        // The tasks are stateless, so the same list is reused in every iteration.
        private static final List<ParallelCollatz> tasks = buildTasks();

        @Benchmark
        @Fork(value = 1, warmups = 1)
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.MILLISECONDS)
        @SuppressWarnings("unused")
        public long singleThread() throws Exception {
            ThreadPoolExecutor executorOneThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
            return runTasksInExecutor(executorOneThread, tasks);
        }

        @Benchmark
        @Fork(value = 1, warmups = 1)
        @BenchmarkMode(Mode.AverageTime)
        @OutputTimeUnit(TimeUnit.MILLISECONDS)
        @SuppressWarnings("unused")
        public long multipleThreads() throws Exception {
            ThreadPoolExecutor executorMultipleThread = (ThreadPoolExecutor) Executors.newFixedThreadPool(4);
            return runTasksInExecutor(executorMultipleThread, tasks);
        }

        private static long runTasksInExecutor(ThreadPoolExecutor executor, List<ParallelCollatz> tasks)
                throws InterruptedException, ExecutionException {
            List<Future<Long>> futures = new ArrayList<>(NUM_TASKS);
            for (int i = 0; i < NUM_TASKS; i++) {
                futures.add(executor.submit(tasks.get(i)));
            }
            executor.shutdown();
            executor.awaitTermination(5, TimeUnit.HOURS);

            // Returning the sum hands the results to JMH, so the work is not dead code.
            long sum = 0L;
            for (Future<Long> f : futures) {
                sum += f.get();
            }
            return sum;
        }

        private static List<ParallelCollatz> buildTasks() {
            List<ParallelCollatz> tasks = new ArrayList<>();
            for (int i = 1; i <= NUM_TASKS; i++) {
                tasks.add(new ParallelCollatz((long) (i + NUM_TASKS)));
            }
            return tasks;
        }
    }
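When JMH is not available, the two concerns raised in the answer (JIT warm-up and an unused result) can be partly addressed by hand: run the workload a few times before timing it, and use the result afterwards. This is only a rough sketch with far weaker guarantees than JMH; the class name `WarmupTiming`, the range and the five warm-up rounds are arbitrary assumptions:

```java
public class WarmupTiming {

    static long collatzLength(long n) {
        long length = 1;
        while (n > 1) {
            n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
            length++;
        }
        return length;
    }

    // Sums the lengths so the caller has a result to use instead of discarding it.
    static long sumLengths(long from, long to) {
        long sum = 0;
        for (long n = from; n <= to; n++) {
            sum += collatzLength(n);
        }
        return sum;
    }

    public static void main(String[] args) {
        long from = 1_000_001L;
        long to = 2_000_000L;
        // Warm-up rounds give the JIT a chance to compile the hot loop before timing.
        // Five rounds is an arbitrary assumption; JMH handles this far more carefully.
        long checksum = 0;
        for (int round = 0; round < 5; round++) {
            checksum += sumLengths(from, to);
        }
        long start = System.nanoTime();
        long sum = sumLengths(from, to);
        long millis = (System.nanoTime() - start) / 1_000_000;
        // Printing both values keeps the results observable.
        System.out.println("sum = " + sum + ", checksum = " + checksum + ", " + millis + " ms (single thread)");
    }
}
```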
Lesiak
  • I did as you mentioned. Now I get 12s and 19s and still slower. – Svajunas Kavaliauskas Apr 27 '19 at 21:31
  • You have 2 cores only. My guess is that one is occupied by the main thread, and 2 worker threads are thrashing on a single core. I am not aware of any method of finding the current core directly from Java, but to get some intuition you can connect to your program with jconsole/visualvm and observe your threads. Are they working concurrently without thrashing? – Lesiak Apr 27 '19 at 21:46
  • I run the executor with one thread, then print out the time, then launch executor2 with 2 threads and print out the time. Nothing crashes, and as far as I can see 2 threads do work faster, at least when the program runs for around 40 s. But 4 threads are slower than 1 thread. – Svajunas Kavaliauskas Apr 27 '19 at 21:49
  • Sorry, but we are not on the same page. Thrashing (not crashing) means that multiple threads are competing for the same resource (in my hypothesis, the only remaining core) and only one is executing at a time, which means no speedup, only additional work for context switching. How did jconsole/visualvm experiment work? – Lesiak Apr 27 '19 at 21:54
  • Oh, sorry, I misread it. I have never used those programs, so it might take some time to try them out. I will comment when I'm done. – Svajunas Kavaliauskas Apr 27 '19 at 21:57
  • From what I see, with 2 threads everything is running fine. However, with 3 or 4 threads there is a lot of waiting time; most of the time only one thread is running at a time (I used VisualVM). What I don't like is that the 2nd set of threads, no matter how many there are, does not finish in debug mode; for some reason the program just hangs. – Svajunas Kavaliauskas Apr 27 '19 at 22:51
  • I tried with a better PC with more cores, but the problem remained the same. I have no idea what I'm doing wrong. – Svajunas Kavaliauskas Apr 28 '19 at 15:23
  • Thank you for the in-depth, updated answer. After trying various things, it seems that calculating a low number of Collatz sequences per thread is not the way to go. I got the expected speedup when giving each thread a larger number of sequences to calculate (1000 or 10000). Of course, I also ran it on a machine with more cores. Thank you for the help again. – Svajunas Kavaliauskas May 06 '19 at 17:43