
I am trying to find the duplicated element in a sorted array using threads. My idea is to split the array in two and assign each half to a thread. But when I run the code, I get roughly the same time as if I hadn't used threads.

My code:

        Parallel.Invoke(() => Search(0, elements.Length / 2, elements),
            () => Search(elements.Length / 2, elements.Length - 1, elements));

And Search method:

    // _start is a DateTime field (not shown) recorded just before Parallel.Invoke is called.
    public static void Search(int start, int stop, int[] elements)
    {
        for (var i = start; i < stop; i++)
        {
            if (elements[i] == elements[i + 1])
            {
                Console.WriteLine("found, time " + (DateTime.Now - _start));
                break;
            }
        }
    }

The array has 100,000,000 elements. Also, I am not looking for a LINQ-like solution.

Gabriel
  • The solutions in this question might help you: http://stackoverflow.com/questions/19757992/how-do-i-check-if-my-array-has-repeated-values-inside-it – Dandy Mar 09 '17 at 12:59
  • How big is the array you used to test this? The TPL is very smart about deciding whether a new thread is necessary or whether the overhead of creating/taking a thread and switching contexts would eat up any benefit. So I guess `Parallel.Invoke` simply uses only one thread for your search. – René Vogt Mar 09 '17 at 13:00
  • So you want multiple threads touching the same array? – Botonomous Mar 09 '17 at 13:02
  • @Botonomous they only read, and they even read in different parts of the array. Can't see any problem here. – René Vogt Mar 09 '17 at 13:03
  • Multithreading should be done only for "big" things (it is slow to set up). I'd say that anything that doesn't impact the user interface and is shorter than 0.1 sec shouldn't be multithreaded, and between 0.1 and 1 sec I would think quite hard before doing it. – xanatos Mar 09 '17 at 13:03
  • Yeah, but how will each thread know about dupes in the other chunks? – Botonomous Mar 09 '17 at 13:03
  • `Parallel.Invoke` uses one thread for each delegate. Data parallelism is offered by `PLINQ` or `Parallel.For/ForEach` (a sketch follows the comments). – Panagiotis Kanavos Mar 09 '17 at 13:04
  • The array is 100,000,000 ints. I used this size so I can compare execution times. – Gabriel Mar 09 '17 at 13:04
  • @Botonomous The ranges should overlap by one element, true. Then it would work. It might still report a duplicate more than once if several fall on the range boundaries. – Sami Kuhmonen Mar 09 '17 at 13:07
  • You can use all cores if you use the *appropriate* classes. You tried to use `Parallel.Invoke` as if it were a `Thread.Start` though. You only used two such "threads". Even worse, the algorithm is inherently sequential - you need the previous value in order to check for duplication – Panagiotis Kanavos Mar 09 '17 at 13:07
  • @Gabriel I think you should reevaluate your requirements. At the very least you need a lock to sync access across the threads, a static array, and a `Parallel.For`-style loop. – Botonomous Mar 09 '17 at 13:12
  • This approach suffers from a fundamental problem: if you're going to divide and conquer, you must make sure you don't accidentally split the input on exactly a value that's duplicated, otherwise neither half will actually contain a duplicate. So at the very least, a custom partitioning function is needed that checks this before partitioning. On the whole, this seems vastly more trouble than it's worth. How did the array get sorted/constructed? Is there any way that algorithm could be adapted to detect the duplicates as it works? – Jeroen Mostert Mar 09 '17 at 13:17
  • @Botonomous there's no need to use locking. `Parallel` is meant for data parallelism; it's just that the OP is using it as if it were `Thread.Start`. The Parallel and ParallelEnumerable classes themselves offer partitioning support, parallel sorting, transforming, aggregating etc. – Panagiotis Kanavos Mar 09 '17 at 13:26
  • @Gabriel what does `I am not looking for a LINQ-like solution` mean? Why *not* use the built-in partitioning and grouping support? It will definitely run faster than the current code, especially if you take sorting into account. For example, parallel grouping and selecting only keys with a count > 1 will use all cores and return faster than sorting the array in the first place and then looking for duplicates (see the PLINQ sketch below). – Panagiotis Kanavos Mar 09 '17 at 13:29
  • You still need to check all the values, because this task doesn't fit the divide-and-conquer approach. Introduce a hash set and iterate over the array, and that's it (see the hash-set sketch below). Both that code and yours have `O(n)` complexity. – VMAtm Mar 09 '17 at 16:37
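
For reference, here is a minimal sketch of the data-parallel scan that Panagiotis Kanavos and Sami Kuhmonen describe, assuming a sorted `int[]` as in the question. The class and method names are made up for illustration; `Partitioner.Create` splits the index space into ranges so `Parallel.ForEach` can use every core, and each range compares one pair past its own end, so a duplicate sitting on a range boundary is not missed:

    using System;
    using System.Collections.Concurrent;
    using System.Threading;
    using System.Threading.Tasks;

    static class DuplicateSearch
    {
        // Returns the index of the left element of a duplicated adjacent pair, or -1.
        public static long FindAdjacentDuplicate(int[] elements)
        {
            if (elements.Length < 2) return -1;

            long foundIndex = -1;

            // Partitioner.Create splits [0, Length - 1) into index ranges sized for
            // the machine, so Parallel.ForEach spreads the scan over all cores.
            Parallel.ForEach(Partitioner.Create(0, elements.Length - 1), (range, state) =>
            {
                // Each iteration compares the pair (i, i + 1); since i + 1 may reach
                // into the next range, a pair straddling a boundary is still checked.
                for (var i = range.Item1; i < range.Item2; i++)
                {
                    if (elements[i] == elements[i + 1])
                    {
                        Interlocked.CompareExchange(ref foundIndex, i, -1);
                        state.Stop(); // ask the other partitions to quit early
                        return;
                    }
                }
            });

            return Interlocked.Read(ref foundIndex);
        }

        static void Main()
        {
            var elements = new[] { 1, 2, 3, 3, 4, 5 }; // stand-in for the large sorted array
            var index = FindAdjacentDuplicate(elements);
            Console.WriteLine(index >= 0 ? "found " + elements[index] : "no duplicate");
        }
    }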
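
The question rules out LINQ, but for comparison, a minimal sketch of the parallel grouping suggested in the comments; the small array is a stand-in for the real 100,000,000-element input:

    using System;
    using System.Linq;

    class PlinqDuplicates
    {
        static void Main()
        {
            int[] elements = { 1, 2, 3, 3, 4, 5 }; // stand-in for the real array

            // AsParallel partitions the array across all cores; grouping by value and
            // keeping keys that occur more than once yields every duplicated value,
            // and it works whether or not the input is sorted.
            var duplicates = elements.AsParallel()
                                     .GroupBy(x => x)
                                     .Where(g => g.Count() > 1)
                                     .Select(g => g.Key);

            Console.WriteLine(string.Join(", ", duplicates));
        }
    }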
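
And a minimal sketch of VMAtm's hash-set suggestion. For an already-sorted array the adjacent-pair scan above is cheaper, but this single pass also works on unsorted input:

    using System;
    using System.Collections.Generic;

    class HashSetDuplicates
    {
        // HashSet<int>.Add returns false when the value was already seen, so the
        // first false identifies a duplicate in one O(n) pass (O(n) extra memory).
        static int? FindDuplicate(int[] elements)
        {
            var seen = new HashSet<int>();
            foreach (var value in elements)
            {
                if (!seen.Add(value))
                    return value;
            }
            return null;
        }

        static void Main()
        {
            Console.WriteLine(FindDuplicate(new[] { 1, 2, 3, 3, 4, 5 })); // prints 3
        }
    }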

0 Answers