3

In my algorithm, what I am trying to do is following.

while (R.Count > 0)
{
    //R is also  List<string>()   
    var N = new List<string>();
    var first = R[0];
    N.Add(first);
    R.Remove(first);
    //below commented code runs just fine but it takes a lot of time that is why i need to do multithreading to make it faster
    //for (int i = R.Count - 1; i >= 0; i--)
    //{
    //    if (hamming(first, R[i]))
    //    {   //hamming is a function just compare two strings and returns true or false.
    //        N.Add(R[i]);
    //        R.RemoveAt(i);
    //    }
    //}

    //Below is code of my attempt of multithreading the loop. I have tried it with foreach loop as well and it gives same error 'index out of range or argument exception'

    //ATTEMPT 1 :-
    Parallel.For(0,R.Count, i =>
    {

        if (hamming(first, R[i]))
        {
            N.Add(R[i]);
            R.RemoveAt(i);
        }
    });
    //ATTEMPT 2 :-
    Parallel.For(0,R.Count, i =>
    {

        if (hamming(first, R[i]))
        {
            N.Add(R[i]);
            R[i]="";
        }
    });
    var K = R.Where(a => a == "").ToList();
    var nc = cou - N.Count;
    //the value of 'K.Count' and 'nc' should be same here but I have checked in debugger its not the same.

    N_Total.Add(N);//this is just a List<List<string>>

}

The Code is pretty self explanatory but I will still try to elaborate ir further here.

Basically I need to run this algo and compare values as shown in the code and if hamming returns true I have to add that value to 'N' and remove it from 'R', I have to remove it because when next time the outer while loop runs List 'R' should be smaller and only those values should be present in R which didn't satisfy the hamming condition in previous run of the loop.

I can elaborate further if someone needs to understand more.

What I want is that to achieve this goal in some multithread way and without exceptions of index out of range or Argument exceptions.

Thanks a lot in advance.

Prisoner
  • 1,839
  • 2
  • 22
  • 38
Muhammad Touseef
  • 4,357
  • 4
  • 31
  • 75

2 Answers2

2

First of all List<string> is not ThreadSafe which means it should't be used in parallel operations at all.

Try that: ConcurrentBag<string> instead.

ConcurentBag exists in System.Collections.Concurrent namespace, which contains few more thread safe collections.


Another thing is:

You want to make sure if that index exists before doing anything with it.


ConcurentBag may have some restrictions, maybe it's worth of checking other collections from that namespace: System.Collections.Concurrent as those are ThreadSafe.

https://msdn.microsoft.com/en-us/library/system.collections.concurrent.aspx

madoxdev
  • 3,770
  • 1
  • 24
  • 39
  • I tried Concurrent queue but it only removes items from begining ( if matched ) so tht doesnt work for me, because I am in parallel so I have to remove any random item which that loop is accessing at that time. – Muhammad Touseef Dec 28 '16 at 10:31
  • how are you removing items from it? In your case you want to use `TryTake` method I assume that returns (to add it to another collection) and remove object from source collection. – madoxdev Dec 28 '16 at 10:37
1

Use Parallel.Foreach on R. Parallel.Foreach will split your list into smaller chunks and starts processing them. so indexes from different threads will not collide with each other.

for N you will use ConcurrentBag instead of List because its thread safe. that means when two threads happen to add item to your bag strange things will not happen.

if you remove items inside Parallel you should notify all threads from new changes which will be hard (and quite ugly) to implement.

List<string> R = new List<string>();


while (R.Count > 0)
{
    var removing = new ConcurrentBag<long>();

    var N = new ConcurrentBag<string>();
    var first = R[0];
    N.Add(first);
    R.Remove(first);

    Parallel.ForEach(R, (item, state, index) =>
    {
        if(hamming(first, item))
        {
            N.Add(item);

            R[(int)index] = null; // mark as null and ignore. 
                                  // this is not thread safe for versioning of list but doesn't matter.
                                  // for R ConcurrentBag can be used too but it doesn't change results after all.
        }
    });

    // now we are safe to reorganize our collection.
    R = R.Where(str => str != null).ToList(); // parallel execution doesn't help. see comments below. 
                                              // for very large collection this will finish in few milliseconds.

    // get other stuff...
}
M.kazem Akhgary
  • 18,645
  • 8
  • 57
  • 118
  • ConcurrentBag 'N' doesnt have any extension method of 'Add', how can I then add items to it? – Muhammad Touseef Dec 28 '16 at 10:27
  • double check your code. did you put `using System.Collections.Concurrent;` at top of your code? @touseef – M.kazem Akhgary Dec 28 '16 at 10:28
  • ConcurentBag has method Add, https://msdn.microsoft.com/en-us/library/dd381779.aspx – madoxdev Dec 28 '16 at 10:38
  • yes the Add method is working now but its still giving index error on removing loop, – Muhammad Touseef Dec 28 '16 at 10:45
  • @touseef oh. I see. fixed it try it now. added `deleted` variable – M.kazem Akhgary Dec 28 '16 at 10:48
  • yes it is fixed but now removing foreach loop is taking too long :( cant i make this loop parallel as well ? – Muhammad Touseef Dec 28 '16 at 11:04
  • i understand that index wise removal in parallel will cause same problems, but can i just remove items in a parallel way, because I already know the items to remove are all stored in 'N'. so I have to remove all those items which belong to N – Muhammad Touseef Dec 28 '16 at 11:07
  • update : Parallel.ForEach(N, n => { R.Remove(n); }); I tried to do this and now i checked with debugging, each time this foreach loop is removing different number of elements from R. which makes no sense at all :/ – Muhammad Touseef Dec 28 '16 at 11:24
  • @touseef I have updated my answer and the way it removes the items. check the results now. and tell me if there is more to know (and can you tell how long it takes for your method to finish?) – M.kazem Akhgary Dec 28 '16 at 11:38
  • 1
    I would be _very_ surprised if it turned out that `R = R.AsParallel().Where(str => str != null).ToList();` was more efficient than `R = R.Where(str => str != null).ToList();` – Kris Vandermotten Dec 28 '16 at 11:52
  • I haven't checked the implementation yet. Ill check it out. and will use another method if both behave same. @KrisVandermotten – M.kazem Akhgary Dec 28 '16 at 11:54
  • 1
    @touseef Note that `hamming` method also have its own performance impacts. you should give details about how large your list is and how `hamming` is implemented – M.kazem Akhgary Dec 28 '16 at 11:55
  • sir your updated answer is working for me, it is giving the right result as i expected, thanks a lot, and yes I will give u detaisl about hamming as well shortly after I test my full data on this code :) – Muhammad Touseef Dec 28 '16 at 12:00
  • @KrisVandermotten agreed, they behave different but `AsParallel.Where` has lot more overhead than plain `Where`.. for 10 millions items parallel became faster by few margin. – M.kazem Akhgary Dec 28 '16 at 12:09
  • yes i tried it and parallel is slightly faster considering my list has millions of objects – Muhammad Touseef Dec 28 '16 at 12:33