
I am doing some heavy computations in C# .NET, and while running them in a Parallel.For loop I must collect some data in a collection. Because of limited memory I can't keep all the results, so I only store the best ones.

Those computations must be as fast as possible, because they already take too much time. After a lot of optimizing I found out that the slowest thing was my ConcurrentDictionary collection. I am wondering whether I should switch to something with faster add, remove and find-the-highest operations (perhaps a sorted collection) and just use locks for my main operation, or whether I can do something good with the concurrent collections and speed them up a little.

Here is my actual code. I know it's bad because of the huge lock, but without it I seem to lose consistency and a lot of my remove attempts fail.

public class SignalsMultiValueConcurrentDictionary : ConcurrentDictionary<double, ConcurrentBag<Signal>>
{
    public int Limit { get; set; }
    public double WorstError { get; private set; }

    public SignalsDictionaryState TryAddSignal(double key, Signal signal, out Signal removed)
    {
        SignalsDictionaryState state;
        removed = null;

        if (this.Count >= Limit && signal.AbsoluteError > WorstError)
            return SignalsDictionaryState.NoAddedNoRemoved;

        lock (this)
        {
            if (this.Count >= Limit)
            {
                ConcurrentBag<Signal> signals;
                if (TryRemove(WorstError, out signals))
                {
                    removed = signals.FirstOrDefault();
                    state = SignalsDictionaryState.AddedAndRemoved;
                }
                else
                    state = SignalsDictionaryState.AddedFailedRemoved;
            }
            else
                state = SignalsDictionaryState.AddedNoRemoved;

            this.Add(key, signal);
            WorstError = Keys.Max();
        }
        return state;
    }

    private void Add(double key, Signal value)
    {
        ConcurrentBag<Signal> values;
        if (!TryGetValue(key, out values))
        {
            values = new ConcurrentBag<Signal>();
            this[key] = values;
        }

        values.Add(value);
    }
}

Note also that, because I use the signal's absolute error as the key, I sometimes (it should be very rare) store more than one value under one key.

The only operation used in my computations is TryAddSignal, because it does what I want: if I have more signals than the limit, it removes the signal with the highest error and adds the new signal.

Because I set the Limit property at the start of the computations, I don't need a resizable collection.

The main problem is that even without that huge lock, Keys.Max() is still too slow. So maybe I need a different collection?

– Pawel Troka
    Well, make it fast and safe by giving every thread its own collection, merge the results after they are done. Btw, this is a site where lots of professional devs visit and can help you, consider a more appropriate user name. – Hans Passant Jul 25 '15 at 17:39
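
For reference, the per-thread-collection pattern from the comment above maps onto the thread-local overload of Parallel.For: each worker fills a private list and the lists are merged once per worker at the end. A minimal sketch (the Math.Sqrt call is only a placeholder for the real computation):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var merged = new List<double>();

Parallel.For(0, 1000000,
    () => new List<double>(),            // localInit: a private list per worker
    (i, state, local) =>
    {
        local.Add(Math.Sqrt(i));         // placeholder for the real computation
        return local;                    // no shared collection touched, no locks
    },
    local =>
    {
        lock (merged)                    // localFinally: runs once per worker
            merged.AddRange(local);
    });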

3 Answers


The large lock statement is at least dubious. An easier improvement, since you say that Keys.Max() is slow, is to calculate the maximum value incrementally. You'll need to refresh it only after removing a key:

//... inside TryAddSignal, after removing a key the maximum must be recomputed:
if (TryRemove(WorstError, out signals))
{
    WorstError = Keys.Max();
//...

//... while after adding a key it can be updated incrementally:
WorstError = Math.Max(WorstError, key);
– BartoszKP

Keys.Max() is the killer. That's O(N). No need for a dictionary if you do this.

You can't incrementally compute the max value here, because you are both adding and removing. So you'd better use a data structure that is made for this; trees usually are. The BCL has SortedList, SortedSet and SortedDictionary. SortedSet and SortedDictionary are backed by a red-black tree, and SortedSet exposes Min and Max operations.

Or, use a .NET collection library with a priority queue.
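
For illustration, a minimal sketch of the SortedSet route (the BestSignals name, the Signal stub and the sequence-number tie-breaker for duplicate error values are illustrative, not from the answer). It keeps only the limit lowest-error signals and finds the worst one via Max in O(log N):

using System;
using System.Collections.Generic;

public sealed class Signal                  // stub standing in for the question's type
{
    public double AbsoluteError { get; set; }
}

// Keeps the 'limit' best (lowest-error) signals. SortedSet is a red-black
// tree, so Add/Remove/Max are O(log N) instead of the O(N) Keys.Max().
// A unique sequence number in the key lets equal error values coexist.
public sealed class BestSignals
{
    private readonly SortedSet<(double Error, long Seq, Signal Item)> _set;
    private readonly int _limit;
    private long _seq;

    public BestSignals(int limit)
    {
        _limit = limit;
        _set = new SortedSet<(double Error, long Seq, Signal Item)>(
            Comparer<(double Error, long Seq, Signal Item)>.Create((a, b) =>
                a.Error != b.Error ? a.Error.CompareTo(b.Error)
                                   : a.Seq.CompareTo(b.Seq)));
    }

    public double WorstError =>
        _set.Count == 0 ? double.NegativeInfinity : _set.Max.Error;

    // Callers synchronize externally, e.g. with one lock around TryAdd.
    public bool TryAdd(Signal signal, out Signal removed)
    {
        removed = null;
        if (_set.Count >= _limit)
        {
            if (signal.AbsoluteError > WorstError)
                return false;               // worse than everything already kept
            var worst = _set.Max;           // evict the signal with the largest error
            _set.Remove(worst);
            removed = worst.Item;
        }
        _set.Add((signal.AbsoluteError, _seq++, signal));
        return true;
    }
}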

Bug: Add is racy. You might overwrite a non-empty collection.

– usr
  • Thanks, that's what I was afraid of. So I guess I can't use collections made for concurrent programming? What do you mean by a .NET collection library with a priority queue? Which of these collections will be the fastest for my problem (note that sometimes signals have equal error and I need to store all of them)? – Pawel Troka Jul 25 '15 at 14:56
  • You are not making use of the concurrency features of those collections anyway. You have wrapped them in a lock. This is slower than using a normal dictionary. Use a priority queue. It does not matter from what library it comes. Find any such queue. You can store multiple items with equal keys by making the key `Tuple(realKey, Guid.NewGuid())` as a hack. – usr Jul 25 '15 at 14:58
  • There are priority queues that are lock-free. Maybe you can find one on the web somewhere. – usr Jul 25 '15 at 14:59
  • @Fucker24 Sometimes you can calculate max incrementally. If removing doesn't occur too often, it will improve the performance. – BartoszKP Jul 25 '15 at 17:12

What I did in the end was implement a heap based on a binary tree, as suggested by @usr. My final collection is not concurrent but synchronized (I used locks). I checked the performance, though, and it does the job fast enough. Here is the code:

using System.Collections.Generic;

public class SynchronizedCollectionWithMaxOnTop
{
    // Backing list of a binary max-heap: _items[0] holds the largest error.
    private readonly List<Item> _items = new List<Item>();

    public int Limit { get; set; }

    public double Max => _items[0].AbsoluteError;

    public ItemChangeState TryAdd(Item item, out Item removed)
    {
        ItemChangeState state;
        removed = null;

        // Lock-free fast path: full, and the candidate is worse than the worst kept item.
        if (_items.Count >= Limit && item.AbsoluteError > Max)
            return ItemChangeState.NoAddedNoRemoved;

        lock (this)
        {
            if (_items.Count >= Limit)
            {
                removed = Remove();
                state = ItemChangeState.AddedAndRemoved;
            }
            else
                state = ItemChangeState.AddedNoRemoved;

            Insert(item);
        }
        return state;
    }

    private void Insert(Item item)
    {
        _items.Add(item);
        HeapifyUp(_items.Count - 1);
    }

    // Pops the root (the item with the highest error) and restores the heap property.
    private Item Remove()
    {
        var result = _items[0];
        var lastIndex = _items.Count - 1;

        _items[0] = _items[lastIndex];
        _items.RemoveAt(lastIndex);

        HeapifyDown(0);
        return result;
    }

    private void HeapifyUp(int index)
    {
        while (index > 0)
        {
            var parent = (index - 1) / 2;
            if (_items[index].AbsoluteError <= _items[parent].AbsoluteError)
                break;
            Swap(index, parent);
            index = parent;
        }
    }

    private void HeapifyDown(int index)
    {
        while (true)
        {
            var left = 2 * index + 1;
            var right = left + 1;
            var largest = index;
            if (left < _items.Count && _items[left].AbsoluteError > _items[largest].AbsoluteError)
                largest = left;
            if (right < _items.Count && _items[right].AbsoluteError > _items[largest].AbsoluteError)
                largest = right;
            if (largest == index)
                return;
            Swap(index, largest);
            index = largest;
        }
    }

    private void Swap(int a, int b)
    {
        var tmp = _items[a];
        _items[a] = _items[b];
        _items[b] = tmp;
    }
}
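
A quick usage sketch under the same assumptions (candidates is a hypothetical Item[]; Limit is set once before the computation starts):

var best = new SynchronizedCollectionWithMaxOnTop { Limit = 1000 };

Parallel.For(0, candidates.Length, i =>
{
    Item dropped;
    best.TryAdd(candidates[i], out dropped); // keeps the 1000 lowest-error items
});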
– Pawel Troka