4

I have a ConcurrentBag in .Net 4.5 which I am storing about 4,000 rows from a database. I'm storing the DTOs.

My entire application relies on this. I have functions that return the entire list, and also have functions that return a single item. So many places in my code I am doing LINQ queries on the collection, etc.

I pushed it all to production, on site that gets considerable traffic, and immediately 100% cpu. I used the iis diagnostic tool, and sure enough, there were 50+ threads in a deadlock, waiting on the ConcurrentBag.

The documentation says this collection is thread-safe, but either that's not true, or the performance of this collection is not good thusly making it not thread-safe indirectly.

This collection unfortunately isn't read only. If one of the functions that looks up by ID returns null, it will hit a web service, and add it.

I also converted it to a ConcurrentDictionary, and had the same problem. Locks for Days on the .Values property.

What is the fastest and most thread-safe solution in most extreme scenarios?

private ConcurrentBag<Students> _students;
public static ConcurrentBag<DestinyHash> GetStudents()
{
   if (_students == null) { _students = new ConcurrentBag<Students>(); }

   return _students;
}

public static Student GetStudentByID(int id) 
{
   if (GetStudents().Any(x => x.id == id)) { return ... }

   _students.Add(getStudentFromDb(id));
   return...
}

Example usage - Littered throughout the app.

Helper.GetStudents().FirstOrDefault(x => x.name == "foo" && x.status == "bar");
Helper.GetStudentByID(50);
bladefist
  • 1,084
  • 3
  • 15
  • 25
  • A ConcurrentBag is definitely not the right data structure if you need to be able to retrieve a specific item. How are you using it exactly? We need more details to be able to help you. – Thomas Levesque Aug 15 '15 at 22:34
  • @ThomasLevesque I was basically treating it like a List. Doing LINQ, like Where, Count, FirstOrDefault right off the ConcurrentBag. That's what I need, a list in memory that can serve the rest of the app. – bladefist Aug 15 '15 at 22:36
  • My question is why are you using 50+ threads? I can't imagine you're getting any speed improvement with that many - quite the opposite actually. – Enigmativity Aug 16 '15 at 02:29
  • @Enigmativity I haven't deviated from the defaults. It's just what Debug Diagnostic said. – bladefist Aug 16 '15 at 06:05
  • @bladefist - I don't see how your response answered my question. – Enigmativity Aug 16 '15 at 07:18
  • @Enigmativity I'm saying it's the IIS default, I haven't changed anything. So I don't know, but I figure microsoft iis engineers know more than I do on the subject. – bladefist Aug 16 '15 at 14:51
  • @bladefist - I think you probably should have added that you're using [iis] to the question. So, based on that, how are you creating 50+ threads that share a single `ConcurrentBag<>`? – Enigmativity Aug 17 '15 at 02:36

2 Answers2

3

msdn states: All public and protected members of ConcurrentBag are thread-safe and may be used concurrently from multiple threads. However, members accessed through one of the interfaces the ConcurrentBag implements, including extension methods, are not guaranteed to be thread safe and may need to be synchronized by the caller.

Wouter
  • 2,540
  • 19
  • 31
  • That makes sense. That is absolutely the problem. What should I use? SQL has a command "with nolock" where your reads don't lock other threads out from reads. That's what I need here in C#. I'm fine w/ other threads reading uncommitted. – bladefist Aug 15 '15 at 23:13
  • You should check a ORM framework that supports lazy loading. The sample can add duplicate keys. There wouldn't be deadlock the collection is potentially unlimited because of the duplicates. – Wouter Aug 15 '15 at 23:28
  • btw if you run that many threads you might want to profile your application. It will probably be spending a lot of time switching between them killing your process. (Test it with incremental load until it breaks... the Testers call it stress testing.) – Wouter Aug 15 '15 at 23:40
3

The simple answer is that you're using the wrong container.

ConcurrentBag isn't general-purpose. It is intended to be used more like a pool of reusable objects that you might (usually as a last step) reduce to a single non-concurrent value. One such problem it could be used for is to sum up a list concurrently.

If your primary usage of ConcurrentBag strays from add/remove, and you're enumerating the collection frequently, then you're using it wrong.

If you post more code, you'll get more targeted help. Concurrency is one of those areas where understanding the problem is very important to provide a performant solution.

Edit:

ConcurrentDictionary will work for what you're doing. The trick is that you don't want to use ConcurrentDictionary.Values -- this will lock the dictionary and copy its contents. If you just use its IEnumerable<T> interface, you'll be fine. For instance:

private ConcurrentDictionary<int,Student> _students;

public static IEnumerable<Student> GetStudents()
{
   return _students.Select(x => x.Value);
}

public static Student GetStudentByID(int id) 
{
   Student s;
   if(_students.TryGetValue(id, out s)) return s;

   s = getStudentFromDb(id);
   _students[id] = s;

   return s;
}
Cory Nelson
  • 29,236
  • 5
  • 72
  • 110
  • What should I use then? I have a static ConcurrentBag, and a function, GetList() to get it. If it's null, I populate it from the DB, and then return the ConcurrentBag. Then all over the code, it's doing linq queries over the ConcurrentBag. It seems everyone is in agreement that I am using the wrong data structure. I don't know what the right one is. I need a List, in memory, that millions of threads can enumerate over at the same time. Occasionally, an item will be added. I don't care if other threads get that item right away or not. – bladefist Aug 15 '15 at 23:11
  • Can you explain why using Select off a ConcurrentDictionary is ok but not a ConcurrentBag? I get that using Values is bad, thanks. But I feel like the Select should be fine. – bladefist Aug 16 '15 at 06:08
  • `ConcurrentBag` is not meant to be enumerated frequently. It has the same performance issue as `ConcurrentDictionary.Values` -- it locks and copies the whole collection. – Cory Nelson Aug 16 '15 at 15:40