
I'm working on a commercial game with Unity/C# and I'm reviewing and familiarizing myself with the code. I started wondering about this because there are a lot of cases where you're holding a List full of things like GameObjects, Components or ScriptableObjects. For instance, you might have a field like this in an RTS game class:

        protected List<Unit> allUnits = new List<Unit>();

Now, suppose an event is triggered at runtime where you need to cycle through allUnits and operate on certain ones or select only certain ones you want or even determine the number of them that satisfy a certain condition. Bear with me on this contrived example, but the 3 standard approaches are for-loop, for-each or a Linq extension method / query statement. Let's say I want to find all units that are not critically wounded or dead and are low on ammo. I can use the following ways to deal with this:

        for ( int i = 0; i < allUnits.Count; i++ ) {

            var next = allUnits[i];

            if ( next.IsDead ) continue;
            if ( next.IsCriticallyWounded ) continue;
            if ( next.AmmoCount >= Magazine.MaxCapacity * 3 ) continue;
            unitsLowOnAmmo.Add( next );
        }

You could use the same logic in a foreach( var next in allUnits ) loop, so I won't repeat the same code again. But another approach would be like this, using Linq extensions:

        unitsLowOnAmmo = allUnits.Where( u => 
                                        !u.IsDead && 
                                        !u.IsCriticallyWounded && 
                                        u.AmmoCount < Magazine.MaxCapacity * 3 ).ToList();

You could also use this syntax to find everything under AI control, for example:

        var aiUnits = (from Unit u in allUnits
                      where u.AIControlled
                      select u).ToList();

In another situation, you might need the total number of units satisfying a set of conditions. Say I want to count all of the AI-controlled units that are critically wounded, so the AI can send medics or try to bring them to safety in a field hospital. I could do it with an accumulator: declare int count = 0; before for ( int i = 0; i < allUnits.Count; i++ ), continue; past any unit that fails a condition, and let the rest fall through to a count++; statement. Or, obviously, I could use int count = allUnits.Count( u => cond1 && cond2 ); ...
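To make the two counting approaches concrete, here is a minimal sketch. The `Unit` class below is a hypothetical stand-in with just the two flags from the example (the real class will have far more), and the `UnitCounting` helper and its method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the game's Unit type; only the two flags used here.
public class Unit
{
    public bool AIControlled;
    public bool IsCriticallyWounded;
}

public static class UnitCounting
{
    // Manual accumulator: count is declared outside the loop so it is still in scope afterwards.
    public static int CountWithLoop(List<Unit> allUnits)
    {
        int count = 0;
        for (int i = 0; i < allUnits.Count; i++)
        {
            var next = allUnits[i];
            if (!next.AIControlled) continue;
            if (!next.IsCriticallyWounded) continue;
            count++;
        }
        return count;
    }

    // LINQ equivalent: Count with a predicate walks the list once
    // and does not build an intermediate list.
    public static int CountWithLinq(List<Unit> allUnits)
    {
        return allUnits.Count(u => u.AIControlled && u.IsCriticallyWounded);
    }
}
```

Both methods should return the same number for the same list; the question is only how much each one costs per call.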

Now, obviously, the Linq stuff is much cleaner and more expressive, and the code is a bit easier to read and maintain, but we've all heard the wise masters of C# say "Linq has bad performance! Never use Linq!" over the years. I'm wondering how true that prejudice against Linq really is, and how these different approaches actually perform "in the field". I'm working on a very large project with about 24GB of assets, code and data. It's quite complicated, and there are lots of List instances in classes storing all kinds of stuff that regularly need to be iterated through and filtered or counted. In some cases it's frame to frame, and in other cases it's after a specific amount of time or upon an event or method call. Performance is already a major concern in this project, and we want the game to be able to run on most people's computers.

Can anyone shed some light on performance comparisons and what the best approach would be to cycling through collections to filter, select and count them in the most performant way? Perhaps there's even a superior approach I didn't mention here that could be far better? I've read through some articles on here (and other sites) that didn't answer the question in enough detail for me to feel satisfied. I'm also just getting caught up on all the latest stuff in C# and .NET after being away a little while, and I'm not sure if there have been any changes to the framework (or language) that may completely change the things people used to say about Linq. I've heard that Microsoft is boasting of performance improvements in a lot of areas of .NET and wonder if any of those gains pertain to this situation. In any case, I just want to figure out the ideal approach to filtering, operating on and counting my big (and small) collections as quickly and with as little memory overhead as possible.

  • Perhaps consider other data structures, such as `Dictionary` or `HashSet`, or perhaps a [mutable `Lookup`](https://stackoverflow.com/questions/15132252/why-is-lookup-immutable-in-c) – Charlieface Oct 17 '21 at 22:28
  • I've generated some test cases creating a list of 100 objects with randomized fields, and tried for, foreach and Where with the same filters, using Stopwatch to count the elapsed system ticks. The regular for loop seems to win by a wide margin in this simple test case, looking for Units with "low ammo" values. For comparison, the tick counts were: Where: 10607, for loop: 232, foreach: 882 – Aaron Carter Oct 17 '21 at 22:33
  • On the other hand, I was trying to say you should rethink the whole design. Perhaps you shouldn't be looping lists and filtering them in the first place, for example instead keep an actual `unitsLowOnAmmo` list in memory – Charlieface Oct 17 '21 at 22:38
  • Changing the order of the tests seems to change the results sometimes and make foreach faster. I'm suspicious of the results though. Going to try another idea. – Aaron Carter Oct 17 '21 at 22:38
  • @Charlieface Of course, but I just came into this project that's about a year into production and I'm not doing a heavy refactoring yet. And my example is totally contrived, there's a lot more complexity to the real code. I was just curious about the topic and how we could optimize things in performance critical areas. – Aaron Carter Oct 17 '21 at 22:41
  • Does this answer your question? [For vs. Linq - Performance vs. Future](https://stackoverflow.com/questions/14893924/for-vs-linq-performance-vs-future) – Charlieface Oct 17 '21 at 22:43
  • Rewrote the unit tests where I wrapped each test in a static method that returns the elapsed ticks. Before the actual test I call every test method one time to make sure it's JIT compiled before the real tests. Then I run the tests. And now I'm finding that for is blazing fast, usually 100 to 200 ticks or less, followed by Where being only a bit slower and foreach is the slowest. – Aaron Carter Oct 17 '21 at 22:49
  • I remembered that results from a single run aren't very accurate, so I took the average of 100+ runs. The performance of each approach is very close, with for loops being the fastest. – Aaron Carter Oct 18 '21 at 01:12
  • As @Charlieface already tried to tell you more than once: **Use a better data structure**! It sounds to me e.g. that `isDead` is not something that changes every frame => only update the collections **event driven** by moving the elements between an alive and a dead collection once this value is actually changed. The same goes for `AIControlled`, which sounds like something that never changes at all, so why keep them in the same list anyway? And in general the advantage of `Linq` kicks in when you do not specifically cast to array or list every time but rather directly iterate it only once ;) – derHugo Oct 18 '21 at 08:24
  • @derHugo as I said, the example case is totally contrived. It's a huge project, and in real situations you may need to cycle through units and find ones fitting a set of conditions specified by the player at runtime. There are all kinds of situations where you simply can't store a collection of something in advance in some kind of faster data structure, but of course there are cases where you can and I will. The example was just to demonstrate the nature of the question. But my tests have shown the differences to be quite small. Linq was slow the first time, when it JITs, and then it gets faster. – Aaron Carter Oct 18 '21 at 13:55
  • other improvements would be to do things like `Magazine.MaxCapacity * 3` only once before the iteration or even store it somewhere as `const` .. but in general we are talking about micro improvements here ;) – derHugo Oct 18 '21 at 14:08
  • Yeah, that's also a good point, removing an unnecessary op is always a plus. I guess the moral of the story is that for is faster for sure (lol) but Linq honestly isn't that bad except on its first call to JIT and run. I'll be selective about when and where to employ it. I just like how it offers clean one-liner solutions to some complex operations and is so easy to change later. Makes large bodies of code simple to modify and improve upon for sure. – Aaron Carter Oct 19 '21 at 17:48
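
The measurement approach described in the comments above (call each test once so JIT compilation is excluded, then average many timed runs, with the threshold computed once before the loop) could be sketched roughly as follows. This is an illustrative sketch, not a rigorous benchmark: the `LoopBenchmark` class, the use of plain ammo counts instead of `Unit` objects, and the magazine capacity of 30 are all assumptions, and numbers measured this way may differ under Unity's scripting backends.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LoopBenchmark
{
    // Calls the action once as an untimed warm-up (so JIT cost is excluded),
    // then returns the mean elapsed Stopwatch ticks over `runs` timed calls.
    public static double AverageTicks(Action action, int runs)
    {
        action(); // warm-up call, not timed
        long total = 0;
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            total += sw.ElapsedTicks;
        }
        return (double)total / runs;
    }

    public static void Run()
    {
        // Hypothetical data: 100 random ammo counts stand in for Unit objects.
        var rng = new Random(42);
        var ammo = new List<int>();
        for (int i = 0; i < 100; i++) ammo.Add(rng.Next(0, 200));

        // Computed once before iterating; 30 is an assumed magazine capacity.
        int threshold = 30 * 3;

        double forTicks = AverageTicks(() =>
        {
            var result = new List<int>();
            for (int i = 0; i < ammo.Count; i++)
                if (ammo[i] < threshold) result.Add(ammo[i]);
        }, 1000);

        double linqTicks = AverageTicks(() =>
        {
            var result = ammo.Where(a => a < threshold).ToList();
        }, 1000);

        Console.WriteLine($"for: {forTicks:F1} ticks, Where: {linqTicks:F1} ticks");
    }
}
```

Calling `LoopBenchmark.Run()` prints the two averages. The absolute values depend on the machine and runtime, so only the relative difference is meaningful.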
