
I'm trying to draw a few thousand particles using instancing. It's working and it's fast, but I have one bottleneck that slows the whole program down.

My Particle class is similar to this:

public class Particle
{
    public Vector2 Position;

    //More data not used for drawing
    //....
}

Now in my draw loop I have something like this:

Vector2[] instanceData = new Vector2[numParticles];

public void Draw()
{
    for(int i = 0; i < numParticles; ++i)
        instanceData[i] = Particles[i].Position; //THAT'S the slow part

    instanceBuffer.SetData(instanceData);

    //Now draw VertexBuffer using instancing
    //...
}

I have tried using Parallel.For, but it doesn't speed things up enough, since I have around 8,000 particles. I also looked at the particle system sample from MSDN, but their Particle struct contains only the data needed for drawing, and the positions are calculated in the shader. However, I need additional data for several algorithms.
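For reference, the Parallel.For variant I tried looked roughly like this (a sketch from memory, not my exact code):

// Roughly the parallel copy I tried (System.Threading.Tasks.Parallel):
Parallel.For(0, numParticles, i =>
{
    instanceData[i] = Particles[i].Position;
});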

I can't think of a class design that would let me avoid copying the particle positions into the array every frame.

Teflo
  • You could speed it up by making the loop unsafe code and using pointers. It could be array bounds checking taking a lot of time. Of course, you might also just try replacing the loop with `instanceData = Particles.Select(p => p.Position).ToArray();` and see if that's faster. – itsme86 Jul 16 '14 at 01:33
  • Wow, using Select() actually helped a lot! Now it's up to 80x faster. Thanks! – Teflo Jul 16 '14 at 11:55
  • 80x faster? Really? I find that difficult to believe--Select() isn't doing anything magical, it's just running in a loop, same as you're doing. Furthermore, using LINQ inside of your main loop is a terrible idea; it allocates memory all over the managed heap. I suspect there's something more subtle going on here. – Cole Campbell Jul 16 '14 at 18:05
  • For 8000 particles I get: for loop: 1.2714698 s, Parallel.For(): 0.2745342 s, LINQ: 0.0023655 s. That's 537x faster than the for loop and 116x faster than Parallel.For, if I got it right. It was 80x faster with a different number of particles, but I don't remember how many I used. – Teflo Jul 16 '14 at 18:29
  • This loop is taking _over one and a quarter seconds_ to execute for 8000 particles? What is `Particles` in this code fragment? – Cole Campbell Jul 16 '14 at 18:30
  • I made a Pool class which uses a LinkedList, and I totally forgot. That explains everything... Now that I use a List, Parallel.For seems to be the fastest. But another part of my algorithm, which uses the pool and depends on fast insert and remove, is of course slower now, and I'll have to write another class for that. But thanks a lot! – Teflo Jul 17 '14 at 10:18
  • I've posted an answer with a suggestion for how to improve your data structures. – Cole Campbell Jul 17 '14 at 14:36

1 Answer


Since this problem ultimately arose from the data structures being used, let me present you with a common alternative to the linked list for scenarios such as this one.

Linked lists are generally not a good idea for storing particles for two reasons: one, you can't randomly access them efficiently, as you discovered here; and two, linked lists have poor locality of reference. Given the performance requirements of particle systems, the latter point can be killer.

A standard list has much better locality of reference, but as you've discovered, adding and removing items can be slow, and this is something you do commonly in particle engines.

Can we improve on that?

Let's start with something even more basic than a list: a simple array. For simplicity's sake, let's hard-cap the number of particles in your engine (we'll address this later).

private const Int32 ParticleCount = 8000;
private readonly Particle[] particles = new Particle[ParticleCount];
private Int32 activeParticles = 0;

Assuming you have room, you can always add a particle to the end of the array in constant time:

particles[activeParticles++] = newParticleData;

But removing a particle from the middle is O(n), because all of the particles after it need to be shifted down (arrays have no RemoveAt(), so you have to do the shifting yourself):

var indexOfRemovedParticle = 12;
// Shift every particle after the removed one down by one slot.
for (int i = indexOfRemovedParticle; i < activeParticles - 1; i++)
    particles[i] = particles[i + 1];
activeParticles--;

What else can we do in constant time? Well, we can move particles around:

particles[n] = particles[m];

Can we use this to improve our performance?

Yes! Change the remove operation to a move operation, and what was O(n) becomes O(1):

var indexOfRemovedParticle = 12;
var temp = particles[indexOfRemovedParticle];
particles[indexOfRemovedParticle] = particles[activeParticles - 1];
particles[activeParticles - 1] = temp;
activeParticles--;

We partition our array: all of the particles at the beginning are active, and all of the particles at the end are inactive. So to remove a particle, all we have to do is swap it with the last active particle, then decrement the number of active particles.

(Note that you need the index within the array of the particle to remove. If you have to go searching for this, you end up reverting to O(n) time; however, since the usual workflow for particles is "loop through the whole list, update each particle, and if it's dead, remove it from the list," you often get the index of dead particles for "free" anyway.)
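As a sketch of that workflow (assuming hypothetical Update() and IsDead members on the particle type; yours may be named differently):

for (int i = 0; i < activeParticles; )
{
    particles[i].Update();

    if (particles[i].IsDead)
    {
        // Swap the dead particle with the last active one and shrink the
        // active range. Don't advance i: the particle swapped into slot i
        // hasn't been updated yet this frame.
        var temp = particles[i];
        particles[i] = particles[activeParticles - 1];
        particles[activeParticles - 1] = temp;
        activeParticles--;
    }
    else
    {
        i++;
    }
}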

Now, this all assumes a fixed number of particles, but if you need more flexibility you can solve this problem the same way the List<T> class does: whenever you run out of room, just allocate a bigger array and copy everything into it.
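A rough sketch of that growth step (note that the particles field would have to drop its readonly modifier for this to compile):

private void EnsureCapacity()
{
    if (activeParticles < particles.Length)
        return;

    // Double the capacity, like List<T> does internally, and copy the existing data.
    var grown = new Particle[particles.Length * 2];
    Array.Copy(particles, grown, particles.Length);
    particles = grown;
}

You would call this before the `particles[activeParticles++] = newParticleData;` insert shown above.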

This data structure provides quick inserts and removals, quick random access, and good locality of reference. The latter can be improved further by making your Particle class into a structure, so that all of your particle data will be stored contiguously in memory.
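For instance, a struct version might look like this (the Velocity and Age fields are just placeholders for whatever extra data your algorithms need):

public struct Particle
{
    public Vector2 Position;

    // Placeholder examples of the extra, non-drawing data.
    public Vector2 Velocity;
    public float Age;
    public bool IsDead;
}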

Cole Campbell
  • Wow, it can be so simple. That didn't just speed up the drawing, but the whole algorithm, because I don't have to create new instances every frame any longer! Thanks for your patience and detailed answer! Now I can have 30,000 particles with no problems. – Teflo Jul 18 '14 at 11:10