In my code I have an object that contains a series of pixel coordinates. Performance of this object is critical because it's being used in a 60fps game where the output cannot always be cached.

After experimentation and benchmarking, a nested array (one array of x values per row y, which I call the "3D" array below) proved to be the fastest implementation when using untyped arrays:

var PixelCollection = function() {
    this.pixels = [];
};

PixelCollection.prototype = {
    add: function(x, y) {
        var pixels = this.pixels;
        if (pixels[y]) {
            pixels[y].push(x);
        } else {
            pixels[y] = [x];
        }
    },
    each: function(callback) {
        var pixels = this.pixels;
        for (var y = 0, m = pixels.length; y < m; y++) {
            var row = pixels[y];
            if (row) {
                for (var i = 0, mm = row.length; i < mm; i++) {
                    callback(row[i], y);
                }
            }
        }
    }
};

In some situations, the object is not fast enough, so I tried a Uint8Array implementation, using a flat array indexed by WIDTH * y + x (the "2D" version):

var WIDTH = 255;
var HEIGHT = 160;

PixelCollection = function() {
    this.pixels = new Uint8Array(WIDTH * HEIGHT);
};

PixelCollection.prototype = {
    add: function(x, y) {
        this.pixels[WIDTH * y + x] = 1;
    },
    each: function(callback) {
        var pixels = this.pixels;
        for (var i = 0, m = pixels.length; i < m; i++) {
            if (pixels[i]) {
                callback(i % WIDTH, Math.floor(i / WIDTH));
            }
        }
    }
};

This is slower. I expected it to be faster because writing to and reading from a Uint8Array is faster, but since every PixelCollection now allocates WIDTH * HEIGHT entries, retrieving pixels is much slower: each has to iterate over every possible pixel, not just the ones that are set. (Note: I also tried the implementation above with an untyped array; it is a lot slower.)

A PixelCollection typically does not have all pixels set. However, the bounding box may span the entire canvas so I do need to create the array using a buffer this big.

There may be a way around that though. I can create one big shared buffer and use a byte offset for every PixelCollection. So when PixelCollection P1 would take up 100 bytes, then PixelCollection P2 would start at byte offset 100. This uses memory more efficiently, but I would need to keep track of the range of bytes every PixelCollection uses (is this what C calls "pointers"?).
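A minimal sketch of that shared-buffer idea, assuming a simple allocator that only ever appends. The names SharedPixelPool and alloc are hypothetical, and the capacity is an arbitrary guess. Each collection gets a Uint8Array view onto the same ArrayBuffer, so the byte offset and length of the view are the "range" that has to be tracked:

```javascript
// Hypothetical names (SharedPixelPool, alloc); not part of the original code.
var WIDTH = 255;
var HEIGHT = 160;

// One big buffer. The capacity is a guess that must cover all collections.
function SharedPixelPool(capacityBytes) {
    this.buffer = new ArrayBuffer(capacityBytes);
    this.used = 0; // Byte offset where the next allocation starts.
}

// Hands out a Uint8Array view onto the shared buffer; no bytes are copied.
SharedPixelPool.prototype.alloc = function(byteLength) {
    if (this.used + byteLength > this.buffer.byteLength) {
        throw new RangeError('Shared buffer is full');
    }
    var view = new Uint8Array(this.buffer, this.used, byteLength);
    this.used += byteLength;
    return view;
};

var pool = new SharedPixelPool(WIDTH * HEIGHT * 4);
var p1 = pool.alloc(100); // P1 occupies bytes 0..99.
var p2 = pool.alloc(50);  // P2 starts at byte offset 100.
p2[0] = 1;                // Writes byte 100 of the shared buffer.
```

This sketch does not handle a collection that grows after allocation; that case still needs neighbours to be shifted or the collection to be reallocated.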

Annoying part: when P1's bounding box expands, I need to shift P2 to make space for P1. I would also need to pick a size for the shared buffer up front, which means guessing how much memory all collections need together.

Implementing this is possible, but it would require lots of time, trial and error.

So before I start on this: does it seem like a good way to do it? Are there better or simpler ways to do this? Do you know of any example implementations I could learn from?

Edit: I added a jsperf. Please add to it if you have an idea for an optimization.

Blaise
  • Your second implementation of `each` is doing a lot more math, have you factored that out as a cause of the speed difference? You can track the values to pass into `callback` rather than using `%` and `/` each time... Another difference (if I'm reading this right) is that you do a lot less traversing in the first implementation when the second dimensions are small (e.g., you haven't done that many calls to `add` for a given `y`). – T.J. Crowder Jun 01 '14 at 10:41
  • I cannot remove that math because I *always* need both x and y in the callback function. – Blaise Jun 01 '14 at 10:45
  • @Blaise: That doesn't follow. I didn't say not to pass them in. I said: *"You can track the values..."* E.g., have `x` and `y` variables that you increment and roll over as appropriate. – T.J. Crowder Jun 01 '14 at 10:46
  • Sorry, you're right, you said *tracking*. But tracking also requires math and it is slow as well in my test. (Tracking code inside the for loop: `if (i && i % xss.WIDTH === 0) { x = 0; y++; }`). And in reply to *Another difference ...*: Yes, a 3D array is indeed faster because it has to iterate less. I cannot make the 3D implementation much faster, but I may be able to improve on it by making a 2D array very fast (using a typed array). – Blaise Jun 01 '14 at 10:57
  • You could do it a much simpler way mathematically like this: http://pastebin.com/8ey5a65m – Pudge601 Jun 01 '14 at 10:59
  • A nested loop is faster than doing `%`, `Math.floor` and `/`, you can still use a nested loop with your 2D structure – Paul S. Jun 01 '14 at 11:06
  • @Pudge601 Thanks for your demo code. It's still slow unfortunately. I think the only way to make it faster is by making it iterate less, which can only be done by *not* allocating the theoretical maximum size. – Blaise Jun 01 '14 at 11:07
  • @PaulS: Could you show me pseudo/demo code? – Blaise Jun 01 '14 at 11:07
  • @Pudge601: We can do even better than that. There's no reason at all for multiplication or division for this nested loop. Just increment `i`, `x`, and `y` where appropriate, and wrap `x` when `x === WIDTH` (using a comparison, not `%`). (Entirely possible this won't solve the problem, but it eliminates the expensive math.) – T.J. Crowder Jun 01 '14 at 11:43
  • The complexity of the math is not the problem. Even with all math removed (so non-functional), it takes way more time than the 3D version. I would really like advice on a solution that involves a shared buffer. – Blaise Jun 01 '14 at 11:58
  • @Blaise compare using e.g. `var WIDTH = 255, HEIGHT = 160, i, j, u = new Uint8Array(WIDTH * HEIGHT); for (i = 0; i < u.length - WIDTH; i += WIDTH) for (j = 0; j < WIDTH; ++j) u[i + j];` – Paul S. Jun 01 '14 at 12:44
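
The nested loop discussed in the comments above, with x and y tracked by counters instead of `%`, `/` and `Math.floor`, might look like this minimal sketch. The function name eachSetPixel is hypothetical. As noted in the comments, it removes the per-pixel math but still visits every slot of the flat array:

```javascript
var WIDTH = 255;
var HEIGHT = 160;

// Visits every set pixel in a flat WIDTH * HEIGHT array, row by row.
// i walks the array linearly while x and y are plain loop counters.
function eachSetPixel(pixels, callback) {
    var i = 0;
    for (var y = 0; y < HEIGHT; y++) {
        for (var x = 0; x < WIDTH; x++, i++) {
            if (pixels[i]) {
                callback(x, y);
            }
        }
    }
}

var pixels = new Uint8Array(WIDTH * HEIGHT);
pixels[WIDTH * 2 + 3] = 1; // Set the pixel at x = 3, y = 2.

var seen = [];
eachSetPixel(pixels, function(x, y) {
    seen.push([x, y]);
});
```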

1 Answer

By slower I guess you mean running PixelCollection.each? The second example might be slower if there aren't that many points that are set to 1, as it still checks all possible locations whilst the first only runs through added points. If you really want every possible nanosecond at any cost (in this case memory), then you can pre-allocate the maximum size in two Uint8Arrays and separately keep track of how many pixels have been added.

var WIDTH = 255;
var HEIGHT = 160;

var PixelCollection = function() {
    // Parallel arrays: pixel i is (x[i], y[i]); length counts the pixels added so far.
    this.pixels = {
        length: 0,
        x: new Uint8Array(WIDTH * HEIGHT),
        y: new Uint8Array(WIDTH * HEIGHT)
    };
};

PixelCollection.prototype = {
    add: function(x, y) {
        this.pixels.x[this.pixels.length] = x;
        this.pixels.y[this.pixels.length] = y;
        this.pixels.length++;
    },
    each: function(callback) {
        var pixels = this.pixels;
        for (var i = 0; i < pixels.length; i++) {
            callback(pixels.x[i], pixels.y[i]);
        }
    }
};

If you know a limit to the number of additions the PixelCollection will have, then you can reduce the memory usage. You could even combine the two arrays into one double-length array with alternating x and y values.
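
A minimal sketch of the combined-array variant, assuming each pixel is added at most once and no more than WIDTH * HEIGHT pixels are added in total. The name PackedPixelCollection and the coords property are hypothetical:

```javascript
var WIDTH = 255;
var HEIGHT = 160;

var PackedPixelCollection = function() {
    // Even indexes hold x, odd indexes hold y.
    this.coords = new Uint8Array(WIDTH * HEIGHT * 2);
    this.length = 0; // Number of pixels stored, not number of bytes.
};

PackedPixelCollection.prototype = {
    add: function(x, y) {
        var offset = this.length * 2;
        this.coords[offset] = x;
        this.coords[offset + 1] = y;
        this.length++;
    },
    each: function(callback) {
        var coords = this.coords;
        for (var i = 0, m = this.length * 2; i < m; i += 2) {
            callback(coords[i], coords[i + 1]);
        }
    }
};

var collection = new PackedPixelCollection();
collection.add(3, 2);
collection.add(10, 5);

var out = [];
collection.each(function(x, y) {
    out.push(x + ',' + y);
});
```

Storing coordinates in a Uint8Array only works because x and y stay below 256 with these dimensions; a larger canvas would need a Uint16Array.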

However, if you want to be able to remove individual points, this method gets tricky. It's also unlikely to be much faster than your first method with @Pudge601's loop change.
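
To illustrate why removal is tricky, here is one possible sketch, assuming insertion order does not matter. The names PixelList, indexOf and remove are hypothetical. Finding a point needs a linear scan, and the freed slot is filled with the last pixel so that nothing has to be shifted:

```javascript
var WIDTH = 255;
var HEIGHT = 160;

var PixelList = function() {
    this.x = new Uint8Array(WIDTH * HEIGHT);
    this.y = new Uint8Array(WIDTH * HEIGHT);
    this.length = 0;
};

PixelList.prototype = {
    add: function(x, y) {
        this.x[this.length] = x;
        this.y[this.length] = y;
        this.length++;
    },
    // Linear scan: returns the index of (x, y), or -1 if it is absent.
    indexOf: function(x, y) {
        for (var i = 0; i < this.length; i++) {
            if (this.x[i] === x && this.y[i] === y) {
                return i;
            }
        }
        return -1;
    },
    // Overwrites the removed slot with the last pixel, so no shifting is needed.
    // Side effect: insertion order is not preserved.
    remove: function(x, y) {
        var i = this.indexOf(x, y);
        if (i === -1) {
            return false;
        }
        var last = this.length - 1;
        this.x[i] = this.x[last];
        this.y[i] = this.y[last];
        this.length = last;
        return true;
    }
};
```

The scan in indexOf is the expensive part: it grows with the number of stored pixels, whereas the flat WIDTH * HEIGHT array can answer the same question with a single lookup.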

Phil
  • Thank you for your answer. This makes `each` a lot faster, but now I have other issues: [PixelCollection also has other public methods like `has` and `remove`](https://github.com/blaise-io/xssnake/blob/master/shared/pixel_collection.js) which now require iterating = expensive, and preferably I need to be able to sort as well so I can [optimize painting](https://github.com/blaise-io/xssnake/blob/master/client/js/shape_cache.js#L54), and I don't see how that can be done cheaply without copying `.pixels`. I will prepare a jsperf this weekend so I can share benchmarks and insights. – Blaise Jun 04 '14 at 18:17
  • @Blaise don't use things like your `each` in tight spots like... ever. It's very slow compared to simply iterating it with a for loop from the outside, which might be uglier but performs a lot better. – Benjamin Gruenbaum Jun 06 '14 at 10:18
  • @Blaise also remember that the most important thing in the world right now is __ram line caching__. You __must__ iterate it the same way it's stored in memory, so that the CPU cache can fetch close by numbers in chunks rather than not. See http://jsperf.com/ram-line-caching – Benjamin Gruenbaum Jun 06 '14 at 10:20
  • @BenjaminGruenbaum: Thanks for your comments. I'm aware of `each` being slower than necessary due to the overhead of a closure / callback fn. However, I don't want to remove it right now because I would access the internal workings of PixelCollection, which hasn't completely settled yet. I haven't heard of ram line caching, but I think my 3D array is doing exactly that. Can you confirm? – Blaise Jun 06 '14 at 14:05