5

Apologies for the broad question! I'm learning WASM and have created a Mandelbrot algorithm in C:

int iterateEquation(float x0, float y0, int maxiterations) {
  float a = 0, b = 0, rx = 0, ry = 0;
  int iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4.0)) {
    rx = a * a - b * b + x0;
    ry = 2.0 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

void mandelbrot(int *buf, float width, float height) {
  for(float x = 0.0; x < width; x++) {
    for(float y = 0.0; y < height; y++) {
      // map to mandelbrot coordinates
      float cx = (x - 150.0) / 100.0;
      float cy = (y - 75.0) / 100.0;
      int iterations = iterateEquation(cx, cy, 1000);
      int loc = ((x + y * width) * 4);
      // set the red and alpha components
      *(buf + loc) = iterations > 100 ? 255 : 0;
      *(buf + (loc+3)) = 255;
    }
  }
}

I'm compiling to WASM as follows (input and output filenames omitted for clarity):

clang -emit-llvm  -O3 --target=wasm32 ...
llc -march=wasm32 -filetype=asm ...
s2wasm --initial-memory 6553600 ...
wat2wasm ... 

I'm loading in JavaScript, compiling, then invoking as follows:

instance.exports.mandelbrot(0, 300, 150)

The output is being copied to a canvas, which enables me to verify that it is executed correctly. On my computer the above function takes around 120ms to execute.

However, here's a JavaScript equivalent:

const iterateEquation = (x0, y0, maxiterations) => {
  let a = 0, b = 0, rx = 0, ry = 0;
  let iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4)) {
    rx = a * a - b * b + x0;
    ry = 2 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

const mandelbrot = (data) => {
  for (var x = 0; x < 300; x++) {
    for (var y = 0; y < 150; y++) {
      const cx = (x - 150) / 100;
      const cy = (y - 75) / 100;
      const res = iterateEquation(cx, cy, 1000);
      const idx = (x + y * 300) * 4;
      data[idx] = res > 100 ? 255 : 0;
      data[idx+3] = 255;
    }
  }
}

Which only takes ~62ms to execute.

Now I know WebAssembly is very new, and is not terribly optimised. But I can't help feeling that it should be faster than this!

Can anyone spot something obvious I might have missed?

Also, my C code writes directly to memory starting at '0' - I am wondering if this is safe? Where is the stack stored in the paged linear memory? Am I going to risk overwriting it?

Here's a fiddle to illustrate:

https://wasdk.github.io/WasmFiddle/?jvoh5

When run, it logs the timings of the two equivalent implementations (WASM first, then JavaScript).

ColinE
  • Can you provide something like a jsfiddle link to try out? What browser are you testing in? Your stack question is answered [here](https://stackoverflow.com/a/43644387/3983557), using 0 is safe in WebAssembly, but C++ may be unhappy when compiling to WebAssembly. – JF Bastien Sep 20 '17 at 21:17
  • I'm just trying to get this working in WasmFiddle, I'll update the question as soon as I manage. The browser is Chrome 61. Thanks for the link to the stack answer. – ColinE Sep 20 '17 at 21:34
  • @JFBastien - I've added a fiddle :-) – ColinE Sep 20 '17 at 21:44
  • I went through the C version and made sure every float literal had the ".0f" suffix, and the performance increased significantly. With this change the WebAssembly version is faster than the JS version on my laptop. However, on my desktop the JS version is still faster than the WebAssembly version. The modified fiddle: https://wasdk.github.io/WasmFiddle/?xbo35 – Ghillie Sep 21 '17 at 15:58

3 Answers

4

General

For heavy math you can usually expect about a 10% speedup over optimized JS. That net figure is made up of:

  • the gain from wasm itself
  • the cost of copying data into and out of wasm memory.

Note that copying a Uint8Array is notably slow in Chrome (it is fine in Firefox). When you work with RGBA data, it's better to reinterpret the underlying buffers as Uint32Array and use .set() on them.
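A minimal sketch of the Uint32Array approach, assuming the wasm code has written RGBA pixels with one byte per channel starting at byte offset 0 of its memory (the C code in the question stores each channel in an int, so it would need adjusting first). The helper name `copyPixels` is illustrative, and a plain `WebAssembly.Memory` and a `Uint8ClampedArray` stand in for the real module memory and for `ImageData.data`:

```javascript
// Copy RGBA pixels out of wasm memory one 32-bit word per pixel.
// Assumption: `byteOffset` and `target.byteOffset` are multiples of 4.
function copyPixels(memory, byteOffset, target, pixelCount) {
  // View the wasm memory as 32-bit words: one word per RGBA pixel.
  const src = new Uint32Array(memory.buffer, byteOffset, pixelCount);
  // View the destination's underlying buffer the same way.
  const dst = new Uint32Array(target.buffer, target.byteOffset, pixelCount);
  dst.set(src);
}

// Stand-ins for the real objects (hypothetical setup for illustration).
const width = 300, height = 150;
const memory = new WebAssembly.Memory({ initial: 3 }); // 3 pages of 64 KiB
const pixels = new Uint8ClampedArray(width * height * 4);

// Pretend the wasm code produced an opaque red first pixel.
new Uint8Array(memory.buffer).set([255, 0, 0, 255], 0);

copyPixels(memory, 0, pixels, width * height);
```

In a browser, `pixels` would typically be the `data` array of an `ImageData` object that is then drawn with `putImageData`.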

Reading and writing pixels a word at a time (rgba) in wasm runs at the same speed as reading and writing individual bytes (r, g, b, a); I did not find any difference.

If you use Node.js for development (as I do), it's worth staying on 8.2.1 for JS benchmarks. The next version upgraded V8 to 6.0 and introduced serious speed regressions for this kind of math. On 8.2.1, don't use modern ES6 features such as const and =>; use ES5 instead. Maybe the next version, with V8 6.2, will fix those issues.

Comments on your samples

  1. Use wasm-opt -O3; it can sometimes help even after clang -O3.
  2. Use s2wasm --import-memory instead of hardcoding a fixed memory size.
  3. In code on the wasdk site, do NOT use global variables. When they are present, the compiler allocates a block of unknown size at the start of memory for them, and you can overwrite them by mistake.
  4. A correct version should probably copy the memory from the proper location, and that copy should be included in the benchmark. Your samples are not complete, and IMHO the code from wasdk should not work correctly.
  5. Use benchmark.js; it's more precise.
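For point 2, here is a sketch of what the JavaScript side might look like when the host creates the memory and the module imports it. The import name `env.memory` and the helper `createMemoryFor` are assumptions for illustration; check what your toolchain actually emits before relying on them:

```javascript
const PAGE_SIZE = 65536; // size of one WebAssembly memory page in bytes

// Create a memory just large enough to hold `byteLength` bytes.
function createMemoryFor(byteLength) {
  const pages = Math.ceil(byteLength / PAGE_SIZE);
  return new WebAssembly.Memory({ initial: pages });
}

// 300 x 150 pixels, 4 channels, one byte per channel (assumed layout).
const memory = createMemoryFor(300 * 150 * 4);

// Assumed import shape; the module and field names depend on the toolchain.
const importObject = { env: { memory } };

// With real module bytes this object would be passed to instantiate:
// const { instance } = await WebAssembly.instantiate(wasmBytes, importObject);
console.log(memory.buffer.byteLength / PAGE_SIZE); // number of pages allocated
```

Sizing the memory from the data it has to hold avoids guessing a fixed figure up front, and the same `memory.buffer` can be viewed directly from JavaScript once the module has run.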

In short: before continuing, it's worth cleaning things up.

You may find it useful to dig through the https://github.com/nodeca/multimath sources, or to use it in your experiments. I created it specifically for small CPU-intensive tasks, to simplify issues with proper module init, memory management, JS fallbacks and so on. It contains an 'unsharp mask' implementation as an example, plus benchmarks. It should not be difficult to adapt your code to it.

Vitaly
  • To be honest, it's difficult to explain all the details in text. It will be much more useful to inspect the multimath sources. They are small, battle-tested and well commented. But SO rules prohibit replying with a single link :) – Vitaly Sep 30 '17 at 20:29
2

I had a case where WebAssembly was slow. It turned out that the SAFE_HEAP option was enabled at compile time. After I removed the option, the run time came down to roughly twice that of native code, so compilation options are also something to check.

Nikolay Shmyrev
0

Google Chrome's "Inspect" window seems to slow down WebAssembly by ~100%. So for benchmarking you should show the results with an "alert" instead of using "console.log". Alternatively, do your benchmarking in Safari on macOS, which doesn't seem to slow down WebAssembly as much. (I haven't tried MS Edge.)

Linking an external debugger like WebStorm to Chrome also slows it down.
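A sketch of a timing harness that keeps all reporting out of the measured region, in the spirit of the advice above. `timeIt` is a hypothetical helper, not part of any library; it warms the function up, times several runs with `performance.now()`, and returns the median so the result can be displayed once afterwards. The workload is the `iterateEquation` function from the question:

```javascript
// Time `fn` over several runs and return the median duration in milliseconds.
function timeIt(fn, runs = 20, warmup = 5) {
  for (let i = 0; i < warmup; i++) fn(); // give the JIT a chance to settle
  const samples = [];
  for (let i = 0; i < runs; i++) {
    const start = performance.now();
    fn();
    samples.push(performance.now() - start);
  }
  samples.sort((a, b) => a - b);
  const mid = Math.floor(samples.length / 2);
  return samples.length % 2
    ? samples[mid]
    : (samples[mid - 1] + samples[mid]) / 2;
}

// Workload taken from the question's JavaScript version.
function iterateEquation(x0, y0, maxiterations) {
  let a = 0, b = 0, rx = 0, ry = 0;
  let iterations = 0;
  while (iterations < maxiterations && (rx * rx + ry * ry <= 4)) {
    rx = a * a - b * b + x0;
    ry = 2 * a * b + y0;
    a = rx;
    b = ry;
    iterations++;
  }
  return iterations;
}

// Measure first; report only after every run has finished.
const medianMs = timeIt(() => iterateEquation(0, 0, 1000));
```

The value in `medianMs` can then be shown with `alert` or written to the page, so that no logging happens while the clock is running.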

Adam Gawne-Cain