
I always believed that GHCJS, for obvious reasons, generated very slow JavaScript programs compared to manually written and optimized code. When experimenting with it, though, I noticed it was not as bad as I expected. I decided to run a series of small benchmarks to get a grasp of the true performance, and this one in particular surprised me. The program simply fills an array with 1s and adds them up.

Haskell:

import Data.Array.Repa
len  = 1024*1024*64
arr  = fromFunction (Z :. len) (const 1) :: Array D DIM1 Float
main = sumAllP arr >>= print

JavaScript:

var len = 1024*1024*64
var arr = [];
var sum = 0;
for (var i=0; i<len; ++i)
    arr[i] = 1;
for (var i=0; i<len; ++i)
    sum += arr[i];
console.log(sum);

And a crude benchmark:

apple1$ ghcjs -O2 bench_fill.hs -funfolding-use-threshold10000 -funfolding-keeness-factor1000 -o bench_fill.js; time node bench_fill.js/all.js
Linking bench_fill.js (Main)
6.7108864e7

real    0m1.543s
user    0m1.512s
sys 0m0.033s

apple1$ time node benchfill.js
67108864

real    0m1.764s
user    0m1.173s
sys 0m0.583s

How can the GHCJS output run faster than a slim, clean native for-loop? That shouldn't be possible, considering the amount of boxing the generated code should be subject to.

MaiaVictor
  • Are you sure the Haskell compiler doesn't optimize the code to just print the end result? In any case we probably need to see the generated code to find out why it's faster. – JJJ Jan 23 '15 at 09:04
  • Provide the output .js from ghcjs please. That is how to answer the question. – Don Stewart Jan 23 '15 at 09:42
  • [This](https://gist.github.com/viclib/3023b1c44daf7f33ede4) is the output js. [This](https://gist.github.com/viclib/9108f02fb1655cdc0787) too, except it prints the whole array after printing the sum in order to make sure it was filled. – MaiaVictor Jan 23 '15 at 09:53
  • Unroll the JavaScript loops to speed it up. – karakfa Jan 23 '15 at 14:56
  • @karakfa: Does that actually help? I tried unrolling to a few different lengths (i.e. 2, 4 and 16 additions per iteration), and it just made it a bit slower. Maybe I did something wrong? [Here's](http://lpaste.net/119079) the most unrolled version I tried. – Tikhon Jelvis Jan 23 '15 at 18:41
  • @TikhonJelvis loop unrolling usually helps (not just for JS). See this http://jsperf.com/loop-unrolling for performance tests of various approaches. Sorry, I couldn't open your link from work. – karakfa Jan 26 '15 at 16:17

1 Answer


Array D DIM1 Float is a delayed array. It is just represented as the function const 1 plus the bounds of the array. There is no array of 64 million Floats stored anywhere.

The JavaScript program actually creates an array of 64 million doubles, which uses 512 MB of memory. The costs to read and write such a large array are non-negligible (as is the cost to allocate it; note the substantial system time).
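To make the difference concrete, here is a JavaScript sketch of what Repa's delayed representation effectively amounts to (this is an illustration of the idea, not the code GHCJS actually emits): the element function is applied on the fly inside the reduction, so the 512 MB backing array is never allocated.

```javascript
// Sketch: a delayed (D) array is just an index -> value function plus
// bounds, consumed directly by the fold. Nothing is stored in memory.
var len = 1024 * 1024 * 64;
var elem = function (i) { return 1; };  // corresponds to Repa's `const 1`

var sum = 0;
for (var i = 0; i < len; ++i)
    sum += elem(i);  // each element is computed on demand

console.log(sum);  // 67108864
```

This version does the same 64 million additions as the original JavaScript benchmark, but skips both the allocation and the reads and writes of the large array.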

Reid Barton
  • Ah, I see - my mistake was assuming `sumAllP` computed the array before. Calling `computeP` makes it slower - albeit still by a small (4x) margin. But now I found something else unexpected: doubling the length makes the JavaScript program run out of memory, while the Haskell program (using `computeP`), compiled to JavaScript, doesn't. I'm sure `computeP` is actually computing the array since it prints it, so... that is left as a mystery for me. – MaiaVictor Jan 23 '15 at 20:31
  • It's not a matter of when the array is evaluated. `arr` is not physically stored as an array regardless of how much or how little you evaluate it. – Reid Barton Jan 23 '15 at 20:47
  • If you want a physical array, you can use one of the other storage formats like `U` (instead of `D`). – Reid Barton Jan 23 '15 at 20:53
  • Yes, `computeP` produces a `U` array. – MaiaVictor Jan 23 '15 at 21:25