I am trying to port a JS algorithm to C++ to see if I can improve performance, but I'm hitting a huge bottleneck when populating V8 arrays.

Here is a snippet that reproduces just the array population. I create an array of 800k items, each item being an array of 17 numbers. This takes about 3 seconds to execute on my machine, which is far too slow.

Is there any way to speed it up?

#include <node.h>

namespace demo {

  using namespace v8; // just for readability of the example

  void Method(const FunctionCallbackInfo<Value>& args) {
    Isolate* isolate = args.GetIsolate();
    Local<Array> array = Array::New(isolate, 800000);

    for (int i = 0; i < 800000; ++i) {
      Local<Array> line = Array::New(isolate, 17);

      for (int j = 0; j < 17; ++j) {
        line->Set(j, Number::New(isolate, i * 100 + j));
      }

      array->Set(i, line);
    }

    args.GetReturnValue().Set(array);
  }

  void Init(Local<Object> exports) {
    NODE_SET_METHOD(exports, "hello", Method);
  }

  NODE_MODULE(parser, Init)

}
  • Did you compare this to creating the same arrays in javascript? Currently it can be faster to create some things in javascript and pass them to C++. – mscdex Mar 21 '16 at 10:36
  • Indeed, but the real algorithm is actually a CSV parser for big data, so the array lengths are not known when the algorithm starts. I used some shortcuts in this example; in reality I cannot pre-allocate the `array` variable to a known size. – Aurelien Ribon Mar 21 '16 at 16:58

2 Answers

Creating JS objects (and interacting with them) from C++ is more expensive than doing it from JS. This can easily offset performance gains from the rest of the C++ code.

You can work around this by communicating via a Buffer (the serialization overhead will typically be lower than the above). More importantly, this will also let you do the work off the main v8 thread.

If you're only dealing with numbers, this should be relatively straightforward using `Buffer.readIntLE` (or similar methods). You could also encode the array's length into the first few bytes of the buffer. Here's what the JS side of things could look like:

var buf = new Buffer(/* Large enough to contain your serialized data. */);

// Function defined in your C++ addon.
addon.populate(buf, function (err) {
  if (err) {
    // Handle C++ error.
    return;
  }
  // At this point, `buf` contains the serialized data. Deserialization
  // will depend on the chosen serialization format but a reasonable
  // option could be the following:
  var arr = [];
  var pos = 4;
  var size = buf.readInt32LE(0);
  while (size--) {
    var subarr = new Array(17);
    for (var i = 0; i < 17; i++) {
      subarr[i] = buf.readInt32LE(pos);
      pos += 4;
    }
    arr.push(subarr);
  }
  // `arr` now contains your decoded data.
});

The C++ part of the code would keep a reference to `buf`'s data (a `char *`) and populate it inside a worker thread (see nan's `AsyncWorker` for a convenient helper).

As mtth said, working with JS arrays in C++ is expensive. Using a buffer would work, but you can also use TypedArrays. These are accessible from C++ as pointers to contiguous, aligned blocks of memory, which makes them easy to work with and fast to iterate over.

See https://stackoverflow.com/a/31712512/1218408 for some info on how to access their contents.
