
Working with ExtendScript, Adobe's dialect of JavaScript (reportedly based on ECMA-262 ed. 3 / ISO/IEC 16262), I aim to process binary files. ExtendScript does not support ArrayBuffer, so I read parts of the files into a string and use charCodeAt to access byte, word, and long values.
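
The reading pattern, roughly, is this (a minimal sketch; the path and read size are hypothetical):

// With encoding set to "BINARY", every character of the string
// returned by read() carries one raw byte value (0-255).
var file = new File("/c/data/sample.dat"); // hypothetical path
file.encoding = "BINARY";
file.open("r");
var buf = file.read(1024); // pull 1 KB into a string
file.close();
var b = buf.charCodeAt(0); // byte value at offset 0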

However, that comes with a severe speed penalty. Reading 215,526 items (mixed byte, word, and long), I get this performance:

charCodeAt: 29,368 ms (note there is significant jitter of +/-5% in my timings due to random disk-read fluctuations)

// Read an unsigned 16-bit little-endian word at the given offset
String.prototype.uwordValueAt = function(index)
{
    index = index || 0;
    return this.charCodeAt(index)+(this.charCodeAt(index+1)<<8);
}

(and similar functions for byteValueAt and longValueAt).
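
Those companions might look along these lines (a sketch assuming the same little-endian layout; the top byte is multiplied rather than shifted by 24 to avoid a signed result):

String.prototype.byteValueAt = function(index)
{
    index = index || 0;
    return this.charCodeAt(index);
}

String.prototype.longValueAt = function(index)
{
    index = index || 0;
    // multiply the top byte instead of <<24 so that values
    // of 0x80000000 and above come out as positive numbers
    return this.charCodeAt(index)
         + (this.charCodeAt(index+1)<<8)
         + (this.charCodeAt(index+2)<<16)
         + this.charCodeAt(index+3)*0x1000000;
}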

I tried replacing charCodeAt with a direct look-up this way:

// Look-up tables mapping each of the 256 possible characters to its
// byte value (ascToBin) and to that value pre-shifted into the high
// byte (ascToBinH)
var ascToBin = {};
var ascToBinH = {};
for (var i = 0; i < 256; i++)
{
    ascToBin[String.fromCharCode(i)] = i;
    ascToBinH[String.fromCharCode(i)] = i<<8;
}

so I could use this instead:

String.prototype.wordValueAt = function(index)
{
    index = index || 0;
    // the two table values occupy disjoint bits, so XOR combines them
    return ascToBin[this[index]]^ascToBinH[this[index+1]];
}

with the following result:

ascToBin lookup: 29,528 ms

Hardly significant; sometimes it is even slightly faster, due to timing jitter. Leaving out the `index = index || 0` guard doesn't make a significant impact either.
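
For reference, timings like these can be taken with ExtendScript's $.hiresTimer, which returns the microseconds elapsed since it was last read; a minimal sketch (timeIt is a hypothetical helper):

function timeIt(label, fn)
{
    $.hiresTimer;              // reading the property resets it to 0
    fn();
    var micros = $.hiresTimer; // microseconds since the reset above
    $.writeln(label + ": " + (micros/1000).toFixed(0) + " ms");
}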

The read algorithm itself cannot easily be improved: the data consists of pointers to yet more data, and as far as I can tell every item is read only once. The data is stored non-sequentially, but I take care to read buffers as large as possible (short of reading the entire file at once). At any rate, I don't believe disk access is the real bottleneck, as each pointer and its associated data are packed together in 1/2K chunks (of which there are 284 in this particular worst-case file, totalling 11,616 individual data packets).
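
In code, the chunked reading amounts to something like this (a sketch; readChunk is a hypothetical helper):

// Seek to a pointer target and pull in a whole 1/2K chunk at once,
// so the packets inside it are parsed from memory, not from disk.
function readChunk(file, offset)
{
    file.seek(offset, 0);  // 0 = seek relative to start of file
    return file.read(512); // one half-kilobyte chunk as a string
}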

A typical large-ish file loads in 3.5 seconds, which is OK, but I'd still like to strip out every possible nanosecond. Is there a better alternative to using String and charCodeAt?

Jongware

1 Answer


No, and it is not the `charCodeAt` method that is slow, but the implementation.

If it is possible to use a different implementation, or a different language and implementation, you should do that.

Esailija
  • The implementation of my script in particular, or of ExtendScript in general? I agree using a "real" language would be faster -- Adobe's DOM can be accessed through C++ -- but ExtendScript is (mostly) version- and platform-agnostic, which is a fairly important plus for me. – Jongware Nov 18 '13 at 10:38
  • @Jongware A language is not fast or slow. In many implementations like SpiderMonkey and V8 the optimizing compiler can compile `.charCodeAt()` into very fast code. But obviously the implementation you are using is not doing that. – Esailija Nov 18 '13 at 10:41