
Working with ExtendScript, Adobe's dialect of JavaScript (reportedly based on ECMA-262 ed. 3 / ISO/IEC 16262), I aim to process binary files. ExtendScript does not support ArrayBuffer, so I read parts of the files into a string and use charCodeAt to access byte, word, and long values.
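
The reading pattern, roughly, is this (a minimal sketch; the path and read size are hypothetical):

// With encoding set to "BINARY", every character of the string
// returned by read() carries one raw byte value (0-255).
var file = new File("/c/data/sample.dat"); // hypothetical path
file.encoding = "BINARY";
file.open("r");
var buf = file.read(1024); // pull 1 KB into a string
file.close();
var b = buf.charCodeAt(0); // byte value at offset 0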

However, that comes with a severe speed penalty. Reading 215,526 items (mixed byte, word, and long), I get this performance:

charCodeAt: 29,368 ms (note there is significant jitter of +/-5% in my timings due to random disk-read fluctuations)

// Read an unsigned 16-bit little-endian word at the given offset
String.prototype.uwordValueAt = function(index)
{
    index = index || 0;
    return this.charCodeAt(index)+(this.charCodeAt(index+1)<<8);
}

(and similar functions for byteValueAt and longValueAt).
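
Those companions might look along these lines (a sketch assuming the same little-endian layout; the top byte is multiplied rather than shifted by 24 to avoid a signed result):

String.prototype.byteValueAt = function(index)
{
    index = index || 0;
    return this.charCodeAt(index);
}

String.prototype.longValueAt = function(index)
{
    index = index || 0;
    // multiply the top byte instead of <<24 so that values
    // of 0x80000000 and above come out as positive numbers
    return this.charCodeAt(index)
         + (this.charCodeAt(index+1)<<8)
         + (this.charCodeAt(index+2)<<16)
         + this.charCodeAt(index+3)*0x1000000;
}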

I tried replacing charCodeAt with a direct look-up this way:

// Look-up tables mapping each of the 256 possible characters to its
// byte value (ascToBin) and to that value pre-shifted into the high
// byte (ascToBinH)
var ascToBin = {};
var ascToBinH = {};
for (var i = 0; i < 256; i++)
{
    ascToBin[String.fromCharCode(i)] = i;
    ascToBinH[String.fromCharCode(i)] = i<<8;
}

so I could use this instead:

String.prototype.wordValueAt = function(index)
{
    index = index || 0;
    // the two table values occupy disjoint bits, so XOR combines them
    return ascToBin[this[index]]^ascToBinH[this[index+1]];
}

with the following result:

ascToBin lookup: 29,528 ms

Hardly significant; sometimes it is even slightly faster, due to timing jitter. Leaving out the `index = index || 0` guard doesn't make a significant impact either.
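
For reference, timings like these can be taken with ExtendScript's $.hiresTimer, which returns the microseconds elapsed since it was last read; a minimal sketch (timeIt is a hypothetical helper):

function timeIt(label, fn)
{
    $.hiresTimer;              // reading the property resets it to 0
    fn();
    var micros = $.hiresTimer; // microseconds since the reset above
    $.writeln(label + ": " + (micros/1000).toFixed(0) + " ms");
}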

The read algorithm itself cannot easily be improved: the data consists of pointers to yet more data, and as far as I can tell every item is read only once. The data is stored non-sequentially, but I take care to read buffers as large as possible (short of reading the entire file at once). At any rate, I don't believe disk access is the real bottleneck, as each pointer and its associated data are packed together in 1/2K chunks (of which there are 284 in this particular worst-case file, totalling 11,616 individual data packets).
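
In code, the chunked reading amounts to something like this (a sketch; readChunk is a hypothetical helper):

// Seek to a pointer target and pull in a whole 1/2K chunk at once,
// so the packets inside it are parsed from memory, not from disk.
function readChunk(file, offset)
{
    file.seek(offset, 0);  // 0 = seek relative to start of file
    return file.read(512); // one half-kilobyte chunk as a string
}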

A typical large-ish file loads in 3.5 seconds, which is OK, but I'd still like to strip out every possible nanosecond. Is there a better alternative to using String and charCodeAt?

Jongware

1 Answer


No, and it is not the `charCodeAt` method that is slow, but the implementation.

If it is possible to use a different implementation, or a different language and implementation, you should do that.

Esailija
  • The implementation of my script in particular, or of ExtendScript in general? I agree using a "real" language would be faster -- Adobe's DOM can be accessed through C++ -- but ExtendScript is (mostly) version- and platform-agnostic, which is a fairly important plus for me. – Jongware Nov 18 '13 at 10:38
  • @Jongware A language is not fast or slow. In many implementations like SpiderMonkey and V8 the optimizing compiler can compile `.charCodeAt()` into very fast code. But obviously the implementation you are using is not doing that. – Esailija Nov 18 '13 at 10:41