Working with Adobe's dialect of JavaScript called ExtendScript (reportedly based on ECMA-262 ed. 3 / ISO/IEC 16262), I aim to process binary files. ExtendScript does not support ArrayBuffer, so I read parts of the files into a string and use charCodeAt to access byte, integer, and long values.
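For concreteness, this is roughly the pattern; a minimal sketch assuming ExtendScript's standard File object with binary encoding (the file name, offset, and chunk size are placeholders):
// Open the file in binary mode, pull a chunk into a string, and decode
// values with charCodeAt. File name, offset and chunk size are placeholders.
var f = new File("~/sample.bin");
f.encoding = "BINARY";                 // one character per byte, codes 0..255
if (f.open("r"))
{
    f.seek(0, 0);                      // absolute seek to the region of interest
    var buf = f.read(512);             // read a chunk of the file into a string
    f.close();

    var byteVal = buf.charCodeAt(0);                              // 8-bit value
    var wordVal = buf.charCodeAt(1) + (buf.charCodeAt(2) << 8);   // 16-bit little-endian word
}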
However, that comes with a severe speed penalty. Reading 215,526 items (mixed byte, word, and long), I get this performance:
charCodeAt: 29,368 ms (note there is a significant jitter of ±5% in my timings due to random disk read fluctuations)
// Read an unsigned 16-bit little-endian word at the given offset.
String.prototype.uwordValueAt = function (index)
{
    index = index || 0;
    return this.charCodeAt(index) + (this.charCodeAt(index + 1) << 8);
}
(and similar functions for byteValueAt and longValueAt).
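Those companions follow the same pattern; a rough sketch, assuming the same little-endian layout as uwordValueAt:
// Sketch of the companion accessors mentioned above (assuming the same
// little-endian layout as uwordValueAt).
String.prototype.byteValueAt = function (index)
{
    index = index || 0;
    return this.charCodeAt(index);
}
String.prototype.longValueAt = function (index)
{
    index = index || 0;
    return this.charCodeAt(index) +
           (this.charCodeAt(index + 1) << 8) +
           (this.charCodeAt(index + 2) << 16) +
           (this.charCodeAt(index + 3) << 24);   // comes out negative when bit 31 is set
}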
I tried replacing charCodeAt with a direct look-up this way:
// Map each single-character string to its byte value, and to the same value
// pre-shifted into the high byte of a word.
var ascToBin  = {};
var ascToBinH = {};
for (var i = 0; i < 256; i++)
{
    ascToBin[String.fromCharCode(i)]  = i;
    ascToBinH[String.fromCharCode(i)] = i << 8;
}
so I could use this instead:
// Combine the low-byte and pre-shifted high-byte lookups; the two bit ranges
// don't overlap, so ^ acts like +.
String.prototype.wordValueAt = function (index)
{
    index = index || 0;
    return ascToBin[this[index]] ^ ascToBinH[this[index + 1]];
}
with the following result:
ascToBin lookup: 29,528 ms
Hardly significant; sometimes it is even slightly faster, but that is within the timing jitter. Leaving out the index default check doesn't make a significant difference either.
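For reference, one way to compare the two accessors in isolation from disk I/O is a small harness around ExtendScript's $.hiresTimer (which, as far as I know, reports microseconds elapsed since it was last read); a sketch, where buf stands for a binary string that has already been loaded as described above:
// Micro-benchmark sketch: time one accessor over many reads.
// $.hiresTimer reports microseconds since it was last accessed, so reading
// it once resets it. "buf" is assumed to be a binary string already in memory.
function timeAccessor(buf, accessorName, iterations)
{
    var dummy = 0;
    $.hiresTimer;                                         // reset the timer
    for (var i = 0; i < iterations; i++)
    {
        dummy += buf[accessorName](i % (buf.length - 4)); // keep the index in range
    }
    return $.hiresTimer / 1000;                           // microseconds -> milliseconds
}
// e.g. timeAccessor(buf, "uwordValueAt", 215526) vs timeAccessor(buf, "wordValueAt", 215526)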
The read algorithm itself cannot easily be improved: the data consists of pointers to yet more data, and as far as I can tell every item is read only once. The data is stored non-sequentially, but I take care to read buffers that are as large as possible (short of reading the entire file at once). In any case, I don't believe disk access is the real bottleneck, as the pointers-to-data and their associated data are packed together in half-kilobyte chunks (of which there are 284, totalling 11,616 individual data packets, in this particular worst-case file).
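In other words, the loop is essentially pointer-chasing: seek to a half-kilobyte chunk, read it whole, then decode the packets inside it with the accessors above. A rough sketch (the chunk offsets and the 2-byte-length packet layout are placeholders, not the actual format):
// Rough sketch of the pointer-chasing read loop described above. The packet
// layout (a 2-byte length followed by the payload) is a placeholder only.
var CHUNK_SIZE = 512;

function readChunk(file, offset)
{
    file.seek(offset, 0);              // absolute seek to the chunk start
    return file.read(CHUNK_SIZE);      // one half-kilobyte buffer per seek
}

function loadPackets(file, chunkOffsets)
{
    var packets = [];
    for (var i = 0; i < chunkOffsets.length; i++)
    {
        var chunk = readChunk(file, chunkOffsets[i]);
        var pos = 0;
        while (pos + 2 <= chunk.length)
        {
            var len = chunk.uwordValueAt(pos);           // placeholder layout
            if (len === 0 || pos + 2 + len > chunk.length) break;
            packets.push(chunk.substr(pos + 2, len));
            pos += 2 + len;
        }
    }
    return packets;
}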
A typical large-ish file loads in 3.5 seconds, which is OK, but I'd still like to strip out every possible nanosecond. Is there a better alternative to using String and charCodeAt?