Format/encoding of this binary data file

Question

I'm attempting to write a program that integrates with Advent Axys, software for financial planners and the like. The product's site is here: http://www.advent.com/solutions/asset-managers-software/axys-platform

I need to write new entries into the price files, but much of each file is binary. I looked around online and didn't find much. I also emailed their support, but I doubt that will help.

I have a short dummy file and the printout that the program produces for that file. I ran the file through a Ruby script that prints each byte as a character if it is a word character or one of a few symbols, and as its numeric value otherwise. Here's the Ruby script:

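# Read the raw bytes; binread avoids any newline or encoding translation.
pri = File.binread '062109_dummy.pri'
pri.each_byte do |char|
  # Word characters and a few symbols print as-is; other bytes print as " <value> ".
  print char.chr =~ /[\w!@#\$%\^&\*\(\)\-\\\/\+\.]/ ? char.chr : ' ' + char.to_s + ' '
end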

And output:


pri1.001 254  250  251  252  29  0  0  2 adusnok 0  0  0  0  0  0  0  0  0 33333s7@ 1  254  250  251  252  29  0  0  2 csusxom 0  0  0  0  0  0  0  0  0 H 225 z 20  174 GA@ 1  254  250  251  252  29  0  0  2 etusvv 0  0  0  0  0  0  0  0  0  0  246 (\ 143  194  213 F@ 1  254  250  251  252  29  0  0  2 fdusoakbx 0  0  0  0  0  0  0  174 G 225 z 20  174 (@ 1  254  250  251  252  29  0  0  2 oousfidde09 0  0  0  0  0  154  153  153  153  153  185 S@ 1  254  250  251  252  29  0  0  2 qpusfid_eqix 0  0  0  0  164 p 61  10  215 cL@ 1  254  250  251  252  29  0  0  2 vausvg_sc 0  0  0  0  0  0  0 )\ 143  194  245  248 P@ 1

Note that if a number has spaces around it, it is the numeric value of a byte; if it doesn't, the byte was the ASCII character for that digit.

I know that the strings of letters (like "adusnok") identify the stocks and other securities. The zeroed bytes after them are there because the symbol field is fixed-size (which is why fewer 0's follow a longer symbol). The sequence @ 1 254 250 251 252 29 0 0 2 seems to mark the end of a record, coming right before the symbol of the next one. Alternatively, some of it could be a field that happens to hold the same value in all of these records, but not much else seems the same. Beyond that, I know basically nothing. I do have the printout of what the program thinks the file maps to. With 3 spaces separating each column, it is:

adus   nok   23.45   NOKIA CORP ADR   0.393   05/30/2008
csus   xom   34.56   EXXON MOBIL CORPORATION COM   1.68   06/10/2009
etus   vv   45.67   VANGUARD LRG CAP ETF US PRIME MKT 750   1.04   3/31/2009

There's more, but that should give you a pretty good idea. I think it's quite possible that the descriptions, and possibly other things, are stored in other files and just looked up. But I know that the prices are in that file, because these are price files and that's the whole point. So:

33333s7 => 23.45
H 225 z 20 174 GA => 34.56
246 (\ 143 194 213 F => 45.67

Note that, apart from the 3's and the 7 in the first one, all of the numbers there are byte values, not the ASCII characters for those digits. Also note that those bytes could represent a little more than just the price, but they definitely include the price.

Any ideas? I'm not familiar with common binary encodings, but I wouldn't be surprised if they used a fairly common method.

score 3 · Answer 1 · answered Jun 25 '09 at 20:15

Reverse engineering a binary format is risky if you are going to ship your reverse-engineered codec, because the vendor may change the file format without warning. However, if you are bound and determined to do it:

One thing you could do is to look at the format for IEEE floating point numbers:

http://steve.hollasch.net/cgindex/coding/ieeefloat.html

And then, starting at the first byte in the file, read 4 bytes and 8 bytes of data. Interpret the 4 bytes as a single-precision float and the 8 bytes as a double, trying both byte orders if you don't know which one the file uses. Check whether either value matches a price that you know is in the file. If so, you have probably found the offset of a price; print the value along with its offset. If not, advance your seek position by one byte and try again.
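A minimal Ruby sketch of that scan, assuming the file contents are already in memory as a binary string. The names `find_prices` and `FORMATS` are made up for illustration, and the sample bytes are rebuilt from the first record of the dump in the question rather than read from a real file.

```ruby
# Candidate encodings: Ruby pack directive => width in bytes.
# 'E'/'G' are little-/big-endian doubles, 'e'/'g' little-/big-endian floats.
FORMATS = { 'E' => 8, 'G' => 8, 'e' => 4, 'g' => 4 }.freeze

# Return [offset, directive, value] for every position in data whose bytes
# decode to one of the known prices (within tolerance).
def find_prices(data, known, tolerance = 1e-4)
  hits = []
  (0...data.bytesize).each do |offset|
    FORMATS.each do |directive, width|
      chunk = data.byteslice(offset, width)
      next if chunk.nil? || chunk.bytesize < width
      value = chunk.unpack(directive).first
      next unless value.finite?
      hits << [offset, directive, value] if known.any? { |k| (value - k).abs < tolerance }
    end
  end
  hits
end

# Bytes rebuilt from the first record in the question's dump:
# "adusnok", nine zero bytes, then the eight bytes printed as "33333s7@".
sample = 'adusnok'.b + ("\x00".b * 9) + '33333s7@'.b
find_prices(sample, [23.45]).each do |offset, directive, value|
  puts "offset #{offset}: #{value} (directive #{directive})"
end
```

On that sample the only hit is a little-endian double at offset 16, directly after the 16-byte symbol field. One matching record is only a hint, so it is worth running the same check against the other known prices.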

If you can find all the values that way, then you might be able to safely patch the binary files at runtime by performing a similar operation: looking for the prices that you know are there, and then modifying the price values in the right place.
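As a rough sketch of that kind of patch in Ruby, assuming the scan showed the prices to be stored as little-endian doubles (pack directive `'E'`). The function name `patch_price` is hypothetical, and the sample is a made-up 16-byte symbol field followed by one price, not a real price file.

```ruby
# Replace the first occurrence of old_price with new_price, both encoded as
# little-endian doubles. Returns a patched copy of data, or nil when the
# byte pattern for old_price is not present.
def patch_price(data, old_price, new_price)
  needle = [old_price].pack('E')
  patched = data.b                      # binary copy; the input is untouched
  offset = patched.index(needle)
  return nil if offset.nil?
  patched[offset, needle.bytesize] = [new_price].pack('E')
  patched
end

# Sample: a symbol padded with zero bytes to 16, then the price 23.45.
sample = ['adusnok'].pack('a16') + [23.45].pack('E')
patched = patch_price(sample, 23.45, 24.5)
puts patched.byteslice(16, 8).unpack('E').first
```

This only replaces the first match, so two securities with the same price would need the symbol checked as well. It also relies on the known price encoding to exactly the stored bytes. Working on a copy of the file (for example via `File.binread` and `File.binwrite`) keeps the original safe while experimenting.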

This isn't foolproof at all, because random sequences of data will sometimes happen to decode to a matching value. If you notice a constant distance between offsets, or some sigil that is always present, or perhaps even better, if you can find those offset values stored in the file itself, you may have something modestly stable.
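One way to check for a constant distance is to list where a suspected sigil occurs and look at the gaps between occurrences. The sketch below uses the run 254 250 251 252 that the question's dump shows before every symbol. The helper names `offsets_of` and `strides` are made up, and the two sample records are assembled from what the dump appears to show (with the price assumed to be a little-endian double), not from a confirmed specification.

```ruby
# Byte run that appears before every symbol in the question's dump.
SIGIL = [254, 250, 251, 252].pack('C*')

# All byte offsets at which pattern occurs in data.
def offsets_of(data, pattern)
  bytes = data.b
  found = []
  pos = bytes.index(pattern)
  while pos
    found << pos
    pos = bytes.index(pattern, pos + 1)
  end
  found
end

# Gaps between consecutive offsets; a single repeated value hints at
# fixed-size records.
def strides(offsets)
  offsets.each_cons(2).map { |a, b| b - a }
end

# Two records shaped like the dump: sigil, 29 0 0 2, a symbol padded with
# zero bytes to 16, eight price bytes, then a single byte of value 1.
build = lambda do |symbol, price|
  SIGIL + [29, 0, 0, 2].pack('C*') + [symbol].pack('a16') + [price].pack('E') + [1].pack('C')
end
sample = build.call('adusnok', 23.45) + build.call('csusxom', 34.56)

marks = offsets_of(sample, SIGIL)
puts "sigil offsets: #{marks.inspect}"
puts "strides: #{strides(marks).inspect}"
```

If every stride in the real file comes out the same, the price offsets can be predicted from the sigil positions instead of being rediscovered by value each time.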
