
I have a bunch of binary files that contain data in the following format:

i\xffhh\xffhh\xffhh\xffih\xffhh\xffhh\xffhh\xffhh\xffhi\xffii\xffjj\xffjj\xffjj\xffjk\xffkk\xffkk\xffkl\xffll\xffmm\xffmn\xffnn\xffon\xffno\xffop\xffop\xffpp\xffqq\xffrq\xffrs\xffst\xfftt\xfftt\xffuv\xffvu\xffuv\xffvv\xffvw\xffwx\xffwx\xffxy\xffyy\xffyz\xffz{\xffz{\xff||\xff}|\xff~}\xff}}\xff~~\xff~~\xff~\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x80\x80\xff\x80\x81\xff\x81\x80\xff\x81\x81\xff\x81\x82\xff\x82\x82\xff\x82\x82\xff\x82\x83\xff\x83\x83\xff\x83\x83\xff\x83\x84\xff\x83\x84\xff\x84\x85\xff\x85\x85\xff\x86\x85\xff\x86\x87\xff\x87\x87\xff\x87\x87\xff\x88\x87\xff\x88\x89\xff\x88\x89\xff\x89\x8a\xff\x89\x8a\xff\x8a\x8b\xff\x8b\x8b\xff\x8b\x8c\xff\x8d\x8d\xff\x8d\x8d\xff\x8e\x8e\xff\x8e\x8f\xff\x8f\x8f

These are supposed to be pressure sensor readings from when a person is walking, so I'm assuming that they are numbers, but I want to convert them into ASCII so I have some idea what they are. How do I convert them? What format are they currently in?

EDIT: Link to file provided here (Link)

GobiasKoffi
  • This is interesting because they seem to start making sense right at the first \x7f. My guess would be that you're looking at this in an editor that is translating some binary into characters outside the hex range (the tilde ~, for example, is not a hex character). Can you show us the dump from an actual hex editor, link to the file, or find out the file format by looking in the documentation for the device you're using? – marr75 Nov 18 '10 at 19:39
  • Could be 3 bytes per sample, with the middle byte always 255 for some reason. What device does the data come from? – Gareth Rees Nov 18 '10 at 19:41
  • Great! I came to this question because I want to hack the calibration file (`.cal3`) of a FootWork pressure sensor. Small world! – heltonbiker Aug 09 '11 at 15:05
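If no hex editor is handy, a few lines of Python 3 produce the kind of dump marr75 asks for. This is a minimal sketch: `hex_dump` is a hypothetical helper name, and the sample is just the first few bytes shown in the question.

```python
def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset / hex / printable-ASCII columns, like a hex editor."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Bytes outside printable ASCII (0x20-0x7e) are shown as "."
        text_part = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # For the real file use: hex_dump(open("your_file.bin", "rb").read())
    print(hex_dump(b"i\xffhh\xffhh\xffhh\xffih"))
```

In such a dump the letters in the question simply show up as bytes 0x68 ('h'), 0x69 ('i') and so on, next to the 0xff bytes.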

3 Answers


I'm absolutely shocked and stunned and not a little bit amazed at all the waffle like "you have letters like hh which shouldn't be part of a hex number" and "they seem to start making sense right at the first \x7f". Hasn't anybody seen any repr() output?

The following shows how it might have ended up like that, ignoring the \xff which seems to be just noise:

>>> pressure = [120,121,122,123,124,125,126,127,128,129,130,131]
>>> import struct
>>> some_bytes = struct.pack("12B", *pressure)
>>> print repr(some_bytes)
'xyz{|}~\x7f\x80\x81\x82\x83'
>>>
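The session above is Python 2. In Python 3 the same demonstration produces a `bytes` object; its repr gains a `b` prefix but otherwise shows printable bytes as characters in exactly the same way. A minimal sketch:

```python
import struct

pressure = [120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131]
# Pack each reading into one unsigned byte ("B"); bytes(pressure) gives the same result
some_bytes = struct.pack("12B", *pressure)
print(repr(some_bytes))  # b'xyz{|}~\x7f\x80\x81\x82\x83'
```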

So let's try working back from the file:

>>> guff = open('your_file.bin', 'rb').read()
>>> cleaned = guff.replace("\xff", "")
>>> cleaned
'ihhhhhhihhhhhhhhhhiiijjjjjjjkkkkkklllmmmnnnonnoopopppqqrqrsstttttuvvuuvvvvwwxwxxyyyyzz{z{||}|~}}}~~~~~\x7f\x7f\x7f\x7f\x7f\x7f\x7f\x80\x80\x80\x81\x81\x80\x81\x81\x81\x82\x82\x82\x82\x82\x82\x83\x83\x83\x83\x83\x83\x84\x83\x84\x84\x85\x85\x85\x86\x85\x86\x87\x87\x87\x87\x87\x88\x87\x88\x89\x88\x89\x89\x8a\x89\x8a\x8a\x8b\x8b\x8b\x8b\x8c\x8d\x8d\x8d\x8d\x8e\x8e\x8e\x8f\x8f\x8f'
>>> pressure = [ord(c) for c in cleaned]
>>> pressure
[105, 104, 104, 104, 104, 104, 104, 105, 104, 104, 104, 104, 104, 104, 104, 104,
 104, 104, 105, 105, 105, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107, 107,
 107, 107, 108, 108, 108, 109, 109, 109, 110, 110, 110, 111, 110, 110, 111, 111,
 112, 111, 112, 112, 112, 113, 113, 114, 113, 114, 115, 115, 116, 116, 116, 116,
 116, 117, 118, 118, 117, 117, 118, 118, 118, 118, 119, 119, 120, 119, 120, 120,
 121, 121, 121, 121, 122, 122, 123, 122, 123, 124, 124, 125, 124, 126, 125, 125,
 125, 126, 126, 126, 126, 126, 127, 127, 127, 127, 127, 127, 127, 128, 128, 128,
 129, 129, 128, 129, 129, 129, 130, 130, 130, 130, 130, 130, 131, 131, 131, 131,
 131, 131, 132, 131, 132, 132, 133, 133, 133, 134, 133, 134, 135, 135, 135, 135,
 135, 136, 135, 136, 137, 136, 137, 137, 138, 137, 138, 138, 139, 139, 139, 139,
 140, 141, 141, 141, 141, 142, 142, 142, 143, 143, 143]
>>>
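The same steps in Python 3 are shorter: reading in binary mode returns `bytes`, and iterating over `bytes` already yields integers, so no `ord()` is needed. A minimal sketch that, like the session above, assumes the \xff bytes can simply be discarded; the function names are made up for illustration:

```python
from pathlib import Path
from typing import List


def readings_from_bytes(raw: bytes) -> List[int]:
    """Drop the 0xFF bytes and return the remaining bytes as integers 0-254."""
    return list(raw.replace(b"\xff", b""))


def readings_from_file(path: str) -> List[int]:
    """Read the whole file in binary mode and decode it as above."""
    return readings_from_bytes(Path(path).read_bytes())


if __name__ == "__main__":
    sample = b"i\xffhh\xffhh\xffhh\xffih"  # first bytes shown in the question
    print(readings_from_bytes(sample))  # [105, 104, 104, 104, 104, 104, 104, 105, 104]
    # For the real data: readings_from_file("your_file.bin")
```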

You'll still need to read the docs for the equipment to find out what scale factor to multiply those 0-254 values by.
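Once the manual gives that factor, the conversion is one multiplication per reading. In this sketch `SCALE` and its unit are placeholders, not values from any real device documentation:

```python
from typing import List

# Placeholder: replace with the scale factor from the equipment's documentation
SCALE = 1.0  # hypothetical physical units per raw count


def to_physical(raw_values: List[int], scale: float = SCALE) -> List[float]:
    """Multiply each raw 0-254 reading by the device's scale factor."""
    return [value * scale for value in raw_values]


print(to_physical([105, 104, 127], scale=0.5))  # [52.5, 52.0, 63.5]
```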

You'll notice that the derived numbers almost always change by +1, 0, or -1 from one reading to the next (the one exception is a jump from 124 to 126). This fits comfortably with a hypothesis that it's only 1 byte per reading, rather than two or more bytes per reading.
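The step-size observation can be checked mechanically instead of by eye. A minimal sketch that tallies the differences between consecutive readings, assuming the input is a list of integers like the `pressure` list derived above (`step_sizes` is a made-up name):

```python
from collections import Counter
from typing import Dict, List


def step_sizes(values: List[int]) -> Dict[int, int]:
    """Count how often each difference between consecutive readings occurs."""
    return dict(Counter(b - a for a, b in zip(values, values[1:])))


# A short excerpt of the derived readings
print(step_sizes([122, 123, 124, 124, 125, 124, 126, 125]))
```

Run on the full list, any step outside -1..1 shows up as its own key, which makes outliers easy to spot.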

Another thought: perhaps the \xff is a start or end sentinel, and there are two values (start, stop) or (sensor-A, sensor-B) being reported each cycle.
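To try the sentinel idea, split the raw bytes on \xff and look at what falls between the separators. This is a sketch under that (unconfirmed) assumption; note that the very first chunk in the question's data is a single byte, `i`, so the code does not force every chunk to be a pair:

```python
from typing import List, Tuple


def split_records(raw: bytes) -> Tuple[List[Tuple[int, int]], List[bytes]]:
    """Split on 0xFF; return 2-byte chunks as (first, second) pairs plus any other chunks."""
    pairs: List[Tuple[int, int]] = []
    leftovers: List[bytes] = []
    for chunk in raw.split(b"\xff"):
        if len(chunk) == 2:
            pairs.append((chunk[0], chunk[1]))
        elif chunk:  # skip empty chunks, e.g. from a trailing 0xFF
            leftovers.append(chunk)
    return pairs, leftovers


pairs, leftovers = split_records(b"i\xffhh\xffih\xff}|\xff~}")
print(pairs)      # [(104, 104), (105, 104), (125, 124), (126, 125)]
print(leftovers)  # [b'i']
# Treat each position as its own channel, e.g. (sensor-A, sensor-B)
channel_a = [a for a, _ in pairs]
channel_b = [b for _, b in pairs]
```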

John Machin

You cannot guess the format just by opening up a binary file. You will need documentation on how that particular pressure sensor stores its readings.

Of course, once you know the format, it is easy to read the file in binary mode and extract the meaningful data from it:

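with open(filename, "rb") as f:  # filename: path to the data file
    data = f.read(numBytes)      # numBytes: bytes to read; f.read() reads the whole file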
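With a format in hand, the struct module (mentioned in the comments below) does the decoding. As a purely illustrative example, take Gareth Rees's guess from the comments on the question: three bytes per sample, with a constant 0xff in the middle. The format string "3B" encodes that assumption and would change once the real specification is known:

```python
import struct
from typing import List, Tuple

RECORD = struct.Struct("3B")  # assumption: 3 unsigned bytes per sample


def decode_records(data: bytes) -> List[Tuple[int, int, int]]:
    """Unpack consecutive 3-byte records, ignoring any incomplete tail."""
    usable = len(data) - len(data) % RECORD.size
    return list(RECORD.iter_unpack(data[:usable]))


records = decode_records(b"i\xffhh\xffhh\xffh")
print(records)  # [(105, 255, 104), (104, 255, 104), (104, 255, 104)]
```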
pyfunc
  • ...and for decoding the data after reading, you can use the struct module: http://docs.python.org/library/struct.html – che Nov 18 '10 at 19:38
  • That's an answer? You get 45 Zorkmids for that? – John Machin Dec 16 '10 at 04:14
  • Vincent wrote: "I got an error: numBytes is not defined." Response: numBytes is an integer giving the number of bytes to read. If the file is small, you can leave it out (call read() with no argument) to read the whole file. – cadvena Jun 27 '16 at 11:38

The first part looks very strange. Typically something like \x8e is just the hex code for a byte, but in the first part you have letters like hh, which aren't hex digits.

But for the second part you can do something like:

hex_list = r"\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x7f\x7f\xff\x80\x80\xff\x80\x81\xff\x81\x80\xff\x81\x81\xff\x81\x82\xff\x82\x82\xff\x82\x82\xff\x82\x83\xff\x83\x83\xff\x83\x83\xff\x83\x84\xff\x83\x84\xff\x84\x85\xff\x85\x85\xff\x86\x85\xff\x86\x87\xff\x87\x87\xff\x87\x87\xff\x88\x87\xff\x88\x89\xff\x88\x89\xff\x89\x8a\xff\x89\x8a\xff\x8a\x8b\xff\x8b\x8b\xff\x8b\x8c\xff\x8d\x8d\xff\x8d\x8d\xff\x8e\x8e\xff\x8e\x8f\xff\x8f\x8f"
# Split the escaped text on the literal "\x" and parse each two-digit hex chunk
int_list = [int(h, 16) for h in hex_list.split(r"\x") if h]

Note that every value falls between 127 and 143, apart from the 255s (the \xff bytes).
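The parsing above works on the escaped text representation. If the data is read from the file as actual bytes (binary mode, Python 3), no string manipulation is needed, because iterating over `bytes` already yields integers. A minimal sketch using a short excerpt of the second part:

```python
raw = b"\x7f\xff\x7f\x7f\xff\x80\x80\xff\x80\x81"  # excerpt; real data: open(path, "rb").read()
int_list = list(raw)
print(int_list)  # [127, 255, 127, 127, 255, 128, 128, 255, 128, 129]
readings = [v for v in int_list if v != 0xFF]  # drop the 255 separators
print(min(readings), max(readings))  # 127 129
```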

dr jimbob