I'm trying to convert a 4x4, 5.6.5.0.0 .bmp file into a list of RGB values to feed into another program that needs a specific format. I'm getting stuck because I think Python's `read()` method is converting some of the data before I can use it, even when I open the file in `"rb"` mode.
For example, when I use:

```python
with open("imgFile.bmp", "rb") as f:  # binary mode: no newline or encoding translation
    imgData = f.read()
print(imgData)
```
I get:
```
BMh\x00\x00\x00\x00\x00\x00\x006\x00\x00\x00(\x00\x00\x00\x04\x00\x00\x00\xfc\xff\xff\xff\x01\x00\x18\x00\x00\x00\x00\x002\x00\x00\x00\x12\x0b\x00\x00\x12\x0b\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xcc\xbb\xaa\xff\xee\xdd\x00\x00\x00\xff\xff\xff\xdd\xcc\xbb\x00\x00\x00\xff\xff\xff\x00\x00\x00\x00\x00\x00\xff\xff\xff\x00\x00\x00\xff\xff\xff\xff\xff\xff\x00\x00\x00\xff\xff\xff3"\x11\x00\x00
```
This is fine for the most part: I can grab the hex values I need after the BMP header, starting at `\xcc\xbb\xaa...`. But it looks like some bytes are being shown as other characters and symbols, which at best makes the data harder to translate and at worst introduces ambiguity that makes it impossible to recover the original data with certainty.
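Rather than locating `\xcc\xbb\xaa` by eye, the start of the pixel data can be read from the header itself. A minimal sketch, assuming Python 3 and a standard BMP with a 40-byte BITMAPINFOHEADER (which is what this file's header contains); `bmp_header` is a hypothetical helper name, and all header fields are little-endian:

```python
import struct


def bmp_header(data):
    """Read a few BMP header fields from raw file bytes (all little-endian)."""
    if data[:2] != b"BM":
        raise ValueError("not a BMP file")
    # BITMAPFILEHEADER: bfOffBits (uint32 at offset 10) is where the pixel array starts
    (pixel_offset,) = struct.unpack_from("<I", data, 10)
    # BITMAPINFOHEADER: biWidth, biHeight (int32), biPlanes, biBitCount (uint16) at offset 18
    width, height, planes, bit_count = struct.unpack_from("<iiHH", data, 18)
    return {
        "pixel_offset": pixel_offset,
        "width": width,
        "height": height,   # negative height means rows are stored top-down
        "planes": planes,
        "bit_count": bit_count,
    }
```

For the file above, `bmp_header(imgData)` should report a pixel offset of 54 (0x36), width 4, height -4 and 24 bits per pixel, so `imgData[54:]` begins with `\xcc\xbb\xaa`.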
For instance, you'll find this sequence near the end of the string:

```
\xff3"\x11
```

which should appear as:

```
\xff\x33\x22\x11
```
(An ASCII table shows that 0x33 corresponds to `3` and 0x22 to `"`, and I'm certain the underlying bytes really are 0x33 and 0x22; see how the data appears in the text editor below.)
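The escapes are only how Python prints a bytes object; the data itself is unchanged. A quick check, assuming Python 3 (the variable names are just for illustration):

```python
as_printed = b'\xff3"\x11'          # the form shown in the output above
as_hex = b"\xff\x33\x22\x11"        # the same bytes written with hex escapes only

print(as_printed == as_hex)         # True: one byte string, two spellings
print(list(as_printed))             # [255, 51, 34, 17]: indexing bytes yields ints
```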
Translating all the symbols back into hex would be easy if there were no ambiguities, but more complex files allow many possibilities. For instance, the bytes 0x66 0x66 (`6666` in the editor) would just be displayed as `ff`, which I couldn't tell apart from instances of 0xff that might already be in my data.
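The same reasoning applies to the 0x66 0x66 case: printing shows `b'ff'`, but the object still holds two bytes of value 0x66, which cannot be confused with the single byte 0xff (printed as `b'\xff'`). A small check, assuming Python 3.5+ for `bytes.hex()`; the names are illustrative:

```python
two_f = bytes([0x66, 0x66])   # two bytes, each 0x66 (ASCII 'f')
one_ff = bytes([0xff])        # one byte with value 0xff

print(repr(two_f), repr(one_ff))   # b'ff' b'\xff'
print(len(two_f), len(one_ff))     # 2 1
print(two_f == one_ff)             # False
print(two_f.hex(), one_ff.hex())   # 6666 ff
```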
My question is: how do I keep the data untranslated and unambiguous for further parsing and formatting in Python?
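Working with the integer values of the bytes sidesteps the display question entirely. A sketch of the end goal, a list of RGB values, assuming Python 3 and an uncompressed 24-bit BMP like the one whose header is shown above (bit count 0x18). `bmp24_to_rgb` is a hypothetical name, and other bit depths (for example 16-bit 5-6-5) would need different unpacking:

```python
import struct


def bmp24_to_rgb(data):
    """Return a list of rows, each a list of (R, G, B) tuples, top row first.

    Assumes an uncompressed 24-bit BMP with a BITMAPINFOHEADER.
    """
    (pixel_offset,) = struct.unpack_from("<I", data, 10)
    width, height, _planes, bit_count, compression = struct.unpack_from("<iiHHI", data, 18)
    if bit_count != 24 or compression != 0:
        raise ValueError("this sketch only handles uncompressed 24-bit BMPs")

    row_size = (width * 3 + 3) // 4 * 4   # each stored row is padded to a multiple of 4 bytes
    rows = []
    for r in range(abs(height)):
        start = pixel_offset + r * row_size
        row = data[start:start + width * 3]   # skip the padding at the end of the row
        # each pixel is stored as B, G, R; indexing bytes yields ints in Python 3
        rows.append([(row[i + 2], row[i + 1], row[i]) for i in range(0, len(row), 3)])

    if height > 0:   # positive height: rows are stored bottom-up
        rows.reverse()
    return rows
```

With the sample file, `bmp24_to_rgb(imgData)[0][0]` should be `(0xaa, 0xbb, 0xcc)` and the last pixel `(0x11, 0x22, 0x33)`, since BMP stores each pixel as blue, green, red. The negative height (`\xfc\xff\xff\xff`, i.e. -4) means the rows are already stored top-down, so no reversal happens here.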
To confirm that this is what's happening, I opened the file in Sublime Text, where it appears like this:
```
424d 6800 0000 0000 0000 3600 0000 2800 0000 0400 0000 fcff ffff 0100 1800 0000 0000 3200 0000 120b 0000 120b 0000 0000 0000 0000 0000 ccbb aaff eedd 0000 00ff ffff ddcc bb00 0000 ffff ff00 0000 0000 00ff ffff 0000 00ff ffff ffff ff00 0000 ffff ff33 2211 0000
```
This is correct and usable, but having to open the file in a text editor every time isn't efficient for my purposes, so I'd like to automate the process with Python.
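To reproduce the editor's view without opening Sublime Text, the bytes can be hex-encoded directly. A minimal sketch assuming Python 3.5+ (for `bytes.hex()`); `hex_groups` is a hypothetical helper that mimics the editor's grouping of two bytes per block:

```python
def hex_groups(data, group=2):
    """Return `data` as lowercase hex, `group` bytes per space-separated block."""
    hex_str = data.hex()   # two hex digits per byte
    step = group * 2       # hex digits per block
    return " ".join(hex_str[i:i + step] for i in range(0, len(hex_str), step))
```

Calling `print(hex_groups(imgData))` on the file above should print the same `424d 6800 0000 ...` dump that the editor shows.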
Incidentally, I think this may be what was happening for this person, too.