
I have a .raw file containing a 52-line HTML header followed by the data itself. The data is encoded as little-endian 24-bit signed integers, and I want to convert it to integers in an ASCII file. I use Python 3.

I tried to 'unpack' the entire file with the following code found in this post:

import sys
import chunk
import struct

f1 = open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode = 'rb')
data = struct.unpack('<i', chunk + ('\0' if chunk[2] < 128 else '\xff'))   

But I get this error message:

TypeError: 'module' object is not subscriptable

EDIT

It seems this is better:

data = struct.unpack('<i','\0'+ bytes)[0] >> 8

But I still get an error message:

TypeError: must be str, not type

Easy to fix I presume?

Vadim Kotov
ananas
  • Can you post the result of `f1.read()`? – Tomalak Jul 27 '17 at 09:07
  • 1) Screen dumps are not welcome here: large storage space, no re-use, not searchable 2) The problem is the *chunk* module. Probably a name collision between the module name and your chosen instance variable. Or you forgot to instantiate something with the *Chunk* class at all? – guidot Jul 27 '17 at 09:17
  • You need to split the binary data from the HTML first. Don't use `bytes` as a variable name as it conflicts with Python's own `bytes` type – Alastair McCormack Jul 27 '17 at 10:03
  • @guidot: screenshot removed. The 'struct.unpack() ' would work for 16 or 32 bits but not for 24 bits. The code in Serge Ballesta's answer works perfectly. – ananas Jul 29 '17 at 11:02

1 Answer


That's not a nice file to process in Python! Python is great for processing text files, because it reads them in big chunks into an internal buffer and then iterates over lines, but you cannot easily access binary data that comes after text read that way. Additionally, the struct module has no support for 24-bit values.

The only way I can imagine is to read the file one byte at a time: first skip 52 end-of-line markers, then read the bytes 3 at a time, pad each group into a 4-byte byte string, and unpack it.

Possible code could be:

import struct

eol = b'\n'          # or whatever the end of line is in your file
nlines = 52          # number of lines to skip

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:

    for i in range(nlines):       # process nlines lines
        t = b''                   # to store the content of each line
        while True:
            x = f1.read(1)        # one byte at a time
            if x == eol or not x: # one full line read (or end of file reached)
                break
            t += x                # else concatenate into current line
        print(t)                  # to check the initial 52 lines

    while True:
        t = bytes((0,))               # struct only knows how to process 4-byte ints
        for i in range(3):            # so build one starting with a null byte
            t += f1.read(1)
        # print(t)
        if len(t) == 1:               # reached end of file
            break
        if len(t) < 4:                # reached end of file with an incomplete value
            print("Remaining bytes at end of file", t)
            break
        # the null byte is the low byte of the little-endian value, so the integer
        # division by 256 drops it and keeps the sign
        i = struct.unpack('<i', t)[0] // 256   # // for Python 3, only / for Python 2
        print(i, hex(i))                       # or any other more useful processing
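
The padding trick can be checked in isolation on a few hand-built byte strings. The sketch below is only an illustration: the helper name `int24_le` is hypothetical, and the comparison against `int.from_bytes` (a Python 3 built-in that decodes signed integers of any byte length) is an addition made here for cross-checking.

```python
import struct

def int24_le(three_bytes):
    """Decode 3 little-endian bytes as a signed 24-bit integer (hypothetical helper)."""
    if len(three_bytes) != 3:
        raise ValueError("expected exactly 3 bytes")
    # Prepend a null low byte so the 3 data bytes fill the upper 24 bits of a
    # 32-bit little-endian int, then drop that low byte again with // 256.
    return struct.unpack('<i', b'\x00' + bytes(three_bytes))[0] // 256

# Hand-built samples: smallest positive, largest positive, most negative, minus one
samples = [b'\x01\x00\x00', b'\xff\xff\x7f', b'\x00\x00\x80', b'\xff\xff\xff']
for raw in samples:
    value = int24_le(raw)
    # int.from_bytes should agree with the struct-based trick
    assert value == int.from_bytes(raw, byteorder='little', signed=True)
    print(raw, value)
```

If both decoders agree on the boundary values, either one could be used in the reading loop; `int.from_bytes` avoids the padding step but exists only in Python 3.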

Remark: the code above assumes that your description of 52 lines (each terminated by an end of line) is accurate, but the image you showed suggests that the last line is not. In that case, you should first skip 51 lines and then skip the content of the last line.

def skiplines(fd, nlines, eol):
    for i in range(nlines):       # process nlines lines
        t = b''                   # to store the content of each line
        while True:
            x = fd.read(1)        # one byte at a time
            if x == eol or not x: # one full line read (or end of file reached)
                break
            t += x                # else concatenate into current line
        # print(t)                # to check the skipped lines

with open('/Users/anais/Documents/CR_lab/Lab_files/labtest.raw', mode='rb') as f1:
    skiplines(f1, 51, b'\n')     # skip 51 lines terminated with a \n
    skiplines(f1, 1, b'>')       # skip last line assuming it ends at the >

    ...
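
For files that fit comfortably in memory, a variant of the same idea is to read everything at once, cut off the header, and decode the remainder in 3-byte steps. This is a sketch under assumptions, not the byte-by-byte method shown here: the function names `split_header` and `decode_int24`, the made-up demo data, and the output name `labtest.txt` are all hypothetical, and it assumes every header line ends with `\n`.

```python
def split_header(blob, nlines, eol=b'\n'):
    """Return the bytes that follow the first nlines end-of-line markers."""
    pos = 0
    for _ in range(nlines):
        idx = blob.find(eol, pos)
        if idx == -1:
            raise ValueError("fewer header lines than expected")
        pos = idx + len(eol)
    return blob[pos:]

def decode_int24(payload):
    """Decode consecutive little-endian signed 24-bit integers."""
    usable = len(payload) - len(payload) % 3   # ignore an incomplete trailing value
    return [int.from_bytes(payload[i:i + 3], 'little', signed=True)
            for i in range(0, usable, 3)]

# Demo on made-up data: a 2-line header followed by three 24-bit samples
blob = b'<html>\n</html>\n' + b'\x01\x00\x00' + b'\xff\xff\xff' + b'\x00\x00\x80'
values = decode_int24(split_header(blob, 2))
print(values)

# Writing one integer per line to an ASCII file (hypothetical output name):
# with open('labtest.txt', 'w') as out:
#     out.write('\n'.join(str(v) for v in values) + '\n')
```

With a real file, `blob` would come from `f1.read()` on a file opened in `'rb'` mode, and `nlines` would be 52 (or the header would be cut at the final `>` if the last line has no end of line).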
Serge Ballesta
  • Thanks a lot for your answer accompanied by a detailed explanation, which was necessary for me since I just start with programming. I have cross-checked the results with a Matlab code and no surprise, it works perfectly ! Thanks again !! – ananas Jul 29 '17 at 10:54