Reading a binary file using np.fromfile()

Question

I have a binary file made up of numerous sections. Each section has its own pattern (i.e. its own layout of integers, floats, and strings).

The pattern of each section is known, but the number of times that pattern repeats within the section is not. Every record is enclosed between two identical integers that give the size of the record. A section name is a record enclosed between the record-length integers 8 and 8. Within each section, the pattern itself spans multiple records, whose layout is known.
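To make the record framing concrete, here is a minimal sketch of reading one length-framed record. The helper name `read_record`, the little-endian byte order, and the in-memory sample data are assumptions made for illustration; the real file may use a different byte order.

```python
import io
import struct


def read_record(f):
    """Read one record framed by two equal 4-byte length markers.

    Returns the payload bytes, or None at end of file.
    Assumes little-endian 4-byte markers (an assumption, not a given).
    """
    head = f.read(4)
    if len(head) < 4:
        return None  # nothing left to read
    (length,) = struct.unpack("<i", head)
    payload = f.read(length)
    (trailer,) = struct.unpack("<i", f.read(4))
    if trailer != length:
        raise ValueError("leading and trailing record lengths differ")
    return payload


# Synthetic section-name record: 8, then an 8-byte padded name, then 8.
buf = io.BytesIO(struct.pack("<i8si", 8, b"OES     ", 8))
payload = read_record(buf)
name = payload.decode("ascii").strip()
at_end = read_record(buf)
```

Here `name` holds the section name with its padding stripped, and the second call returns `None` because the buffer is exhausted.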

Header
---------------------
Known header pattern
---------------------
8 Section One 8
---------------------
Section One pattern repeating i times
---------------------
8 Section Two 8
---------------------
Section Two pattern repeating j times
---------------------
8 Section Three 8
---------------------
Section Three pattern repeating k times
---------------------

Here was my approach:

Loop through the file and read each record using f.read(record_length). If the record is 8 bytes long, convert it to a string; this is the section name.

Then I call np.fromfile(file, dtype=section_pattern, count=n), once for each section.
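As a small illustration of what `count` does in that call, the sketch below writes three repetitions of a made-up 16-byte pattern to a temporary file and reads back only two of them. The dtype `rec`, its field names, and the file name are hypothetical stand-ins, not the real section layout.

```python
import os
import tempfile

import numpy as np

# Hypothetical 16-byte pattern: length marker, two values, length marker.
rec = np.dtype([("len1", "<i4"), ("ekey", "<i4"), ("ex", "<f4"), ("len2", "<i4")])

data = np.zeros(3, dtype=rec)
data["len1"] = data["len2"] = 8
data["ekey"] = [11, 21, 31]
data["ex"] = [0.5, 1.5, 2.5]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(data.tobytes())
        f.write(b"next section starts here")  # bytes that must not be consumed

    with open(path, "rb") as f:
        # count=2 reads exactly two repetitions of the pattern and stops.
        first_two = np.fromfile(f, dtype=rec, count=2)
        position = f.tell()  # file position after the read
```

With `count` given, the read stops after `count * rec.itemsize` bytes, so the file position is left at the start of whatever follows. Without `count`, np.fromfile tries to consume the rest of the file, which is only appropriate for the last section.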

The issue I am having is twofold:

1. How do I determine n for each section without doing a first pass read?

2. Reading each record to find a section name seems rather inefficient. Is there a more efficient way to accomplish this?

The section names are always between two integer record variables: 8 and 8.
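One possible direction, sketched here under assumptions rather than offered as a definitive answer, is a lightweight index pass. It reads only the 4-byte length markers and the 8-byte names, and seeks past every other payload. This is still a pass over the file, but it touches very few bytes. The function name `index_sections`, the little-endian markers, and the synthetic data are hypothetical. Like the approach above, it treats every 8-byte record as a section name, which fails if a data record can also be 8 bytes long.

```python
import io
import struct


def frame(payload):
    """Wrap a payload in leading and trailing 4-byte length markers."""
    marker = struct.pack("<i", len(payload))
    return marker + payload + marker


def index_sections(f, start=0):
    """Return [(name, data_offset, data_nbytes), ...] without reading payloads."""
    f.seek(0, 2)
    filesize = f.tell()
    f.seek(start)
    sections = []
    while f.tell() < filesize:
        rec_start = f.tell()
        (length,) = struct.unpack("<i", f.read(4))
        if length == 8:
            # Assumed to be a section name (see the caveat in the lead-in).
            name = f.read(8).decode("ascii", errors="replace").strip()
            f.seek(4, 1)  # skip the trailing marker
            if sections:
                prev_name, prev_off, _ = sections[-1]
                sections[-1] = (prev_name, prev_off, rec_start - prev_off)
            sections.append((name, f.tell(), None))
        else:
            f.seek(length + 4, 1)  # skip payload and trailing marker
    if sections:
        name, off, _ = sections[-1]
        sections[-1] = (name, off, filesize - off)
    return sections


# Synthetic file: section ONE with two 12-byte records,
# section TWO with three 4-byte records.
blob = (frame(b"ONE     ") + frame(b"x" * 12) * 2
        + frame(b"TWO     ") + frame(b"y" * 4) * 3)
sections = index_sections(io.BytesIO(blob))

# Each framed record occupies payload + 8 bytes, so the pattern sizes are 20 and 12.
counts = {name: nbytes // size
          for (name, _, nbytes), size in zip(sections, (20, 12))}
```

If a section consists of nothing but whole repetitions of its pattern, and the dtype includes the length markers as in the sample code, then n is the section's byte count divided by the dtype's `itemsize`.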

Here is some sample code. Note that in this case I do not have to specify count, since the OES section is the last section:

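import os
import struct

import numpy as np

# unpack_int and unpack_string are not shown in the question; these are
# assumed stand-ins that use the native byte order.
def unpack_int(b):
    return struct.unpack('i', b)[0]

def unpack_string(b):
    return b.decode('ascii')

record_num = 0  # counter must exist before the loop increments it

with open('m13.op2', "rb") as f:

    filesize = os.fstat(f.fileno()).st_size
    f.seek(108,1) # skip header

    while True:

        rec_len_1 = unpack_int(f.read(4))
        record_bytes = f.read(rec_len_1)
        rec_len_2 = unpack_int(f.read(4))
        record_num = record_num + 1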

        if rec_len_1==8:

            tablename = unpack_string(record_bytes).strip()

            if tablename == 'OES':

                OES = [

                # Top keys
                ('1','i4',1),('op2key7','i4',1),('2','i4',1),
                ('3','i4',1),('op2key8','i4',1),('4','i4',1),
                ('5','i4',1),('op2key9','i4',1),('6','i4',1),

                # Record 2 -- IDENT
                ('7','i4',1),('IDENT','i4',1),('8','i4',1),
                ('9','i4',1),
                ('acode','i4',1),
                ('tcode','i4',1),
                ('element_type','i4',1),
                ('subcase','i4',1),

                ('LSDVMN','i4',1), # Load set number
                ('UNDEF(2)','i4',2), # Undefined
                ('LOADSET','i4',1), # Load set number or zero or random code identification number
                ('FCODE','i4',1), # Format code
                ('NUMWDE(C)','i4',1), # Number of words per entry in DATA record
                ('SCODE(C)','i4',1), # Stress/strain code
                ('UNDEF(11)','i4',11), # Undefined
                ('THERMAL(C)','i4',1), # =1 for heat transfer and 0 otherwise
                ('UNDEF(27)','i4',27), # Undefined
                ('TITLE(32)','S1',32*4), # Title
                ('SUBTITL(32)','S1',32*4), # Subtitle
                ('LABEL(32)','S1',32*4), # Label
                ('10','i4',1),

                # Record 3 -- Data
                ('11','i4',1),('KEY1','i4',1),('12','i4',1),
                ('13','i4',1),('KEY2','i4',1),('14','i4',1),
                ('15','i4',1),('KEY3','i4',1),('16','i4',1),
                ('17','i4',1),('KEY4','i4',1),('18','i4',1),
                ('19','i4',1),
                ('EKEY','i4',1), #Element key = 10*EID+Device Code. EID = (Element key)//10
                ('FD1','f4',1),
                ('EX1','f4',1),
                ('EY1','f4',1),
                ('EXY1','f4',1),
                ('EA1','f4',1),
                ('EMJRP1','f4',1),
                ('EMNRP1','f4',1),
                ('EMAX1','f4',1),
                ('FD2','f4',1),
                ('EX2','f4',1),
                ('EY2','f4',1),
                ('EXY2','f4',1),
                ('EA2','f4',1),
                ('EMJRP2','f4',1),
                ('EMNRP2','f4',1),
                ('EMAX2','f4',1),
                ('20','i4',1)]

                nparr = np.fromfile(f,dtype=OES)


        # Check for end of file after every record, not only after 8-byte ones
        if f.tell() == filesize:
            break

Thanks for the suggestion, seems like the way to go for large files. What is the most efficient way of determining the value of `offset` and `shape` for each section? [numpy.memmap](http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html) — snowleopard, May 20 '16 at 23:19
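For reference, a minimal sketch of how `offset` and `shape` relate to a section, assuming its starting byte and byte length are already known (for example from an index of section names). The dtype `rec`, the zero-filled 108-byte header, and the file name are hypothetical stand-ins.

```python
import os
import tempfile

import numpy as np

# Hypothetical 16-byte pattern, length markers included.
rec = np.dtype([("len1", "<i4"), ("ekey", "<i4"), ("ex", "<f4"), ("len2", "<i4")])

header = b"\x00" * 108  # stand-in for the 108-byte header
data = np.zeros(4, dtype=rec)
data["len1"] = data["len2"] = 8
data["ekey"] = [11, 21, 31, 41]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.bin")
    with open(path, "wb") as f:
        f.write(header + data.tobytes())

    offset = len(header)                       # where the section data starts
    nbytes = os.path.getsize(path) - offset    # section runs to end of file here
    n = nbytes // rec.itemsize                 # number of pattern repetitions

    mm = np.memmap(path, dtype=rec, mode="r", offset=offset, shape=(n,))
    ekeys = np.array(mm["ekey"])  # copy the field out of the mapping
    eids = ekeys // 10            # EID = (Element key)//10, as in the dtype comment
    del mm                        # release the mapping before the file is removed
```

`offset` is the byte position where the section's data begins and `shape` is the repetition count, so both follow directly from the section's start and length. Only the pages that are actually accessed get read from disk.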
