I created a parser for some complex binary files using `numpy.fromfile` and defining the various dtypes needed to read each portion of the binary file. The resulting numpy array was then placed into a pandas DataFrame, and the same dtype that was defined for converting the binary file into the numpy array was reused to define the DataFrame's column names.
I was hoping to replicate this process using Python's `struct` module, but ran into an issue. If part of my structure requires a value to be a group of 3 ints, I can define the dtype as `numpy.dtype([('NameOfField', '>i4', 3)])`, and the value returned from the binary file is `[int, int, int]`. Can this be replicated using `struct`, or do I need to regroup the values in the returned tuple based on the dtype before ingesting them into my pandas DataFrame? I have read the Python `struct` documentation and have not noticed any examples of this.
Using a `struct` format string of `>3i` returns `int, int, int` as three separate items in the tuple, instead of the single `[int, int, int]` entry I need.
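To make the difference concrete, here is a minimal sketch that unpacks the same 12 bytes both ways. The buffer is built in memory with `struct.pack`, and the values 1, 2, 3 are made up for illustration; `np.frombuffer` is used instead of `np.fromfile` only so the example needs no file on disk.

```python
import struct

import numpy as np

# Made-up record: three big-endian int32 values packed into 12 bytes.
raw = struct.pack('>3i', 1, 2, 3)

# numpy: a subarray field keeps the three values together in one field.
dt = np.dtype([('ExampleFieldName', '>i4', 3)])
record = np.frombuffer(raw, dtype=dt)[0]
grouped = record['ExampleFieldName'].tolist()

# struct: '>3i' is shorthand for '>iii', so the result is one flat tuple.
flat = struct.unpack('>3i', raw)

print(grouped)  # [1, 2, 3]
print(flat)     # (1, 2, 3)
```

The `struct` documentation describes a repeat count such as `3i` as meaning exactly the same as `iii`, so the grouping information in the format string is not carried into the result.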
Edit:
Below is a generic example. This method using `numpy.fromfile` works perfectly but is slow on my huge binary files, so I am trying to implement it using `struct`.
```python
import numpy as np
import pandas as pd


def example_structure():
    # one record: a single field holding a group of three big-endian int32 values
    dt = np.dtype([
        ('ExampleFieldName', '>i4', 3)
    ])
    return dt


# filename of binary file
file_name = 'example_binary_file'
# define the dtype for this chunk of binary data
d_type = example_structure()
# define initial index for the file in memory
start_ind = 0
end_ind = 0
# read in the entire file generically as raw bytes
x = np.fromfile(file_name, dtype='u1')
# based on the dtype find the chunk size
chunk_size = d_type.itemsize
# define the start and end index based on the chunk size
start_ind = end_ind
end_ind = chunk_size + start_ind
# extract just the first chunk
temp = x[start_ind:end_ind]
# reinterpret the chunk's bytes as the defined dtype (same effect as setting temp.dtype)
temp = temp.view(d_type)
# store the chunk in its own pandas dataframe
example_df = pd.DataFrame(temp.tolist(), columns=temp.dtype.names)
```
This returns a `temp[0]` value of `[int, int, int]`, which is then read into the pandas DataFrame as a single entry under the column `ExampleFieldName`. If I attempt to replicate this using `struct`, the `temp[0]` value is `int, int, int`, which is not read properly into pandas. Is there a way to make `struct` group values like I can with numpy?
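Since `struct` itself only produces a flat tuple, one option is to regroup that tuple afterwards, driving both the format string and the grouping from the same numpy dtype. Below is a minimal sketch under stated assumptions: the helper names `layout_from_dtype`, `regroup` and `unpack_records`, the `_STRUCT_CODES` table, and the extra `Flag` field are all hypothetical; it assumes every field is big-endian, packed without padding, and of a plain integer or float type; and it flattens multi-dimensional subarray fields into a single list. `struct.iter_unpack` is used to walk consecutive fixed-size records in one buffer.

```python
import struct

import numpy as np
import pandas as pd

# Hypothetical lookup from numpy (kind, itemsize) to a struct format code.
# It only covers common integer and float types.
_STRUCT_CODES = {
    ('i', 1): 'b', ('i', 2): 'h', ('i', 4): 'i', ('i', 8): 'q',
    ('u', 1): 'B', ('u', 2): 'H', ('u', 4): 'I', ('u', 8): 'Q',
    ('f', 4): 'f', ('f', 8): 'd',
}


def layout_from_dtype(dt):
    """Build a struct format string and a per-field grouping from a dtype.

    Assumes every field is big-endian and the dtype has no padding.
    """
    fmt = '>'  # big-endian, standard sizes, no alignment padding
    layout = []  # (field name, number of values, is it a subarray?)
    for name in dt.names:
        field = dt.fields[name][0]
        base = field.base  # element dtype (the field itself for scalars)
        count = int(np.prod(field.shape)) if field.shape else 1
        code = _STRUCT_CODES[(base.kind, base.itemsize)]
        fmt += (str(count) if field.shape else '') + code
        layout.append((name, count, bool(field.shape)))
    return fmt, layout


def regroup(flat, layout):
    """Turn struct's flat tuple into one entry per dtype field."""
    row, i = [], 0
    for _name, count, is_subarray in layout:
        values = flat[i:i + count]
        row.append(list(values) if is_subarray else values[0])
        i += count
    return tuple(row)


def unpack_records(buf, dt):
    """Unpack every fixed-size record in buf and regroup it per field."""
    fmt, layout = layout_from_dtype(dt)
    return [regroup(flat, layout) for flat in struct.iter_unpack(fmt, buf)]


# Usage with made-up data: two records of (3 x int32, uint16).
dt = np.dtype([('ExampleFieldName', '>i4', 3), ('Flag', '>u2')])
buf = struct.pack('>3iH', 1, 2, 3, 7) + struct.pack('>3iH', 4, 5, 6, 8)
rows = unpack_records(buf, dt)
example_df = pd.DataFrame(rows, columns=dt.names)
print(example_df)
```

Each row comes back as a tuple with one entry per dtype field, so the subarray field lands in the DataFrame as a single list, matching what `temp.tolist()` feeds to pandas in the numpy version. `struct.iter_unpack` requires the buffer length to be an exact multiple of the record size.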