2

I'm using the following code to try to load a MAT file in Python. I can load it without issue in MATLAB.

from scipy.io import loadmat

test_filename = 'test_data.mat'  # this is a struct
data = loadmat(test_filename, struct_as_record=True)

Running that code produces this error:

Traceback (most recent call last):
  File "C:\Users\mac389\workspace\nexUtils\src\qA.py", line 16, in <module>
    data = loadmat(test_filename, struct_as_record=True)
  File "C:\Python27\lib\site-packages\scipy\io\matlab\mio.py", line 175, in loadmat
    matfile_dict = MR.get_variables(variable_names)
  File "C:\Python27\lib\site-packages\scipy\io\matlab\mio5.py", line 272, in get_variables
    hdr, next_position = self.read_var_header()
  File "C:\Python27\lib\site-packages\scipy\io\matlab\mio5.py", line 224, in read_var_header
    stream = BytesIO(dcor.decompress(data))
MemoryError

For reference, test_data.mat is a struct with the following fields (as shown in the MATLAB console):

 version: 101
 comment: 'molecular layer 4/17'
    freq: 40000
    tbeg: 0
    tend: 1.3950e+003
  events: {3x1 cell}
 neurons: {50x1 cell}
   waves: {102x1 cell}
contvars: {64x1 cell}

test_data.mat is 217 MB, and I have 4 GB of RAM. I am using SciPy 0.10.0 and NumPy 1.6.1. Changing the struct_as_record argument makes no difference.

How can I load a struct where the fields are cell arrays?

mac389
  • You ran out of memory. Whilst the file may be only 200MB, the in-memory requirements could well be larger. – David Heffernan Jun 15 '12 at 22:36
  • Is the .mat file compressed? Is your Python process 32 bit? Is your MATLAB process 64 bit? – David Heffernan Jun 15 '12 at 22:45
  • @David: The mat file is not compressed beyond MATLAB's binary format. Both processes are 32 bit. My question is why loading a small file eats up the memory in Python but not in MATLAB, and what I can do to circumvent it. – mac389 Jun 15 '12 at 23:18
  • @David I think, based on the stack trace, that loadmat gets stuck reading the struct field names. It may not be just a "file too big" problem. – mac389 Jun 15 '12 at 23:22

2 Answers

3

I found the answer.

loadmat can't cope with heavily nested structures. In the data set I was given, three of the struct fields (waves, neurons, and contvars) were cell arrays, and each member of those cell arrays was a struct. Some fields of those structs were themselves cell arrays, whose elements each held a single field containing the actual data. This nonstandard way of organizing the data, combined with a lack of documentation, created the problem.
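For anyone hitting the same layout, here is a minimal sketch of walking that kind of nesting, once the file loads at all (for example, after re-saving it uncompressed, as the other answer suggests). The variable name test_data and the field names are assumptions taken from the listing in the question:

from scipy.io import loadmat

# struct_as_record=False plus squeeze_me=True gives attribute-style access
# and strips MATLAB's singleton dimensions.
data = loadmat('test_data.mat', struct_as_record=False, squeeze_me=True)
rec = data['test_data']  # assumed variable name inside the file

print(rec.freq, rec.comment)

# Cell arrays come back as object arrays; each element here is a
# mat_struct whose fields may themselves be cell arrays.
for neuron in rec.neurons:
    print(type(neuron), neuron._fieldnames)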

I guess this serves as a cautionary tale: stick as close to a plain text format as possible if you are the one designing the data storage format. If you do choose a really nonstandard format, take mercy on your successors and document that fact...

mac389
1

I think it takes more memory in Python because of the way decompression is implemented. Try saving from MATLAB without compression (use the -v6 flag; the version 6 format has no compression feature).
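On the scipy side the same switch is explicit: savemat takes a do_compression flag (off by default). A toy sketch with made-up file and variable names, just to show the knob:

import numpy as np
from scipy.io import savemat, loadmat

# Write a small dict without zlib compression, mirroring what
# MATLAB's -v6 flag avoids on the reading side.
savemat('toy_uncompressed.mat', {'freq': 40000, 'x': np.arange(10)},
        do_compression=False)

# Reading it back skips the decompression path that raised MemoryError.
print(loadmat('toy_uncompressed.mat')['freq'])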

robince
  • That will help identify whether the problem is memory. Thanks for the helpful diagnostic suggestion. But what to do then? – mac389 Jun 16 '12 at 14:36
  • 1
    Well, there is not much you can do. If it works, you can stick with v6 on the MATLAB side (you can set it as the default save format). Otherwise you will need to upgrade memory on the Python side and switch to 64 bit. Or you could look at using a different format: MATLAB can write HDF5, which is well supported in Python, so if you can transform your data to fit in an HDF5 container, maybe that would work for you (see the sketch below). – robince Jun 17 '12 at 16:09
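To illustrate the HDF5 route: a minimal sketch of the Python side, assuming the data were re-saved in MATLAB with save('test_data_v73.mat', 'test_data', '-v7.3') (the -v7.3 format is HDF5 under the hood). The file, variable, and field names are assumptions based on the question:

import h5py

with h5py.File('test_data_v73.mat', 'r') as f:
    td = f['test_data']    # the MATLAB struct maps to an HDF5 group
    freq = td['freq'][()]  # numeric fields map to datasets
    print(freq)

    # Cell arrays map to datasets of HDF5 object references; each element
    # dereferences through the file handle (it may be a dataset or, for
    # nested structs, a sub-group).
    for ref in td['waves'][()].flat:
        print(f[ref])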