
I have two variables, `data` and `meta`, which I save in a compressed .mat file (version '-v7'). The `data` variable is usually around 800 MB uncompressed, while `meta` is less than 1 MB. I have lots of these .mat files, and sometimes I just need to loop through all the `meta` variables. However, since the file is compressed, loading the `meta` variable alone still takes a long time, i.e. about the same time as loading both variables.
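
A minimal sketch of the save/load pattern in question (the file name is made up for illustration):

% save both variables in a compressed -v7 MAT-file
save('file001.mat', 'data', 'meta', '-v7');

% later, request only meta; this still takes roughly as long as loading everything
S = load('file001.mat', 'meta');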

Is it possible to selectively compress specific variables in a .mat file (in order to speed up loading)? Are there alternative data designs I should consider?

Note: I already have a single overall `meta`, which is basically the concatenation of the smaller ones. However, I will need to abandon this approach because it does not scale well, either in size or in performance.

Oleg
  • Not sure if this really does what you want in terms of performance, but have you tried `matfile()`? Unfortunately I cannot test right now, but here is the help link: http://www.mathworks.com/help/matlab/ref/matfile.html?searchHighlight=matfile **NB** I am aware that this is not what you asked; I am just suggesting an alternative approach (a sketch of this is shown after these comments). :-) –  Jun 25 '14 at 19:33
  • I kept reading from your link, and in a rather uninformative line the documentation says that, depending on how the .mat file is chunked, loading single variables should take less time. Looking at the low-level functions in the HDF5 API, I can see there are ways to set the layout, so perhaps a smart alternative is to chunk the `meta` variable alone, so that only a single chunk has to be decompressed. – Oleg Jun 25 '14 at 19:47
  • In [Speeding up save and load operations](http://www.mathworks.co.uk/help/matlab/import_export/mat-file-versions.html#br_4ten) the docs mention the chunked layout of .mat files. Does anybody have experience with setting the chunk layout? – Oleg Jun 25 '14 at 19:55
  • Again, not sure whether you have already checked these out, but here is a link with typical examples of using the HDF5 API in MATLAB: http://www.hdfgroup.org/HDF5/examples/api18-m.html –  Jun 26 '14 at 08:42
  • So, `matfile()` does not improve the loading speed (though it also depends on the setup; I am currently testing with an SSD). From my previous link it seemed possible to use the low-level HDF5 API directly on .mat files, but that is not the case. I have neither the time nor the intention to learn/rewrite everything in pure HDF5 (only -v7.3 MAT-files are HDF5-based). I have not mentioned that `meta` is a `dataset`, and this might affect the way it is stored, i.e. not contiguously, forcing many chunks to be unzipped/loaded. I will test serializing the `dataset` before saving it. – Oleg Jun 26 '14 at 22:27
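
A minimal sketch of the `matfile()` approach suggested in the comments (the file name is hypothetical; efficient partial loading only applies to -v7.3 MAT-files, which is consistent with the timings reported above):

% create a matlab.io.MatFile object; no data is read at this point
m = matfile('file001.mat');

% read only the meta variable; this is a true partial read for -v7.3 files,
% while for older formats the variable is loaded in full anyway
meta = m.meta;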

1 Answer


To save selected variables to myData.mat, use:

save myData var1 var2 var3 var4 var5

If you want to load var2 from myData.mat, use the following command:

load myData var2
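
A hedged sketch of the same idea in functional form, as one might use it to loop over many files (the file pattern is assumed):

files = dir('*.mat');
for k = 1:numel(files)
    % request only the small meta variable from each compressed file
    S = load(files(k).name, 'meta');
    % note: as the comment below reports, with -v7 compression this
    % does not end up being faster than loading the whole file
end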
  • This is exactly what I am doing. However, loading the whole file or loading just the much smaller `meta` takes almost the same time. For this reason, I speculate that the smaller variable is stored in such a way that the whole file has to be decompressed first, rather than only the partition where `meta` resides. – Oleg Aug 16 '14 at 08:58