I have a data set consisting of a large number of .mat files. Each .mat file is of considerable size, so loading them is time-consuming. Unfortunately, some of them are corrupt, and load('<name>') returns an error on those files. I have implemented a try-catch routine to determine which files are corrupt. However, since only a handful of them are corrupt, loading every file just to check whether it is corrupt takes a long time. Is there any way I can check the health of a .mat file without using load('<name>')?

I have not been able to find such a solution anywhere.

user3342981

1 Answer

The matfile function is used to access variables in MAT-files without loading them into memory. By changing your try-catch routine to use matfile instead of load, you avoid the overhead of loading the large files into memory.

As matfile appears to only issue a warning when reading a corrupt file, you'll have to check if this warning was issued. This can be done using lastwarn: clear lastwarn before calling matfile, and check if the warning was issued afterwards:

lastwarn('');                     % clear any previously stored warning
matfile(...);
[~, warnId] = lastwarn;           % identifier of the last warning, if any
if strcmp(warnId, 'relevantWarningId')
    % File is corrupt
end

You will have to find out the relevant warning id first, by running the above code on a corrupt file, and saving the warnId.
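To find that ID, and to run the check over a whole folder, something along these lines might work. This is only a sketch: listMatfileWarnings is a hypothetical helper name, and it records any warning or error identifier raised while opening each file, not only the corruption-related one, so you would still need to inspect the output to pick out the relevant ID.

```matlab
function [names, warnIds] = listMatfileWarnings(dataDir)
% Hypothetical helper: open every MAT-file in dataDir with matfile and
% record the identifier of any warning or error raised while doing so.
files   = dir(fullfile(dataDir, '*.mat'));
names   = {files.name}';
warnIds = cell(numel(files), 1);
for k = 1:numel(files)
    lastwarn('', '');                  % clear last warning message and ID
    try
        m = matfile(fullfile(dataDir, names{k})); %#ok<NASGU>
        [~, warnIds{k}] = lastwarn;    % empty if no warning was issued
    catch err
        warnIds{k} = err.identifier;   % matfile failed outright
    end
end
end
```

Files with a non-empty entry in warnIds are the ones to look at more closely, for example with names(~cellfun(@isempty, warnIds)).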

A more robust solution would be to calculate a checksum or hash (e.g. MD5) of each file when it is created, and to compare against this checksum before reading the file.
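As far as I know, MATLAB has no dedicated built-in function for hashing a file, so the sketch below assumes MATLAB is running with a JVM and uses Java's java.security.MessageDigest. The name fileMd5 and the chunk size are my own choices for illustration.

```matlab
function hash = fileMd5(fileName)
% Hypothetical helper: MD5 of a file as a lowercase hex string, computed
% with Java's MessageDigest (requires MATLAB to be running with a JVM).
md  = java.security.MessageDigest.getInstance('MD5');
fid = fopen(fileName, 'r');
if fid < 0
    error('fileMd5:openFailed', 'Cannot open %s.', fileName);
end
closer = onCleanup(@() fclose(fid));       % close the file even on error
chunkSize = 16 * 2^20;                     % read 16 MiB at a time
while true
    chunk = fread(fid, chunkSize, '*uint8');
    if isempty(chunk)
        break;
    end
    md.update(chunk);                      % feed raw bytes to the digest
end
hash = sprintf('%02x', typecast(md.digest(), 'uint8'));
end
```

You would store the hash next to each file right after saving it, and later compare it with strcmp(fileMd5(fileName), storedHash). Note that this still reads every byte from disk; it only skips the parsing that load does, and a mismatch tells you the file changed, not what is wrong with it.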

hbaderts
  • I confirmed with a corrupt MAT-file that `matfile()` on its own does not detect the corruption: it raises an error only when you access the variables inside the `matfile` object. So simply calling `matfile()` inside try-catch will not work, because try-catch only catches errors. The good news is that `matfile()` does issue a warning when reading a corrupt file, so if there is a way to make the try-catch routine catch warnings, that would be the best option. I also find your alternative solution (using a hash value) quite robust. Thanks for your answer. – user3342981 Jul 13 '16 at 14:57
  • Thanks for the feedback. I extended my answer with a solution to "catch" the warning. – hbaderts Jul 13 '16 at 15:07