I have a .mat file named "myfile.mat" that contains a huge variable `data` and, in some cases, another variable `data_info`. What is the fastest way to check whether that .mat file contains the `data_info` variable?

The who and whos commands are no faster than simply loading the file and testing for the existence of the variable.

nRuns=10;
%simply loading the complete file
tic
for p=1:nRuns
    load('myfile.mat');
    % do something with variable
    if exist('data_info','var')
        %do something
    end
end
toc

% check with who
tic
for p=1:nRuns
   variables=who('-file','myfile.mat');
   if ismember('data_info', variables)
       % do something
   end
end
toc

% check with whos
tic
for p=1:nRuns
   info=whos('-file','myfile.mat');
   if ismember('data_info', {info.name})
       %do something
   end
end
toc

All three methods take roughly the same time (which is way too slow, since data is huge).

However, this is very fast:

tic
for p=1:nRuns
    load('myfile.mat','data_info');
    if exist('data_info', 'var')
        %do something
    end
end
toc

But it issues a warning if data_info does not exist. I could suppress the warning, but that doesn't seem like the best way to do this. What other options are there?

Edit: using who('-file', 'myfile.mat', 'data_info') is not faster either:

tic
for p=1:nRuns
    if ~isempty(who('-file', 'myfile.mat', 'data_info'))
      % do something
    end
end
toc    % this takes about 7 seconds, roughly the same as loading the complete .mat file
Anton Rodenhauser
  • [`matfile`](https://www.mathworks.com/help/matlab/ref/matfile.html) perhaps? – sco1 Aug 01 '17 at 15:43
  • Note that for proper timings `tic/toc` is inaccurate; it's better to use [`timeit`](http://mathworks.com/help/matlab/ref/timeit.html) instead. – Adriaan Aug 01 '17 at 15:46
  • @excaza: That does better than loading the whole file, but still not as good as `whos/who`. See the timing results in my answer. – gnovice Aug 01 '17 at 16:51

3 Answers


Try using who, restricted to only the specific variable:

...
if ~isempty(who('-file', 'myfile.mat', 'data_info'))
  %do something
end

Timing the solutions:

Using timeit on the different solutions (code included below, running on Windows 7 and MATLAB version R2016b) shows that the who-based ones appear fastest, with the one I suggested above having a slight edge in speed. Here's the timing, from slowest to fastest:

Load whole file:        0.368235871921381 sec
Using matfile:          0.001973860748417 sec
Load only `data_info`:  0.000316989486384 sec
Using whos + ismember:  0.000174207817967 sec
Using who + ismember:   0.000151289605527 sec
Using who + isempty:    0.000137261391331 sec

I used a sample MAT file containing the following variables:

data = ones(10000);
data_info = 'hello';

Here's the test code:

function T = infotest

  T = zeros(6, 1);
  T(1) = timeit(@use_load_exist_1);
  T(2) = timeit(@use_load_exist_2);
  T(3) = timeit(@use_matfile);
  T(4) = timeit(@use_whos_ismember);
  T(5) = timeit(@use_who_ismember);
  T(6) = timeit(@use_who_isempty);

end

function isThere = use_load_exist_1
  load('infotest.mat');
  isThere = exist('data_info', 'var');
end

function isThere = use_load_exist_2
  load('infotest.mat', 'data_info');
  isThere = exist('data_info', 'var');
end

function isThere = use_matfile
  isThere = isprop(matfile('infotest.mat'), 'data_info');
end

function isThere = use_whos_ismember
  info = whos('-file', 'infotest.mat');
  isThere = ismember('data_info', {info.name});
end

function isThere = use_who_ismember
  variables = who('-file', 'infotest.mat');
  isThere = ismember('data_info', variables);
end

function isThere = use_who_isempty
  isThere = ~isempty(who('-file', 'infotest.mat', 'data_info'));
end
gnovice
  • No, this is not any faster than simply loading the complete .mat file. – Anton Rodenhauser Aug 01 '17 at 15:58
  • @AntonRodenhauser: How are you measuring that? `timeit` is more accurate than `tic/toc`. – gnovice Aug 01 '17 at 16:03
  • In all cases it takes roughly 7 seconds. Even if timeit is a little bit more accurate, I assume it doesn't make THAT much of a difference? – Anton Rodenhauser Aug 01 '17 at 16:04
  • @AntonRodenhauser: Using `timeit` I found a significant difference between using `whos/who` and loading the data (especially loading all of it). What version of MATLAB are you using? – gnovice Aug 01 '17 at 18:05

You can use the who command: https://www.mathworks.com/help/matlab/ref/who.html

Call who with the '-file' flag, the file name, and then the name of the variable you are looking for. You do not need to list all the variables in the file.

The syntax is as follows:

variable = who('-file','yourfilenamehere','data_info')

From there you can call

if ~isempty(variable)
%do something
end

This searches for only that variable within the file. In your versions of the who command you looked for all variables, whereas this looks for just one.
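
The check described above can be wrapped in a small function. This is only a sketch: the name `hasVariable` is hypothetical (it is not a MATLAB built-in), and it assumes `varName` is a plain variable name, since `who` treats `*` in a name as a wildcard.

```matlab
function tf = hasVariable(matFile, varName)
% hasVariable  True if the MAT-file matFile contains a variable named varName.
%   Hypothetical helper (not a built-in); save it as hasVariable.m.
%   who('-file', ...) lists variable names without loading their data.
    tf = ~isempty(who('-file', matFile, varName));
end
```

It would then be called as `if hasVariable('myfile.mat', 'data_info'), ... end`.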

Durkee

So it's a bit messy, but I just tried this and it's pretty much instant regardless of size. Let me know if it works for you.

Please excuse the formatting, I'm not used to proper formatting here.

Note: This solution uses the low-level HDF5 libraries that are already built into MATLAB, so this method assumes your MAT-file is HDF5 (-v7.3). Otherwise it will not work.

You can make sure it is a valid HDF5 file by doing this:

isValidHDF = H5F.is_hdf5('myfile.mat');

To see if your variable exists:

isThere = false; % Default: assume the variable is absent
fid = H5F.open('myfile.mat'); % Open read-only with the low-level H5F built-in
try % try/catch is usually best avoided, but it is acceptable here
     % Try to open the HDF5 group. If the variable isn't there, H5G.open errors and isThere stays false
     gid = H5G.open(fid, '/data_info'); % Note: the "/" is required and OS independent, so it's never "\", even on Windows

     % I think this makes sure the variable isn't empty if the group opened successfully, but it hasn't been a problem yet
     hInfo = H5G.get_info(gid);
     isThere = hInfo.nlinks > 0;
     H5G.close(gid);
end
H5F.close(fid);
dsaid
  • I understand the libraries a bit better now than when I wrote this answer. It might be better to use H5L.open (link) as opposed to H5G.open (group). It will identify the existence of a nested group or a nested dataset and seems to be considered better practice. The H5 library method also helps with nested/recursive structures, for example to check for the existence of a nested variable 'a.b.c.d' stored in a .mat file. – dsaid Dec 28 '21 at 22:49
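
The link-based check mentioned in the last comment might look roughly like the sketch below. It assumes the comment refers to `H5L.exists`, which is the link-existence call in MATLAB's low-level HDF5 interface, and that the file was saved with -v7.3. The function name `hasVariableH5` is hypothetical. Since `H5G.open` only opens groups, testing for the link should also cover variables stored as datasets.

```matlab
function tf = hasVariableH5(matFile, varName)
% hasVariableH5  True if a -v7.3 MAT-file has a top-level variable varName.
%   Hypothetical helper; returns false for files that are not HDF5.
    tf = false;
    if ~H5F.is_hdf5(matFile)
        return  % not a -v7.3 (HDF5) MAT-file, so this method does not apply
    end
    fid = H5F.open(matFile, 'H5F_ACC_RDONLY', 'H5P_DEFAULT');
    cleanup = onCleanup(@() H5F.close(fid));  % close the file even on error
    % Test for a link with this name in the root group; no data is read
    tf = logical(H5L.exists(fid, varName, 'H5P_DEFAULT'));
end
```

For older MAT-file formats this sketch simply returns false, so a `who('-file', ...)` check would still be needed as a fallback.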