I have written a function, compareTGZ, that compares two tgz archives. The archives contain the following types of files: .mat files and text files such as .m, .ddf and .txt.

The function is defined as follows:

function [testStatus, testMessage] = compareTGZ(refTGZFile, newTGZFile)

I want to add a check that reports the files present in refTGZFile but not in newTGZFile, and vice versa.

if lenOffnames_old ~= lenOffnames_new
    for i = 1:lenOffnames_old
        % Split the path of fnames_old with delimiter filesep
        refTGZParts = strsplit(fnames_old{i}, filesep);
        % Split the path of fnames_new with delimiter filesep
        newTGZParts = strsplit(fnames_new{i}, filesep);

        if ~strcmp(refTGZParts{3}, newTGZParts{3})
            testStatus = 0;
            % Return files in Reference tgz which are not found in Test tgz
            fprintf('File %s in Reference tgz is not found in Test tgz\n', refTGZParts{3});
            % Return files in Test tgz which are not found in Reference tgz
            fprintf('File %s in Test tgz is not found in Reference tgz\n', newTGZParts{3});
        end
    end
end

When refTGZFile contains more files than newTGZFile, I get the correct results. But when newTGZFile contains more files than refTGZFile, I get an error.

Can someone please advise me on how to fix this bug?

dur
Maliva
  • Your logic assumes that the filename on the right is always going to be in the same position on the list as the filename on the left. As you see in your results this is a poor assumption. You need to compare the filenames without any assumption of order. Consider an approach using something like [`intersect`](http://www.mathworks.com/help/matlab/ref/intersect.html) or [`ismember`](http://www.mathworks.com/help/matlab/ref/ismember.html) – sco1 Apr 17 '16 at 19:23
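
A minimal sketch of the order-independent check suggested in the comment above, using `ismember`. The file names are made up for illustration and are not taken from the actual archives.

```matlab
% Hypothetical file lists in different orders and of different lengths
fnames_old = {'a.m', 'b.txt', 'c.mat'};
fnames_new = {'c.mat', 'a.m', 'd.ddf', 'e.txt'};

% ismember tests every element of the first list against the whole second
% list, so neither the order nor the length of the lists matters
missing_in_new = fnames_old(~ismember(fnames_old, fnames_new));
missing_in_old = fnames_new(~ismember(fnames_new, fnames_old));

disp(missing_in_new)
disp(missing_in_old)
```

Because no element is ever indexed by its position in the other list, this never reads past the end of the shorter list.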

1 Answer

You should be able to easily determine the files that exist in one but not the other using setdiff.

%// Create a temporary directory to untar everything to
tmpdir = tempname;

%// Extract the reference .tgz to this location
fnames_old = untar(refTGZFile, tmpdir);

%// Delete the temporary directory so the next extraction starts from an empty folder
rmdir(tmpdir, 's')

%// Extract the other .tgz file to the same location
fnames_new = untar(newTGZFile, tmpdir);

%// Use setdiff to compare the files that were in one but not the other
in_old_but_not_new = setdiff(fnames_old, fnames_new);
in_new_but_not_old = setdiff(fnames_new, fnames_old);

%// Clean up the temporary folder
rmdir(tmpdir, 's')
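
To feed the two `setdiff` results back into the `testStatus` and `testMessage` outputs of `compareTGZ`, something along these lines could work. This is only a sketch: the sample file lists are invented, and the message format is an assumption rather than anything specified in the question.

```matlab
% Invented sample lists standing in for the output of untar
fnames_old = {'MATES/b.txt', 'MATES/a.m', 'MATES/c.mat'};
fnames_new = {'MATES/c.mat', 'MATES/d.ddf'};

% setdiff returns the sorted, unique entries of the first list
% that are absent from the second
in_old_but_not_new = setdiff(fnames_old, fnames_new);
in_new_but_not_old = setdiff(fnames_new, fnames_old);

% The archives hold the same set of file names only if both differences are empty
testStatus = isempty(in_old_but_not_new) && isempty(in_new_but_not_old);

% Assumed message format: a count followed by comma-separated names
testMessage = sprintf('%d file(s) only in reference: %s; %d file(s) only in test: %s', ...
    numel(in_old_but_not_new), strjoin(in_old_but_not_new(:)', ', '), ...
    numel(in_new_but_not_old), strjoin(in_new_but_not_old(:)', ', '));

disp(testMessage)
```

Note that this only compares file names. Comparing the contents of files that exist in both archives would be a separate step.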

If you don't want to extract both archives to the same location like this, and you instead have two lists of absolute paths (one per extraction folder), you can convert them to relative paths and then compare them.

%// Anonymous function to create a relative path
relpath = @(base,pth)regexprep(pth, ['^', regexptranslate('escape', base)], '');

%// oldTGZ and newTGZ are the (absolute) folders each archive is extracted to
fnames_old = untar(refTGZFile, oldTGZ);
fnames_new = untar(newTGZFile, newTGZ);

%// Convert everything to relative paths
fnames_old_relative = relpath(oldTGZ, fnames_old);
fnames_new_relative = relpath(newTGZ, fnames_new);

%// Compare what is in one but not the other.
is_old_but_not_new = setdiff(fnames_old_relative, fnames_new_relative);
is_new_but_not_old = setdiff(fnames_new_relative, fnames_old_relative);
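
To see what the `relpath` helper does without needing real archives, here is a small sketch on made-up paths. The folder names `ref_extract` and `new_extract` and all file names are hypothetical, and the path lists are built by hand instead of coming from `untar`.

```matlab
% Same anonymous function as above: strip the base folder from the front of each path
relpath = @(base, pth) regexprep(pth, ['^', regexptranslate('escape', base)], '');

% Hypothetical extraction folders (nothing is created on disk)
oldTGZ = fullfile(tempdir, 'ref_extract');
newTGZ = fullfile(tempdir, 'new_extract');

% Hand-built absolute paths standing in for the output of untar
fnames_old = {fullfile(oldTGZ, 'MATES', 'getMatPath.m'), ...
              fullfile(oldTGZ, 'MATES', 'shared.txt')};
fnames_new = {fullfile(newTGZ, 'MATES', 'shared.txt'), ...
              fullfile(newTGZ, 'MATES', 'z65.png')};

% regexprep accepts a cell array, so every path is converted in one call
fnames_old_relative = relpath(oldTGZ, fnames_old);
fnames_new_relative = relpath(newTGZ, fnames_new);

% After stripping the base, the same file has the same relative path in both lists
is_old_but_not_new = setdiff(fnames_old_relative, fnames_new_relative);
is_new_but_not_old = setdiff(fnames_new_relative, fnames_old_relative);
```

The relative paths keep their leading file separator, since only the base folder itself is removed. `regexptranslate('escape', ...)` matters here because characters such as backslashes and dots in the base path would otherwise be interpreted as regular-expression syntax.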

And then, to print out the results:

for k = 1:numel(is_old_but_not_new)
    fprintf('File %s in Reference tgz is not found in Test tgz\n', is_old_but_not_new{k});
end

for k = 1:numel(is_new_but_not_old)
    fprintf('File %s in Test tgz is not found in Reference tgz\n', is_new_but_not_old{k});
end
Suever
  • @Maliva See the update. – Suever Apr 17 '16 at 19:31
  • What is the tmpname? Because my inputs are refTGZFile and newTGZFile – Maliva Apr 17 '16 at 19:41
  • @Maliva It was a typo. It should be `tempname`. You actually want the second part of the answer though. – Suever Apr 17 '16 at 19:42
  • Yes, my problem is with the second part – Maliva Apr 17 '16 at 19:50
  • @Maliva There is no `tmpname` variable in the second part – Suever Apr 17 '16 at 19:51
  • I have tried the second part with fprintf('File %s in Reference tgz is not found in Test tgz\n',is_old_but_not_new ); – Maliva Apr 17 '16 at 20:49
  • @Maliva I have updated the answer with how to print the file information – Suever Apr 17 '16 at 21:59
  • Thanks, but it does not print all the files in newTGZFile that are not in refTGZFile – Maliva Apr 17 '16 at 22:16
  • @Maliva I assumed that I had provided enough information for you to figure out how to apply it to the `is_new_but_not_old` variable as well. I have updated the answer to print *both* now. – Suever Apr 17 '16 at 22:18
  • It works well now. Here is the result – Maliva Apr 17 '16 at 22:24
  • File \MATES\ModelFunctionsList.txt in Reference tgz is not found in Test tgz File \MATES\getMatPath.m in Reference tgz is not found in Test tgz File \MATES\mcr_build in Reference tgz is not found in Test tgz File \MATES\ModelFunction.txt in Test tgz is not found in Reference tgz File \MATES\test1and2.png in Test tgz is not found in Reference tgz File \MATES\z65.png in Test tgz is not found in Reference tgz File \MATES\z81.png in Test tgz is not found in Reference tgz File \MATES\zfg.png in Test tgz is not found in Reference tgz – Maliva Apr 17 '16 at 22:25
  • @Maliva Ok if that worked for you then please mark this answer as correct. Thanks. – Suever Apr 17 '16 at 22:25
  • Thanks very much. Now I just have to get the names of the files without the \MATES\ prefix – Maliva Apr 17 '16 at 22:26
  • @Maliva That is a separate question. – Suever Apr 17 '16 at 22:33
  • Yes, I want to extract the names of the files only from \MATES\Filename.ext. I am thinking of using fileparts – Maliva Apr 17 '16 at 22:37
  • @Maliva As I said, that is another question that is outside the scope of this one. Please ask a new question and mark this solution as correct for your initial question. – Suever Apr 17 '16 at 22:37
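
For the follow-up raised in the comments (keeping only the file name from each relative path), one possible approach with `fileparts` is sketched below. The input list is made up, and this is just one way to do it, not something confirmed in the thread.

```matlab
% Made-up relative paths with a leading folder, built with the platform's separator
relative_names = {fullfile(filesep, 'MATES', 'getMatPath.m'), ...
                  fullfile(filesep, 'MATES', 'z65.png'), ...
                  fullfile(filesep, 'MATES', 'mcr_build')};

% fileparts splits one path into folder, base name and extension;
% cellfun applies it to every entry and returns cell arrays
[~, names, exts] = cellfun(@fileparts, relative_names, 'UniformOutput', false);

% Re-attach the extension (empty for entries without one)
file_names = strcat(names, exts);

disp(file_names)
```

`fileparts` splits on the separator of the platform it runs on, so paths containing backslashes are only split this way on Windows.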