
I have a MATLAB script that reads a line from a text file. Each line of the text file contains the filename of a CSV. I need to keep track of what line MATLAB is working on so that I can save the data for that line in a cell array. How can I do that?

To illustrate, the first few lines of my .dat file look like this:

2006-01-003-0010.mat
2006-01-027-0001.mat
2006-01-033-1002.mat
2006-01-051-0001.mat
2006-01-055-0011.mat
2006-01-069-0004.mat
2006-01-073-0023.mat
2006-01-073-1003.mat
2006-01-073-1005.mat
2006-01-073-1009.mat
2006-01-073-1010.mat
2006-01-073-2006.mat
2006-01-073-5002.mat
2006-01-073-5003.mat

I need to save the variable site_data from each of these .mat files into a different cell of O3_data. Therefore, I need to have a counter so that O3_data{1} is the data from the first line of the text file, O3_data{2} is the data from the second line, etc.

This code works, but it's done without using the counter so I only get the data for one of the files I'm reading in:

year = 2006:2014;
for y = 1:9
    flist = fopen(['MDA8_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
    nt = 0; % Counter will go up one for each file loaded

    while ~feof(flist) % While end of file has not been reached
        fname = fgetl(flist);
        disp(fname); % Stores name as string in fname
        fid = fopen(fname);

        while ~feof(fid)
            currentLine = fgetl(fid);    
            load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
            O3_data = site_data;
            % Do other stuff
        end
        fclose(fid);
    end
    fclose(flist);
end

If I add the time index part, MATLAB gives me the error `Subscript indices must either be real positive integers or logicals.` nt is an integer, so I don't know what I'm doing wrong. I need the time index so that I can have O3_data{i}, in which each i is one of the files I'm reading in.

year = 2006:2014;
for y = 1:9
    flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']); % Open the list of file names - CSV files of states with data under consideration
    nt = 0; 

    while ~feof(flist) % While end of file has not been reached
        fname = fgetl(flist);
        fid = fopen(fname);

        while ~feof(fid)
            currentLine = fgetl(fid);
            nt = nt+1; % Time index
            load (fname, 'site_data'); % Load current file. It is all the data for one site for one year
            O3_data{nt} = site_data;
            % Do other stuff
        end  
        fclose(fid);
    end
    fclose(flist);
end
Amro
SugaKookie
  • try replacing the `nt` counter by using `O3_data{end+1} = site_data` where it was initially defined as `O3_data = {}` – Amro Jun 05 '14 at 19:58
  • by the way, in the second while loop `fname` doesn't change, so you are loading the same data over and over again.. – Amro Jun 05 '14 at 20:01
  • I don't believe I am. The .dat file contains the filenames of over 1000 different CSV files. I believe I am loading in a `site_data` from a different CSV file each time. – SugaKookie Jun 05 '14 at 20:03
  • I don't understand what the `end+1` part does. I have to start with the first filename in the `.dat` file, then move on to the next filename – SugaKookie Jun 05 '14 at 20:05
  • If I understood correctly, you are continuously appending data to a cell array. The `end+1` is a slightly easier syntax for it: http://stackoverflow.com/a/2289119/97160 – Amro Jun 05 '14 at 20:11
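
The `end+1` idiom from the comments can be sketched on its own as follows. This is only a minimal illustration: `samples` is a hypothetical stand-in for the `site_data` values that would come from each MAT-file, not part of the original code.

```matlab
% Hypothetical stand-in values; in the question each would be a site_data
samples = {[1 2 3], 'text', magic(2)};

O3_data = {};                      % start with an empty cell array
for k = 1:numel(samples)
    O3_data{end+1} = samples{k};   % append as the next cell, no counter needed
end

disp(numel(O3_data))               % number of cells appended so far
```

Inside an indexing expression, `end` evaluates to the current last index, so `end+1` always refers to the next free cell. That is why no separate `nt` counter is required.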

2 Answers


Try the following - note that since there is an outer for loop, the nt variable has to be initialized outside of that loop so that we don't overwrite data from previous years (or previous values of y). We can avoid the inner while loop since the file just read is a *.mat file and we are using the load command to load its single variable into the workspace.

year = 2006:2014;
nt   = 0;

data_03 = {};   % EDIT added this line to initialize to empty cell array
                % note also the renaming from 03_data to data_03

for y = 1:9
    % Open the list of file names - CSV files of states with data under 
    % consideration
    flist = fopen(['MDA8_O3_' num2str(year(y)) '_mat.dat']); 

    % make sure that the file identifier is valid
    if flist>0

        % While end of file has not been reached
        while ~feof(flist) 
            % get the name of the *.mat file
            fname = fgetl(flist);

            % load the data into a temp structure
            data = load(fname,'site_data');

            % save the data to the cell array
            nt = nt + 1;
            data_03{nt} = data.site_data;
        end

        fclose(flist);  % EDIT moved this in to the if statement
    end

end

Note that the above assumes that each *.dat file contains a list of *.mat files as illustrated in your above example.

Note the EDITs in the above code from the previous posting.

Geoff
  • At the `03_data{nt} = data.site_data;` line, I get the error `The input character is not valid in MATLAB statements or expressions.` Do you know why that is? I can't figure it out. – SugaKookie Jun 06 '14 at 02:05
  • Yikes - I should have tested this out! MATLAB variables cannot start with a number as in `03_data`. This should be changed to `data_03` or something similar. I will edit the above code to reflect that. – Geoff Jun 06 '14 at 02:18
  • Oh actually, it's not starting with a zero. That's the letter O. Thanks for fixing it though. It's good to know that's why it had that error. – SugaKookie Jun 06 '14 at 04:07
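
The naming confusion in the comments above (digit zero versus letter O) can be checked with the built-in `isvarname` function. A small sketch, where the two result variable names are illustrative only:

```matlab
% isvarname returns true only for valid MATLAB variable names
startsWithDigit  = isvarname('03_data');   % zero-three: begins with a digit
startsWithLetter = isvarname('O3_data');   % letter O, then three

disp([startsWithDigit startsWithLetter])   % shows both logical results
```

A name that begins with a digit is rejected, while the asker's original `O3_data` (letter O) is a valid name.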

Try the following:

years = 2006:2014;
for y=1:numel(years)
    % read list of filenames for this year (as a cell array of strings)
    fid = fopen(sprintf('MDA8_O3_%d_mat.dat',years(y)), 'rt');
    fnames = textscan(fid, '%s');
    fnames = fnames{1};
    fclose(fid);

    % load data from each MAT-file
    O3_data = cell(numel(fnames),1);
    for i=1:numel(fnames)
        S = load(fnames{i}, 'site_data');
        O3_data{i} = S.site_data;
    end

    % do something with O3_data cell array ...
end
Amro
  • I'm getting this error: `Error using textscan Invalid file identifier. Use fopen to generate a valid file identifier.` `fid` is apparently `-1` after doing `fopen`. – SugaKookie Jun 05 '14 at 21:55
  • @shizishan: sorry I had a typo in the second line. Fixed now. EDIT: ok there were two typos :) – Amro Jun 05 '14 at 22:00
  • What is the purpose of the line `fnames = fnames{1};`? – SugaKookie Jun 06 '14 at 00:09
  • @shizishan: that's because of the way [textscan](http://www.mathworks.com/help/matlab/ref/textscan.html#outputarg_C) works; it returns a cell array of length K equal to the number of formatting specifiers (in this case K=1, i.e. a cell array of length *one* matching the single `%s` specifier). So that line simply unpacks the value from the returned cell array. Since we specified a string conversion specifier, that value is itself a cell array of strings. – Amro Jun 06 '14 at 00:28
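
The unpacking step described in the last comment can be seen without any files by passing text directly to `textscan`. A minimal sketch, assuming a two-line list; `listText` is a hypothetical stand-in for the contents of the .dat file:

```matlab
% Hypothetical stand-in for the contents of one .dat file list
listText = sprintf('2006-01-003-0010.mat\n2006-01-027-0001.mat');

C = textscan(listText, '%s');   % 1x1 cell: one entry per format specifier
fnames = C{1};                  % unpack: cell array with one filename per row

disp(numel(C))                  % number of format specifiers
disp(numel(fnames))             % number of filenames read
```

The outer cell `C` has one element because the format string has one `%s`; the filenames themselves live in `C{1}`, which is what `fnames = fnames{1};` extracts in the answer.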