Ignoring header lines in the middle of a text file using MATLAB

Question

I have a text file with multiple sections of observations. Each time a new set of observations starts, the file has some information about the data that follows (like a file header).

When I used textscan, I was only able to read the first section. For example, the data is arranged as follows:

1993-01-31 17:00:00.000 031       -61.00

1993-01-31 18:00:00.000 031       -55.00

1993-01-31 19:00:00.000 031       -65.00

 Format                                                   
 Source of Data                           
 Station Name               
 Data Interval Type     1-hour                                       
 Data Type              Final                                        

1993-02-01 00:00:00.000 032       -83.00

1993-02-01 01:00:00.000 032       -70.00

1993-02-01 02:00:00.000 032       -64.00

From the above, I only want to read the data lines (those starting with '1993') and ignore the block of text in the middle.

you're gonna have to parse the file line-by-line, or apply some preprocessing to the file to remove those header lines (maybe use `grep` or `sed` UNIX tools to remove lines starting with [a-zA-Z]) — Amro, Mar 21 '16 at 18:56
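
The preprocessing idea from the comment above can also be sketched directly in MATLAB, without external tools. This is a minimal sketch, not code from the thread: the inline sample text stands in for `fileread(filename)`, the header lines are abbreviated, and the format `'%s %s %f %f'` (date string, time string, day of year, value) is an assumption about how the columns should be read.

```matlab
% Sample text mirroring the layout in the question.
% In practice this would come from: txt = fileread(filename);
rawLines = { ...
    '1993-01-31 17:00:00.000 031       -61.00', ...
    '1993-01-31 18:00:00.000 031       -55.00', ...
    ' Format', ...
    ' Source of Data', ...
    ' Data Interval Type     1-hour', ...
    '1993-02-01 00:00:00.000 032       -83.00', ...
    '1993-02-01 01:00:00.000 032       -70.00'};
txt = strjoin(rawLines, sprintf('\n'));

% Split into lines and keep only those that start with a yyyy-mm-dd date
lines  = regexp(txt, '\r?\n', 'split');
isData = ~cellfun('isempty', regexp(lines, '^\d{4}-\d{2}-\d{2}', 'once'));
dataText = strjoin(lines(isData), sprintf('\n'));

% Assumed format: date string, time string, day of year, value
data = textscan(dataText, '%s %s %f %f');
values = data{4};   % numeric column with the observations
```

Filtering on the line prefix does not depend on how many header lines each block has, at the cost of holding the whole file in memory.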

Suever · Accepted Answer · 2016-03-21T19:32:25.620

As you noticed, textscan stops reading when it can't parse the input anymore. You can actually use this to your advantage. For example, in your case, you know that there are 5 lines of garbage between every "good" dataset. So we can run textscan once to get the first set, then run it repeatedly (with 'HeaderLines' set to 5 to skip those 5 lines) to get each of the remaining "good" datasets in the file, and finally concatenate all of the data.

This works because when you use textscan with a file identifier, it does not rewind the file identifier back to the beginning of the file after it returns. It leaves it right where it could no longer parse the input. Therefore, the next call to textscan starts right where you left off (after skipping any header lines you specify).

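% filename and formatspec must be defined beforehand; formatspec is the
% same format specifier used in the original textscan call
fid = fopen(filename, 'r');

% Don't skip any lines; read until textscan can no longer parse the input
data = textscan(fid, formatspec);

% Repeat until we hit the end of the file
while ~feof(fid)
    % Skip the 5 header lines and read until we can't read anymore
    newdata = textscan(fid, formatspec, 'HeaderLines', 5);

    % Append the new rows to each column of the existing data
    data = cellfun(@(x, y) cat(1, x, y), data, newdata, 'UniformOutput', false);
end

fclose(fid);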
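
To see the resume behaviour in isolation, here is a minimal sketch on a throwaway file. The file contents, the scratch file name, and the `'%f'` format are all made up for illustration; the single text line plays the role of the 5-line header block.

```matlab
% Write a small scratch file: two numbers, one text line, two numbers
tmpfile = [tempname '.txt'];          % hypothetical scratch file
fid = fopen(tmpfile, 'w');
fprintf(fid, '1\n2\nheader\n3\n4\n');
fclose(fid);

fid = fopen(tmpfile, 'r');
first  = textscan(fid, '%f');                    % stops at the text line
second = textscan(fid, '%f', 'HeaderLines', 1);  % skips it, then resumes
fclose(fid);
delete(tmpfile);

% Concatenate the two chunks into one column vector
combined = [first{1}; second{1}];
disp(combined.');
```

Note that this relies on the format failing at the first field of the non-data line. If a leading `%s` were used, it could consume part of a header line before the failure, and the number of lines to skip would no longer match.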

edited Mar 21 '16 at 19:32

answered Mar 21 '16 at 19:01

@Mushi It is the format specifier that you used to parse your file originally. I didn't know what you used in your case, so I just made it a variable. If you post the code you used to try to parse this in your question I can put that in this. It is the second input to `textscan` – Suever Mar 21 '16 at 19:07
Sorry for asking. I realized it after asking the question. Now, the thing is that the data I am getting is not concatenated. I get the data in chunks, one per set of observations, and each chunk is a cell. Any idea how to fix it? – Mushi Mar 21 '16 at 19:16
@Mushi Easily fixable. Can you post your format specifier so I can figure out what you need? – Suever Mar 21 '16 at 19:24
@Mushi I just updated the answer in a way that *should* work regardless of your format specifier – Suever Mar 21 '16 at 19:33
Perfect. Thanks a lot. – Mushi Mar 22 '16 at 09:26
