Matlab textscan introducing additional rows with zeros or NaNs

Question

I'm trying to read a .dat file containing tens of thousands of rows, each of which looks something like:

   1.9681968    0   0   19.996  0   61  100 1.94E-07    6.62E-07  
   2.330233     0   0   19.996  0   61  100 1.94E-07    6.62E-07
   2.6512651    0   0   19.997  0   61  100 1.94E-07    6.62E-07
   3.5923592    0   0   19.998  0   61  100 1.96E-07    6.62E-07

For example, I'm reading it with

    Data = textscan(fid, '%.9f%*f%*f%.9f%*f%*f%*f%.9f');

where the string format depends on which column I want to read.
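A minimal sketch of generating such a format string, assuming every line of the file has exactly nine numeric columns: use one specifier per column, `%f` for a column to keep and `%*f` for a column to skip, so that each line consumes the whole format exactly once. The variable names (`nCols`, `wanted`, `spec`) are illustrative only, and a short character vector stands in for the file because `textscan` also accepts text as its first argument.

```matlab
% Sketch: build a textscan format that keeps only selected columns.
nCols  = 9;          % assumption: every line has exactly 9 numeric columns
wanted = [1 4 8];    % hypothetical choice of columns to keep

spec = repmat({'%*f'}, 1, nCols);  % '%*f' reads a field and discards it
spec(wanted) = {'%f'};             % '%f' reads a field and returns it
fmt = [spec{:}];                   % '%f%*f%*f%f%*f%*f%*f%f%*f'

% Two sample lines in a character vector instead of a file.
txt = sprintf(['1.9681968 0 0 19.996 0 61 100 1.94E-07 6.62E-07\n' ...
               '2.330233  0 0 19.996 0 61 100 1.94E-07 6.62E-07\n']);
C = textscan(txt, fmt);   % 1-by-3 cell, one cell per kept column
data = [C{:}];            % 2-by-3 matrix: columns 1, 4 and 8
```

To read from the actual file, pass the identifier returned by `fopen` in place of `txt`.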

When reading large files, the first column of the cell array 'Data' looks like this:

    1.96819680000000
    0
    2.33023300000000
    2.65126510000000
    0
    3.59235920000000
    0

while the other columns show NaNs in place of those zeros. There are almost as many extra rows as there are rows in the data file, so the arrays end up almost twice as large as expected.

I guess this has something to do with errors when reading doubles, since this problem doesn't occur if I try to read the file as strings.

If possible, though, I would rather not read everything as strings and then have to convert it all to doubles.

Any ideas?

The first line of your file does not have the same number of columns as the others. — horchler, Aug 07 '15 at 20:18
You have only 8 `%f` in your format specifier, but your file has 9 columns. That is why you get inconsistent output with "extra rows": the program reads 8 values from one line, then starts another output row, reads the one value left on the line, fails to read more so it closes that row, starts another row, reads 8 values, and so on. — Hoki, Aug 10 '15 at 14:46
Actually, I used `%*[^\n]` at the end of my format string, which should take care of the rest. But you might be right in your reasoning; I will check whether that was the case. Thanks! — woodenflute, Aug 10 '15 at 16:13
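The `%*[^\n]` approach mentioned in this comment can be sketched as follows: list specifiers only up to the last column you need, then let `%*[^\n]` read and discard whatever remains on the line, so each input line should map to one output row. This is a sketch on three sample lines held in a character vector, assuming every line has at least eight columns.

```matlab
% Sketch: keep columns 1, 4 and 8, then skip the rest of each line.
txt = sprintf(['1.9681968 0 0 19.996 0 61 100 1.94E-07 6.62E-07\n' ...
               '2.330233  0 0 19.996 0 61 100 1.94E-07 6.62E-07\n' ...
               '2.6512651 0 0 19.997 0 61 100 1.94E-07 6.62E-07\n']);

% %*[^\n] reads every character up to the newline and discards it.
fmt = '%f%*f%*f%f%*f%*f%*f%f%*[^\n]';
C = textscan(txt, fmt);
data = [C{:}];   % one row per input line, columns 1, 4 and 8
```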

nalyd88 · Accepted Answer · 2015-08-07T22:55:43.850

I think the issue is with the format string. Try the format string shown below.

    fid = fopen('test.txt');
    % data = textscan(fid, '%.9f%*f%*f%.9f%*f%*f%*f%.9f')
    data = textscan(fid, '%f %f %f %f %f %f %f %f %f');  % one %f per column (9 columns)
    data = cell2mat(data)                                % 1-by-9 cell -> N-by-9 matrix
    fclose(fid);

Here test.txt is a text file containing your example data. The code above produces the following output:

    1.9682         0         0   19.9960         0   61.0000  100.0000    0.0000       NaN
    2.3302         0         0   19.9960         0   61.0000  100.0000    0.0000    0.0000
    2.6513         0         0   19.9970         0   61.0000  100.0000    0.0000    0.0000
    3.5924         0         0   19.9980         0   61.0000  100.0000    0.0000    0.0000

Notice the NaN value on the line that contained only eight values. If you want a default value for lines that contain fewer values, use the 'EmptyValue' option:

    data = textscan(fid, '%f %f %f %f %f %f %f %f %f','EmptyValue', 42);

Then you will get:

    1.9682         0         0   19.9960         0   61.0000  100.0000    0.0000   42.0000
    2.3302         0         0   19.9960         0   61.0000  100.0000    0.0000    0.0000
    2.6513         0         0   19.9970         0   61.0000  100.0000    0.0000    0.0000
    3.5924         0         0   19.9980         0   61.0000  100.0000    0.0000    0.0000

You can then get the first column by indexing the resulting matrix with data(:,1), which returns:

    1.9682
    2.3302
    2.6513
    3.5924

Hi, thank you for your answer! The problem is not with the string format though, I made some mistakes when trying to make an example. The problem is that new empty rows are introduced, placed between the real rows. — woodenflute, Aug 10 '15 at 14:33
That can happen if your format string is too short (i.e., it has eight `%f` in it but the data has nine values). That's why I recommended using nine `%f` and then specifying a default value when the text only contains eight values. — nalyd88, Aug 10 '15 at 16:54
