
I'm reviving an old MATLAB script that uses "[d h v c t] = textread(fn,'%s %*s %s %f %s %s');" to import data. I want to replace textread with textscan, as that seems to be recommended.

My problem (with both the old and the new function) is that my fourth column of data, the floating-point value, has some gaps in it. Because whitespace is my delimiter, MATLAB tries to read the fifth column, which contains letters, as a floating-point value and therefore gives me an error.

Any suggestions on how to make it automatically skip lines without a value? I have about 100 files that need to be periodically updated, so any manual method is too time-consuming. My data looks like this, but over a long period of time:

31/12/1991 @ 00:00:00 Q25 T2
01/01/1992 @ 00:00:00 Q25 T2
02/01/1992 @ 00:00:00 24.451330 Q25 T2
03/01/1992 @ 00:00:00 24.674587 Q25 T2
04/01/1992 @ 00:00:00 25.264880 Q25 T2

Thanks

Sarah
  • Do you need all of the values, in particular those after the number, like Q25 and T2? Are there any constant values in the data, e.g., is Q25 always Q25, or does it at least always start with Q? One needs to know a bit about the data in order to tweak `textscan` for such cases. – horchler Jan 23 '14 at 00:09
  • Hi, I do need all the columns. The numbers change, but Q and T are always the first character of these columns. – Sarah Jan 23 '14 at 00:22

1 Answer


Okay, this is a bit of a hack, but it works. textscan can be so much faster than other methods that it is often worth it to play around a bit if your data has particular constraints.

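fid = fopen('test.txt');
% %*s skips the '@' column; 'TreatAsEmpty','Q' lets the %f field read a leading 'Q' as empty (NaN)
t = textscan(fid,'%s%*s%s%f%s%s','TreatAsEmpty','Q');
fclose(fid);
t{:}  % display every column that was read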

You'll see that t{3} is a 5-by-1 array with the default NaN for the empty values. However, you still need to do one more thing as t{4} is missing the leading 'Q' for the first two elements. There are probably several ways to accomplish this, but here's an easy one-liner that uses isnan to index into the rows where the 'Q' needs to be added:

t{4}(isnan(t{3})) = cellfun(@(c)['Q' c],t{4}(isnan(t{3})),'UniformOutput',false);


How does using the 'TreatAsEmpty' parameter work?

The fourth column (the third non-skipped column) is a numeric field, and the 'TreatAsEmpty' option only applies when scanning numeric fields ('%f'). On rows where the number is missing, the string 'Q25' lands in that numeric field and is broken into the number NaN and the string '25', effectively adding a column. On rows where the number is present, 'Q25' falls in the fifth column and is unaffected because it is scanned as a string. So it should be fine if the letter 'Q' appears elsewhere in the data.
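Since the question mentions roughly 100 files, here is a minimal sketch of how the two steps above might be put in a loop. It assumes textscan behaves as described in this answer. The temporary file, the `files` list, and the `results` and `missing` variables are hypothetical names used only for illustration.

```matlab
% Write the five sample rows from the question to a scratch file (hypothetical path).
fn = [tempname '.txt'];
fid = fopen(fn, 'w');
fprintf(fid, '%s\n', ...
    '31/12/1991 @ 00:00:00 Q25 T2', ...
    '01/01/1992 @ 00:00:00 Q25 T2', ...
    '02/01/1992 @ 00:00:00 24.451330 Q25 T2', ...
    '03/01/1992 @ 00:00:00 24.674587 Q25 T2', ...
    '04/01/1992 @ 00:00:00 25.264880 Q25 T2');
fclose(fid);

% Assumption: in practice this list would hold the ~100 real file names.
files = {fn};
results = cell(size(files));

for k = 1:numel(files)
    fid = fopen(files{k}, 'r');
    if fid == -1
        error('Could not open %s', files{k});
    end
    % Same call as in the answer: skip '@', treat a leading 'Q' as an empty number.
    t = textscan(fid, '%s%*s%s%f%s%s', 'TreatAsEmpty', 'Q');
    fclose(fid);

    % Rows with no number lost their leading 'Q' in column 4; put it back.
    missing = isnan(t{3});
    t{4}(missing) = cellfun(@(c) ['Q' c], t{4}(missing), 'UniformOutput', false);

    results{k} = t;   % one cell of columns per file
end

delete(fn);           % remove the scratch file
```

Each entry of `results` then holds the date, time, value, Q and T columns for one file, with NaN marking the rows that had no value.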

horchler