MATLAB: reading a fixed-width text file

Question

I have a text file like below:

TestData                                                                     

  6.84 11.31 17.51 22.62 26.91 31.98 36.47 35.85 28.47 20.57 10.50  6.37  test1
  0.24  2.62  4.94  7.17 10.39 15.37 18.73 18.29 12.26  6.46  1.15 -0.33  test2
 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test3

 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test4
...

As you can see, the first two lines are comments to ignore. Each of the following lines also ends with a comment. Each number is in the format %6f (six characters wide), and there are blank lines in between.

I want to read all the numbers into a matrix to make plots. I tried textscan, but had trouble ignoring the last column, skipping the blank lines, and reading numbers that run together (e.g., some of the numbers in the test4 line).

Here is the code I have by now:

data=dir('*.txt');
formatspecific='%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f%6f';
for i=1:length(data);
    TestData1=data(i).name;
    tempData=textscan(TestData1,formatspecific,'HeaderLines',2);
end

Can anybody help with sample code to improve the textscan part?

Why'd you not specify the last string in your format? That's all you need to do, then just kill off the string in your result. — Adriaan, Oct 19 '15 at 17:03
Even when I put a string in the format, I just got 13 empty [] in the output. Do you know why? — James, Oct 19 '15 at 17:07
importdata does not work, and I got "Use TEXTSCAN or FREAD for more complex formats". — James, Oct 19 '15 at 17:14

il_raffa · Accepted Answer · 2015-10-20T17:44:56.270

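To read a file with textscan, you have to open it before calling textscan and close it afterwards. In your code the file name itself is passed to textscan, which then parses that character string as if it were the data, which is why the output cells are empty. You should use: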

fopen to open the input file
fclose to close the input file
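
The open/read/close sequence can be sketched as follows. This is a minimal example, not the asker's actual data: the temporary file, its two-column contents, and the variable names fid and values are assumptions made only to keep the sketch self-contained.

```matlab
% Hypothetical temporary file, created only so the sketch can run on its own
file_name = [tempname '.txt'];
fid = fopen(file_name, 'wt');
fprintf(fid, 'Header\n\n 1.50 2.50 a\n 3.50 4.50 b\n');
fclose(fid);

% Open the file and check the identifier (fopen returns -1 on failure)
fid = fopen(file_name, 'rt');
if fid == -1
    error('Could not open %s', file_name);
end
% Pass the file identifier, not the file name, to textscan
C = textscan(fid, '%f%f%s', 'HeaderLines', 2);
fclose(fid);
delete(file_name);

% Keep the numeric columns only, dropping the trailing string column
values = [C{1:end-1}];
```

Checking the identifier before reading makes a wrong path fail with a clear message instead of an error inside textscan.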

textscan returns a cell array with the content read from the input file. Since you are reading more than one file, you should also change the way you handle that cell array: as your code stands, the data are overwritten at each iteration.

One possibility is to store the data in an array of structs with, for example, two fields: the name of the input file and the data.

Another possibility is to build a single struct in which each field contains the data read from one input file; the field names can be generated automatically.

A third possibility is to store everything in one matrix.
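
The three storage options can be sketched with dummy data as follows. The file names and the placeholder matrix block are assumptions standing in for the parsed files, and fileparts is used here to strip the extension, which is one possible way to derive a field name.

```matlab
% Hypothetical file names and dummy data standing in for the parsed files
file_names = {'input_file1.txt', 'input_file2.txt'};
n_data_col = 12;
m_data = [];
for i = 1:numel(file_names)
    block = i * ones(2, n_data_col);   % placeholder for the numeric columns
    % Option 1: array of structs holding the file name and the data
    the_data(i).f_name = file_names{i};
    the_data(i).data   = block;
    % Option 2: one struct with a field per file, named after the file
    [~, base_name] = fileparts(file_names{i});
    data_struct.(base_name) = block;
    % Option 3: a single matrix with the rows of all files stacked
    m_data = [m_data; block];
end
```

Note that a dynamic field name must be a valid identifier, so a file name that starts with a digit or contains characters such as a dash would have to be sanitized before it can be used for option 2.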

Below is a script in which these three alternatives are implemented.

Code Updated (following the comment received)

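To read run-together data such as 95.04156.07 correctly as 95.04 and 156.07, the format specifier has to be changed from %6f to %6.2f. With %6f, textscan skips the leading space before counting the six characters, so the fields drift and the third value of such a row comes out as 56.072 instead of 156.07. %6.2f limits each field to two digits after the decimal point, so each value ends where the next one begins.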

% Get the list of input files
data=dir('input_file*.txt');
% Define the number of data columns
n_data_col=12;
% Define the number of header lines
n_header=2;
% Build the format specifier string
% OLD format specifier (misreads run-together numbers)
% formatspecific=[repmat('%6f',1,n_data_col) '%s'];
% NEW format specifier
formatspecific=[repmat('%6.2f',1,n_data_col) '%s'];
% Initialize the m_data matrix (if you know the number of rows of each
% input file in advance, you can preallocate the matrix instead)
m_data=[];
% Loop over the input files
for i=1:length(data)
   % Get the i-th file name
   file_name=data(i).name;
   % Open the i-th input file
   fp=fopen(file_name,'rt');
   % Read the i-th input file
   C=textscan(fp,formatspecific,'headerlines',n_header);
   % Close the input file
   fclose(fp);
   % Assign the read data to the "the_data" struct array
   the_data(i).f_name=file_name;
   the_data(i).data=[C{1:end-1}];
   % Assign the data to a struct whose fields are named after the input file
   data_struct.(file_name(1:end-4))=[C{1:end-1}];
   % Append the data to the matrix "m_data"
   m_data=[m_data;[C{1:end-1}]];
end
% Display the assembled matrix
m_data
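
An alternative that does not depend on how textscan counts field widths is to read each line as text and cut it at fixed positions. This is a different technique from the textscan approach in this answer, shown only as a minimal sketch; the temporary file, the three-column layout, and the names txt_line, width and fields are assumptions for illustration.

```matlab
% Hypothetical sample file using the layout from the question (6-character fields)
file_name = [tempname '.txt'];
fid = fopen(file_name, 'wt');
fprintf(fid, 'TestData\n\n');
fprintf(fid, '  6.84 11.31 17.51  test1\n');
fprintf(fid, '\n');
fprintf(fid, ' 68.47 95.04156.07  test4\n');
fclose(fid);

width    = 6;   % characters per numeric field
n_cols   = 3;   % numeric fields per row (12 in the original file)
n_header = 2;   % header lines to skip

fid = fopen(file_name, 'rt');
for k = 1:n_header
    fgetl(fid);                          % discard the header lines
end
m_data = zeros(0, n_cols);
txt_line = fgetl(fid);
while ischar(txt_line)                   % fgetl returns -1 at end of file
    if length(txt_line) >= width*n_cols  % skip blank or too-short lines
        % Cut the numeric part into n_cols chunks of 'width' characters each
        fields = reshape(txt_line(1:width*n_cols), width, n_cols).';
        % Convert each chunk to a number and append the row
        m_data(end+1, :) = str2double(strtrim(cellstr(fields))).';
    end
    txt_line = fgetl(fid);
end
fclose(fid);
delete(file_name);
```

Because every field is taken from a fixed character range, leading spaces and run-together values do not shift the columns, and the trailing comment is simply never read. Blank lines are dropped here; storing a row of zeros instead would be an equally simple change.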

Input file

TestData                                                                     

  6.84 11.31 17.51 22.62 26.91 31.98 36.47 35.85 28.47 20.57 10.50  6.37  test1
  0.24  2.62  4.94  7.17 10.39 15.37 18.73 18.29 12.26  6.46  1.15 -0.33  test2
 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test3

 68.47 95.04156.07218.39304.31320.22311.69269.22203.01135.60 68.18 55.09  test4

Output

m_data =

  Columns 1 through 7

    6.8400   11.3100   17.5100   22.6200   26.9100   31.9800   36.4700
    0.2400    2.6200    4.9400    7.1700   10.3900   15.3700   18.7300
   68.4700   95.0400  156.0700  218.3900  304.3100  320.2200  311.6900
   68.4700   95.0400  156.0700  218.3900  304.3100  320.2200  311.6900

  Columns 8 through 12

   35.8500   28.4700   20.5700   10.5000    6.3700
   18.2900   12.2600    6.4600    1.1500   -0.3300
  269.2200  203.0100  135.6000   68.1800   55.0900
  269.2200  203.0100  135.6000   68.1800   55.0900

Hope this helps.

Thanks. This is very clear and is very helpful. But textscan ignores the leading spaces, and the matrix C is not correct for the line test4. For example, the 3rd number in test4 is 56.072, but it should be 156.07. — James, Oct 20 '15 at 05:05
Sorry for this late answer. **I've updated the code**, now `95.04156.07` is correctly read as `95.04` and `156.07`. From your comment it is not clear if you want to "recognize" the empty lines (e.g. the one between `test3` and `test4`). If so, what do you want to insert in the matrix? — il_raffa, Oct 20 '15 at 17:48
Thanks. The blank line could either be left as 0 or just be deleted. The program works fine. — James, Oct 21 '15 at 20:12
You're welcome, sorry for the initial bug. Perhaps you might want to accept the answer to close the question. — il_raffa, Oct 21 '15 at 20:45
