read complicated format .txt file into Matlab

Question

I have a .txt file that I want to read into Matlab. The data format looks like this:

term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]
term0 2015-07-31-15_58_25_620 [12]
term3 2015-07-31-15_58_25_625 [2.3333, 3.4444, 4.5555]
...

How can I read these data in the following way?

name = [term2 term0 term3] or namenum = [2 0 3]
time = [2015-07-31-15_58_25_612 2015-07-31-15_58_25_620 2015-07-31-15_58_25_625]
data = {[0.9934343, 0.3423043, 0.2343433, 0.2342323], [12], [2.3333, 3.4444, 4.5555]}

I tried textscan with a format like 'term%d %s [%f, %f...]', but I cannot specify the length of the last part because it varies from line to line. How can I read it? My Matlab version is R2012b.

Thanks a lot in advance if anyone could help!

score 1 · Accepted Answer · answered Aug 03 '15 at 16:17

There may be a way to do that in a single pass, but for me this kind of problem is easier to solve with a two-pass approach.

Pass 1: Read all the columns that have a constant format according to their type (string, integer, etc.), and read the non-constant part into a separate column to be processed in the second pass.
Pass 2: Process the irregular column according to its specifics.

With your sample data, it looks like this:

%% // read file 
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %s %*c %[^]] %*[^\n]'  ) ;
fclose(fid) ;

%% // dispatch data into variables
name = M{1,1} ;
time = M{1,2} ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,3} ) ;

What happened:
The first textscan instruction reads the full file. In the format specifier:

term%d reads the integer after the literal expression 'term'.
%s reads a string representing the date.
%*c ignores one character (here, the character '[').
%[^]] reads everything (as a string) until it finds the character ']'.
%*[^\n] ignores everything until the next newline ('\n') character (so that the last ']' is not captured).
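To see what each specifier captures, the same format can be tried on a single line held in a string, since textscan also accepts a string in place of a file identifier. This is only a quick sketch; sampleLine, C, termNumber, timeString and rawNumbers are names made up for the illustration.

```matlab
% One line of the sample data, held in a string (hypothetical variable name)
sampleLine = 'term2 2015-07-31-15_58_25_612 [0.9934343, 0.3423043, 0.2343433, 0.2342323]';

% Same format specifier as the one used for the whole file
C = textscan(sampleLine, 'term%d %s %*c %[^]] %*[^\n]');

termNumber = C{1};      % integer read after the literal 'term'
timeString = C{2}{1};   % timestamp, still a string at this stage
rawNumbers = C{3}{1};   % text found between '[' and ']', still a string
```

Inspecting rawNumbers makes it clear why a second pass is needed: the numbers are still one comma-separated string at this point.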

After that, the first two columns are easily dispatched into their own variables. The third column of the result cell array M contains strings of different lengths, each holding a different number of floating-point numbers. We use cellfun in combination with another textscan to read the numbers in each cell and return a cell array of double vectors.
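For a closer look at the second pass, here is the inner textscan call applied to one string on its own. It is a small sketch with made-up variable names (s, c, colVec, rowVec). Note that textscan returns the numbers as a column vector, so a transpose is needed if you want row vectors as written in the question.

```matlab
% Content of one cell of the irregular column (hypothetical variable name)
s = '2.3333, 3.4444, 4.5555';

% Read as many doubles as the string contains, separated by commas
c = textscan(s, '%f', Inf, 'Delimiter', ',');

colVec = c{1};      % textscan wraps its result in a cell; unwrap it
rowVec = colVec.';  % optional: row vector, as in the question's layout
```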

Bonus: If you want your time to be a numeric value as well (instead of a string), use the following extension of the code:

%% // read file 
fid = fopen('Test.txt','r') ;
M = textscan( fid , 'term%d %f-%f-%f-%f_%f_%f_%f %*c %[^]] %*[^\n]'  ) ;
fclose(fid) ;

%% // dispatch data
name = M{1,1} ;
time_vec = cell2mat( M(1,2:7) ) ;
time_ms  = M{1,8} ./ (24*3600*1000) ;   %// take care of the milliseconds separately, as date vectors passed to "datenum" have no millisecond column
time = datenum( time_vec ) + time_ms ;
data = cellfun( @(s) textscan(s,'%f',Inf,'Delimiter',',') , M{1,end} ) ;

This will give you an array time holding Matlab serial date numbers (often easier to use than strings). To check that the serial numbers still represent the right times:

>> datestr(time,'yyyy-mm-dd HH:MM:SS.FFF')
ans =
2015-07-31 15:58:25.612
2015-07-31 15:58:25.620
2015-07-31 15:58:25.625
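The millisecond arithmetic can be checked by hand on the first timestamp. A serial date number counts days, so milliseconds have to be divided by the number of milliseconds in a day. The sketch below uses made-up names (baseTime, msAsDays, t, stamp) and hard-codes the values of the first sample line.

```matlab
% Date and time of the first sample line, without the milliseconds
baseTime = datenum([2015 7 31 15 58 25]);

% 612 ms expressed as a fraction of a day
msAsDays = 612 / (24*3600*1000);

% Serial date number including the milliseconds
t = baseTime + msAsDays;

% Format it back to text to compare with the original timestamp
stamp = datestr(t, 'yyyy-mm-dd HH:MM:SS.FFF');
```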

So useful for me! Thanks a lot for your very detailed answer! — Hongwei, Aug 04 '15 at 06:41

score 0 · Answer 2 · answered Feb 08 '16 at 20:38

For complicated string parsing situations like this, it is best to use regexp. Assuming you have the data in the file data.txt, the following code should do what you are looking for:

txt = fileread('data.txt');
tokens = regexp(txt,'term(\d+)\s(\S*)\s\[(.*)\]','tokens','dotexceptnewline');

% Convert namenum to numeric type
namenum = cellfun(@(x)str2double(x{1}),tokens);

% Get the time stamps from the second token of each match
time = cellfun(@(x)x{2},tokens,'UniformOutput',false);

% Split the numbers in the third token
% (strsplit needs R2013a or later; on R2012b, regexp(x{3},',','split') does the same job)
data = cellfun(@(x)str2double(strsplit(x{3},',')),tokens,'UniformOutput',false);
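Since the question mentions R2012b and, as far as I know, strsplit only arrived in R2013a, here is a sketch of the same idea that splits with regexp instead. It runs on a shortened, made-up two-line sample built in memory rather than on data.txt, so it can be tried without a file.

```matlab
% Two shortened sample lines (made-up values), joined with newlines
txt = sprintf('%s\n%s\n', ...
    'term2 2015-07-31-15_58_25_612 [0.99, 0.34]', ...
    'term0 2015-07-31-15_58_25_620 [12]');

% Same pattern as above: term number, timestamp, bracket content
tokens = regexp(txt, 'term(\d+)\s(\S*)\s\[(.*)\]', 'tokens', 'dotexceptnewline');

% First token of each match, converted to a number
namenum = cellfun(@(x) str2double(x{1}), tokens);

% Third token of each match, split on commas (plus optional spaces)
% with regexp instead of strsplit, then converted to doubles
data = cellfun(@(x) str2double(regexp(x{3}, ',\s*', 'split')), ...
    tokens, 'UniformOutput', false);
```

Here data is a cell array of row vectors, one per line of input.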
