How to separate floats from strings that are written together - MATLAB

Question

I have a question about reading a .txt file in MATLAB whose format is not known in advance. However, each row in the file always starts like this:

2012-11-01 00:00:00.00 XX YY  00.000s

After that, different quantities are logged, so the rest of each line can vary, for example:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm  0.00rpm
Ex2:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm   
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg 0.00rpm 0.00rpm 0.0deg      
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm

I handle this with textscan:

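fid = fopen('text.txt');
initfrm = {'%s%s%s%s %.3f %s'};      % date, time, XX, YY, elapsed time and its unit
frm = repmat('%.2f %s', 1, NCol);    % one value/unit pair per logged column
frm = strcat(initfrm, frm);          % strcat with a cell input returns a cell
Tmp = textscan(fid, frm{1});
fclose(fid);                         % MATLAB is case-sensitive: fid/fclose, not Fid/Fclose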

In my script, the number of logged columns (NCol) is calculated beforehand, but that part is not shown here.

Sometimes, however, the text file also contains a percentage such as 0.0%, for example:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s 000.00deg   0.00rpm  0.00rpm  0.0%

Now '%.2f' no longer works, and I don't know in advance when the log looks like this. Is there a better way to separate the numbers from the strings when they are printed together? I just want to collect the numeric data (floats) so I can plot them.

How can I get all the float values when their format varies between %.2f and %.1f and the pattern is not known in advance?

The situation is explained quite well, so what exactly is the question that you have now? — Dennis Jaheruddin, Nov 09 '12 at 09:25

Rody Oldenhuis · Accepted Answer · 2012-11-09T10:31:24.560


Importing text like this can be a real pain; usually, this is a good test of your knowledge of string manipulation :)

I believe the following commands will do nicely:

% Read in entire file as string
fid = fopen('yourFile.txt');
    C = textscan(fid, '%s', 'delimiter', '');
fclose(fid);
C = C{1};

% Remove first part (from column 39 onwards in your example; 
% adjust to match your actual data)
C = cellfun(@(x)x(39:end), C, 'UniformOutput',false);

% Remove unwanted junk
% NOTE: this removes all occurrences of 'rpm', 'deg', 
% 's', and the trailing '0.0%'
C = regexprep(C, {'deg' 'rpm' 's' '([0-9]+\.[0-9]+%)$'}, '');

% Tokenize string and convert to double
C = cellfun(@(x)textscan(x, '%f'), C);

I tested this with yourFile.txt:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm  0.00rpm
Ex2:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm   
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg    0.00rpm  0.00rpm 0.0deg      
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg    0.00rpm  0.00rpm 0.0deg    0.0%
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm

The final contents of C with the commands above are:

answered Nov 09 '12 at 10:25 by Rody Oldenhuis (edited Nov 09 '12 at 10:31)

Definitely nice, effective and compact! – Acorbe Nov 09 '12 at 10:27
@Acorbe: Gotta admit, `regexp` is just awesome :) – Rody Oldenhuis Nov 09 '12 at 10:28
I do agree. I really must get familiar with it. – Acorbe Nov 09 '12 at 10:32
Really smooth code! I don't understand how it handles the percent values, i.e. `0.0%`: if a line contains more than one of them, it fails (it stops after the first one). For example, for `Ex: 2012-11-01 00:00:00.00 XX YY 00.000s 0.00deg 0.0% 0.00rpm 0.00rpm 0.0deg 0.0% 0.0%` the result is `0 0 0`, whereas the correct result is of course `0 0 0 0 0 0 0 0`. – user1564762 Nov 09 '12 at 12:19
@user1564762: Ah, there might be several of them... Well, in that case I *think* removing the `'$'` in the `regexprep` will fix it, but you'll have to test that. You *do* want to remove the values in front of the percent sign entirely, no? Or should they be included in the result? – Rody Oldenhuis Nov 09 '12 at 12:27
The result should also include the values in front of the percent sign, so I don't want to remove those values, only the percent sign itself. Ah, so I only have to change `'([0-9]+\.[0-9]+%)$'` to `'%'`. – user1564762 Nov 09 '12 at 12:35
@user1564762: Yup, that should do it then :) – Rody Oldenhuis Nov 09 '12 at 12:40
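Following the fix agreed in these comments, the cleanup pattern `'([0-9]+\.[0-9]+%)$'` becomes simply `'%'`, so only the percent sign is stripped and the number in front of it survives. The snippet below is a sketch of the modified steps on two made-up lines whose fixed prefix has already been removed (as the `x(39:end)` step does). It swaps the answer's `textscan` call for `sscanf` with `'UniformOutput', false` so that each row ends up as a plain numeric column vector; that substitution is an assumption of this sketch, not part of the answer.

```matlab
% Sample lines with the fixed date/time/XX/YY prefix already removed
% (made-up data for illustration only).
C = {'00.000s  000.00deg  0.00rpm  0.00rpm'; ...
     '00.000s 0.00deg 0.0% 0.00rpm 0.00rpm 0.0deg 0.0% 0.0%'};

% Strip the unit suffixes. The pattern '%' now removes only the percent
% sign, so every value in front of it is kept.
C = regexprep(C, {'deg' 'rpm' 's' '%'}, '');

% Convert each cleaned line to a column vector of doubles.
% (sscanf instead of textscan is an assumption of this sketch.)
C = cellfun(@(x) sscanf(x, '%f'), C, 'UniformOutput', false);
```

With this change the second line yields eight values, one per logged quantity, including the three percentages.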

Acorbe · Answer 2 · 2012-11-09T10:14:07.663

I am not sure I have interpreted your question correctly. It seems to me that you have a variable number of tokens in each line of text: either N or N+1 (N+m, perhaps?).

If so, I would suggest an approach based on extracting tokens from each line.

Consider this:

- use `fgets` to read each line from your file;
- use `strtok` to iteratively split the line into tokens (i.e., tokenize the string), using ' ' as the token delimiter;
- because the initial part of each line follows a fixed pattern, you may want to re-merge the first N tokens and parse them as you already do. Then check whether a token in position N+1 is present and, if so, parse it.
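The steps above can be sketched in MATLAB roughly as follows. This is a minimal sketch under assumptions that are not in the answer: it uses `fgetl` rather than `fgets` (so the trailing newline is dropped), it writes two made-up sample lines to a temporary file so it runs on its own, it treats the first four tokens (date, time, `XX`, `YY`) as the fixed header (`nHeader`, `headers` and `data` are hypothetical names), and it extracts the leading number of every remaining token with `regexp` instead of a fixed `%.2f` format, so `0.0%`, `0.00rpm` and `000.00deg` are all handled the same way.

```matlab
% Write a small sample log to a temporary file (made-up data for illustration).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', '2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm  0.00rpm');
fprintf(fid, '%s\n', '2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm');
fclose(fid);

nHeader = 4;      % assumed fixed header: date, time, XX, YY
headers = {};     % header tokens of each line
data = {};        % numeric values of each line (variable length)

fid = fopen(fname, 'r');
txt = fgetl(fid);                 % fgetl returns -1 (not a char) at end of file
while ischar(txt)
    % Tokenize the line with strtok, using whitespace as the delimiter.
    tokens = {};
    rest = txt;
    while true
        [tok, rest] = strtok(rest);
        if isempty(tok)
            break;
        end
        tokens{end+1} = tok;
    end

    % Keep the fixed header tokens as text.
    headers{end+1} = tokens(1:min(nHeader, numel(tokens)));

    % For every remaining token, take its leading number and ignore the unit.
    vals = zeros(1, numel(tokens) - nHeader);
    for k = nHeader+1:numel(tokens)
        numStr = regexp(tokens{k}, '^[-+]?\d+(\.\d+)?', 'match', 'once');
        vals(k - nHeader) = str2double(numStr);   % NaN if no leading number
    end
    data{end+1} = vals;

    txt = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Each element of `data` then holds as many values as its line has logged quantities, so lines with N or N+m tokens need no special handling.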
