I have a long text file like this:

I0927 11:33:18.534551 16932 solver.cpp:244]     Train net output #0: loss = 2.61145 (* 1 = 2.61145 loss)
I0927 11:33:18.534620 16932 sgd_solver.cpp:106] Iteration 20, lr = 0.001
I0927 11:33:33.221546 16932 solver.cpp:228] Iteration 40, loss = 0.573027
I0927 11:33:33.221771 16932 solver.cpp:244]     Train net output #0: loss = 0.573027 (* 1 = 0.573027 loss)
I0927 11:33:33.221851 16932 sgd_solver.cpp:106] Iteration 40, lr = 0.001
I0927 11:33:47.883162 16932 solver.cpp:228] Iteration 60, loss = 0.852016
I0927 11:33:47.884717 16932 solver.cpp:244]     Train net output #0: loss = 0.852016 (* 1 = 0.852016 loss)
I0927 11:33:47.884812 16932 sgd_solver.cpp:106] Iteration 60, lr = 0.001
I0927 11:34:02.543320 16932 solver.cpp:228] Iteration 80, loss = 0.385975
I0927 11:34:02.543442 16932 solver.cpp:244]     Train net output #0: loss = 0.385975 (* 1 = 0.385975 loss)
I0927 11:34:02.543514 16932 sgd_solver.cpp:106] Iteration 80, lr = 0.001
I0927 11:34:17.297544 16932 solver.cpp:228] Iteration 100, loss = 0.526758
I0927 11:34:17.297659 16932 solver.cpp:244]     Train net output #0: loss = 0.526758 (* 1 = 0.526758 loss)
I0927 11:34:17.297722 16932 sgd_solver.cpp:106] Iteration 100, lr = 0.001
I0927 11:34:31.962934 16932 solver.cpp:228] Iteration 120, loss = 0.792767

I want to extract the following information from each iteration:

[ Iteration, Train net output, lr ]

and put the values in a cell array in MATLAB.

Can you please show me how to do that?

Shai
user6726469
  • Is the `Iteration` that you want in your output the one from `sgd_solver` or the one from `solver`? `regexp` should be able to handle this, but you may need to run it multiple times. – Some Guy Sep 28 '16 at 13:13
  • You can read the file line by line, find each keyword's position with `strfind`, and slice the string accordingly. For example, for "Iteration", find the keyword's starting index (i1), then the index of the next comma (i2). The value is then located in [i1+9 : i2-1]. –  Sep 28 '16 at 13:15
  • Here is an example of how I extract `Train net output` from your log: https://regex101.com/r/uGus7S/1. You should be able to modify this expression for your other outputs easily and then use it in MATLAB. – Some Guy Sep 28 '16 at 13:24

2 Answers

I deleted the first two lines and the last line of your log to make it consistent, so that every Iteration line is followed by a Train net output line and an sgd_solver ... lr = line, like this:

I0927 11:33:33.221546 16932 solver.cpp:228] Iteration 40, loss = 0.573027
I0927 11:33:33.221771 16932 solver.cpp:244]     Train net output #0: loss = 0.573027 (* 1 = 0.573027 loss)
I0927 11:33:33.221851 16932 sgd_solver.cpp:106] Iteration 40, lr = 0.001
I0927 11:33:47.883162 16932 solver.cpp:228] Iteration 60, loss = 0.852016
I0927 11:33:47.884717 16932 solver.cpp:244]     Train net output #0: loss = 0.852016 (* 1 = 0.852016 loss)
I0927 11:33:47.884812 16932 sgd_solver.cpp:106] Iteration 60, lr = 0.001
I0927 11:34:02.543320 16932 solver.cpp:228] Iteration 80, loss = 0.385975
I0927 11:34:02.543442 16932 solver.cpp:244]     Train net output #0: loss = 0.385975 (* 1 = 0.385975 loss)
I0927 11:34:02.543514 16932 sgd_solver.cpp:106] Iteration 80, lr = 0.001
I0927 11:34:17.297544 16932 solver.cpp:228] Iteration 100, loss = 0.526758
I0927 11:34:17.297659 16932 solver.cpp:244]     Train net output #0: loss = 0.526758 (* 1 = 0.526758 loss)
I0927 11:34:17.297722 16932 sgd_solver.cpp:106] Iteration 100, lr = 0.001

You can read this file as text using fileread and then execute regexp using the following code:

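% Read the whole log into one character vector
txt = fileread('log.txt');
% Iteration numbers; 'I0927' is this log's date prefix, so adjust it for other logs
it = regexp(txt,'I0927.*solver.cpp:228]\sIteration\s(.*),.*','tokens','dotexceptnewline')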

it =

  1×4 cell array

    {1×1 cell}    {1×1 cell}    {1×1 cell}    {1×1 cell}

% Same approach for the train net output and the learning rate
net_out = regexp(txt,'I0927.*solver.cpp:244]\s*Train\snet\soutput.*loss\s=\s(\S*).*','tokens','dotexceptnewline');
lr = regexp(txt,'I0927.*sgd_solver.cpp:106]\sIteration.*lr\s=\s(\S*)','tokens','dotexceptnewline');

The outputs will need a little bit of conditioning before you can convert them to numbers:

% Get outputs out of their cells
it = [it{:}]'; 
net_out = [net_out{:}]';
lr = [lr{:}]';

% Convert the N-by-3 cell of strings to numbers: [Iteration, Train net output, lr]
sim_out = str2double([it net_out lr]);
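
Since the question asks for a cell, here is a minimal sketch of a variant that captures all three values of a record with a single `regexp` call and returns a cell array. It assumes each record is exactly three consecutive lines in the order shown above. The sample text is inlined only to keep the example self-contained, and `pat`, `tok` and `result` are illustrative names, not part of the code above.

```matlab
% Sample lines copied from the log above; for a real file use
% txt = fileread('log.txt') instead.
txt = sprintf('%s\n', ...
    'I0927 11:33:33.221546 16932 solver.cpp:228] Iteration 40, loss = 0.573027', ...
    'I0927 11:33:33.221771 16932 solver.cpp:244]     Train net output #0: loss = 0.573027 (* 1 = 0.573027 loss)', ...
    'I0927 11:33:33.221851 16932 sgd_solver.cpp:106] Iteration 40, lr = 0.001', ...
    'I0927 11:33:47.883162 16932 solver.cpp:228] Iteration 60, loss = 0.852016');

% One pattern spanning the three lines of a record. With 'dotexceptnewline'
% the dot stays within a line, so each explicit \n marks a line break.
pat = ['Iteration\s+(\d+),\s+loss\s+=\s+\S+\s*\n' ...
       '.*Train net output #0: loss = (\S+).*\n' ...
       '.*Iteration\s+\d+,\s+lr\s+=\s+(\S+)'];
tok = regexp(txt, pat, 'tokens', 'dotexceptnewline');

% tok is a 1-by-N cell of 1-by-3 cells of strings; stack them and convert
% to an N-by-3 cell of numbers: {Iteration, Train net output, lr}
result = num2cell(str2double(vertcat(tok{:})));
```

The trailing `Iteration 60` line has no follow-up lines, so it does not match the pattern and is skipped. This sketch does not guard against a log with no complete records.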
Some Guy

As suggested by Some Guy, you can use regexp:

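fid = fopen('log.txt','r');
output = {};
tline = fgetl(fid);  %// "tline" avoids shadowing MATLAB's built-in line()
while ischar(tline)
    l1 = regexp(tline, 'Iteration\s+(\d+),\s+loss\s+=\s+', 'tokens', 'once');
    if ~isempty(l1)
        %// we got the first line of an iteration; the next two lines
        %// should hold the train net output and the learning rate
        tline = fgetl(fid);
        if ~ischar(tline), break; end  %// log ends in the middle of a record
        l2 = regexp(tline, 'Train net output #0: loss = (\S+)', 'tokens', 'once');
        tline = fgetl(fid);
        if ~ischar(tline), break; end
        l3 = regexp(tline, 'Iteration \d+, lr = (\S+)', 'tokens', 'once');
        if ~isempty(l2) && ~isempty(l3)
            %// store one row: [Iteration, Train net output, lr]
            output{end+1} = [str2double(l1{1}), str2double(l2{1}), str2double(l3{1})];
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
%// stack the rows into an N-by-3 numeric matrix
output = vertcat(output{:});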

BTW, are you aware of the $CAFFE_ROOT/tools/extra/parse_log.py utility that comes with Caffe?

Community
Shai
  • Instead of doing `regexp` in the while loop you can read the entire thing at once using `fread` and use `regexp` over the result one at a time to get `[ Iteration, Train net output, lr ]`. Using `regexp` like you have done doesn't have a benefit over using `strfind`; in fact, this is more difficult to understand. – Some Guy Sep 28 '16 at 17:51
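
The `strfind`-based approach mentioned in the comments can be sketched roughly as follows for a single `sgd_solver` line. This is only an illustration under the assumption that the keyword is followed by a space, the number, and a comma, as in the log above; `logLine`, `key`, `iterVal` and `lrVal` are hypothetical names.

```matlab
% One line copied from the log above
logLine = 'I0927 11:33:33.221851 16932 sgd_solver.cpp:106] Iteration 40, lr = 0.001';

% Locate the keyword and the first comma that follows it
key = 'Iteration';
i1 = strfind(logLine, key);        % start index (or indices) of the keyword
commas = strfind(logLine, ',');    % indices of all commas in the line
i2 = commas(find(commas > i1(1), 1));

% The value sits between the end of the keyword and the comma
iterVal = str2double(strtrim(logLine(i1(1)+numel(key) : i2-1)));

% The learning rate runs from after 'lr = ' to the end of the line
key2 = 'lr = ';
j1 = strfind(logLine, key2);
lrVal = str2double(strtrim(logLine(j1(1)+numel(key2) : end)));
```

Lines that lack the keyword make `strfind` return an empty array, so a real loop over the file would need to check `isempty(i1)` before indexing.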