I am trying to read the following data into MATLAB:

'0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30'  
'0.004245 1  14EFF01Cx  Rx D 6  DB  00  FF  FF  00  71'  
'0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'  
'0.005130 1  14EF131Cx  Rx D 6  DB  00  FF  FF  00  71'  
'0.005630 1  CEF801Cx   Rx D 3  3F  00  C3'  
'0.010015 1  18EFFA59x  Rx D 8  AD  07  01  00  00  00  00  30'  
'0.014145 1  CF004F0x   Rx D 8  F0  FF  7D  00  00  FF  FF  FF'  
'0.015060 1  18EFFA59x  Rx D 8  AD  07  02  00  00  00  00  30'  
'0.018235 1  18EF1CF0x  Rx D 8  F2  1E  05  FF  FF  00  71  FF'  
'0.018845 1  18EA5941x  Rx D 3  09  FF  00'  

I can easily read in each line as a string - but to make post-processing more efficient I'd like to separate each line by its delimiter - which is whitespace. In other words, the end result should be a non-singleton cell array. I can't seem to find a very efficient way of doing this. Efficiency is important because these files are several million lines long and processing in MATLAB with strings/cells takes a long time.

Any help would be appreciated. Thanks.

Pursuit

  • What have you already tried? Is `f1=fopen('file.txt'); textscan(f1,'%s','delimiter',' ');` not efficient enough? What should your resulting cell array look like? – RTbecard Jul 22 '15 at 22:09
  • Or use the import data tool and have it export a script to do the import. You can make it import the columns into individual vectors or an array using that utility. It then generates a script or function that you can modify. – bern Jul 23 '15 at 00:50
  • If you can read each line as a string, then just use [strsplit](http://ch.mathworks.com/help/matlab/ref/strsplit.html) to split it on spaces. – Marcin Jul 23 '15 at 06:30

1 Answer

You appear to have fixed-width fields, so I would treat the data as such and let textscan do most of the pre-processing for you by turning off delimiters and whitespace and defining the field widths and types explicitly:

test = {...
    '0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30'
    '0.004245 1  14EFF01Cx  Rx D 6  DB  00  FF  FF  00  71'
    '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'
    '0.005130 1  14EF131Cx  Rx D 6  DB  00  FF  FF  00  71'
    '0.005630 1  CEF801Cx   Rx D 3  3F  00  C3'
    '0.010015 1  18EFFA59x  Rx D 8  AD  07  01  00  00  00  00  30'
    '0.014145 1  CF004F0x   Rx D 8  F0  FF  7D  00  00  FF  FF  FF'
    '0.015060 1  18EFFA59x  Rx D 8  AD  07  02  00  00  00  00  30'
    '0.018235 1  18EF1CF0x  Rx D 8  F2  1E  05  FF  FF  00  71  FF'
    '0.018845 1  18EA5941x  Rx D 3  09  FF  00'};

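% Join the lines into one newline-separated string that stands in for the file contents.
test = strjoin(test', '\n');

% Fixed-width fields: turn off delimiters and whitespace so the field widths are used.
C = textscan(test, '%8.6f %2u %11s %4s %2s %2u %33s', 'delimiter', '','whitespace','');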

col1 = C{1};
col2 = C{2};
col3 = strtrim(C{3});
col3 = cellfun(@(x)hex2dec(x(1:end-1)), col3); % for instance: drop the trailing 'x', then convert hex to decimal
col4 = strtrim(C{4});
col5 = strtrim(C{5});
col6 = C{6};
col7 = strtrim(C{7});

In real use, you'd pass a file identifier (from fopen) to textscan in place of the text string. For the last, variable-length field, just read the whole thing in, making sure the field width is at least the maximum possible length. MATLAB will read a field until it reaches the specified width or hits a newline character, whichever comes first (in fact, I made the last field width 1 larger than the maximum, just to make sure). Each field is then returned as one cell of C. I also took the liberty of converting the third field from hex to decimal to show how you might post-process the values further.
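As a minimal sketch of the file-based variant, the snippet below writes three of the sample lines to a temporary file and reads them back with the same format string. The temporary file and the variable names `fname`, `fid`, and `sample` are assumptions made only so the example is self-contained; in practice you would open your own log file instead.

```matlab
% Hypothetical setup: write a few sample lines to a temporary file.
sample = {'0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30', ...
          '0.004245 1  14EFF01Cx  Rx D 6  DB  00  FF  FF  00  71', ...
          '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'};
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});   % one line per cell entry
fclose(fid);

% Read the file with the same fixed-width format, passing the file id
% to textscan instead of a text string.
fid = fopen(fname, 'r');
C = textscan(fid, '%8.6f %2u %11s %4s %2s %2u %33s', ...
             'Delimiter', '', 'Whitespace', '');
fclose(fid);
delete(fname);                     % remove the temporary file

timestamps = C{1};                 % numeric column
ids        = strtrim(C{3});        % identifiers with padding removed
payload    = strtrim(C{7});        % remaining data bytes as one string per line
```

Closing the file with fclose once textscan returns matters for very large logs, since it releases the file handle before any post-processing starts.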

As a further note, if you really do have gigantic files and need maximum speed, you could skip the strtrim step on the character fields by specifying %*ns, where n is the field width, for any known gaps, such as the 2-character gap between columns 3 and 4. The star tells textscan to read that field and discard it. However, I find the approach above a bit more readable and intuitive, and it leaves a small margin for error in case one of the fields, such as the 4th, occasionally has a 3-character entry.
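A rough sketch of the skipped-gap variant follows. The gap widths (2 characters before column 4, 1 character before column 5) are assumptions read off the sample lines, and `lines`, `txt`, and `fmt` are illustrative names. Skipped fields are not returned, so C still has 7 cells.

```matlab
% Two sample lines from the question, joined into one newline-separated string.
lines = {'0.000000 1  18EFFA59x  Rx D 8  AD  09  02  00  00  00  00  30', ...
         '0.004640 1  CEF801Cx   Rx D 3  3F  00  3B'};
txt = strjoin(lines, '\n');

% %*2s discards the assumed 2-character gap before column 4;
% %*1s discards the assumed 1-character gap before column 5.
fmt = '%8.6f %2u %11s %*2s %2s %*1s %1s %2u %33s';
C = textscan(txt, fmt, 'Delimiter', '', 'Whitespace', '');

col4 = C{4};   % no strtrim needed: the leading gap was skipped
col5 = C{5};   % likewise
```

Column 3 is left at `%11s` here because its entries vary in length (8 or 9 characters in the sample), so it would still need strtrim.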

craigim