
I am frustrated with fscanf and how slowly it reads a file with structured data. I want to read a .txt file that has three entries per line: DOUBLE DOUBLE LONG-DOUBLE, and I only want to read the first N entries. Unfortunately, fscanf is very slow. Do you know a faster method?

Btw, I am aware of several questions on this topic here, for instance this question. However, the answer does not help in my case, as I'm already using fscanf.

My code is:

    formatSpec='%d %d %ld'; % important: last one is long-double to support 64bit values!
    sizeA=[3 100000];
    file1=fopen('file1.txt','r');
    [content,cc]=fscanf(file1,formatSpec,sizeA);
    fclose(file1);

Do you know a smarter way to read N lines of a file with the given structure? Thanks!

Edit: The content of file1.txt looks like this:

1  1 204378259709308
0  1 204378259782523
1  1 204378260105693
3  1 204378260381676
3  1 204378260854931
1  1 204378261349990
1  1 204378262189528
0  1 204378263067715
1  1 204378263301204
1  1 204378263676471
1  1 204378263771064
1  1 204378264565420
0  1 204378264608240
0  1 204378264973698
...
3  1 205260543966542

So basically: A[space][space]B[space]C, where A and B are in [0,9] and C is a 64-bit integer.

Mario Krenn
  • `%d` is decimal, not double. Reading the first N characters is slow because the parser has to parse the full line to find the EOL. Using matfiles or similar would be much faster. Could you upload an example? It might be possible to improve the speed if all lines have the same length or similar. – Daniel Dec 21 '14 at 22:22
  • I added an example of one file. It is very likely that the number of characters of the last entry stays constant. Could that help? – Mario Krenn Dec 21 '14 at 22:47
  • Have you tried `sizeA=100000` only? There are already 3 values to read specified in your reading format, I don't think you need to repeat this information in `fscanf`. Otherwise you may be reading more values than you need, and forcing Matlab to organize them in a non matlab-optimized way. – Hoki Dec 22 '14 at 10:02

2 Answers


You could use textscan to read the first N entries; it is reportedly quite fast in recent versions of MATLAB:

fid = fopen(inputfile);              % inputfile is the path to the input text file
C = textscan(fid, '%d %d %d64', N);  % N is the number of leading entries (lines) to read
fclose(fid);

% Get the data into three separate variables (if needed)
col1 = C{1};
col2 = C{2};
col3 = C{3};
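
As a rough, self-contained illustration of the call above, the sketch below writes three of the sample lines from the question to a temporary file and reads back only the first N of them. The temporary file name, the value of N, and the narrower `%d8` specifiers for the two single-digit columns are assumptions made for this example, not part of the original answer.

```matlab
% Write a tiny sample file so the example runs on its own.
fname = [tempname '.txt'];              % hypothetical temporary file
fid = fopen(fname, 'w');
fprintf(fid, '1  1 204378259709308\n');
fprintf(fid, '0  1 204378259782523\n');
fprintf(fid, '3  1 204378260381676\n');
fclose(fid);

N = 2;                                  % number of leading lines to read
fid = fopen(fname, 'r');
% The format is applied N times, so reading stops after N lines.
% %d8 stores the single-digit columns as int8, %d64 stores the last one as int64.
C = textscan(fid, '%d8 %d8 %d64', N);
fclose(fid);
delete(fname);

col1 = C{1};                            % N-by-1 int8
col3 = C{3};                            % N-by-1 int64
```

Because the third column is returned as int64, it is not routed through double at any point.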
Divakar
  • Nice, that gives me a four-fold speed-up. Where my method with fscanf takes 28 s, your method takes 7 s. Thanks! (I'll wait another few days in case somebody comes up with an even faster method. If not, I'll accept your answer of course.) – Mario Krenn Dec 22 '14 at 15:16
  • @NicoDean Wow, that's really awesome!! It's fine to wait, no hurry there. – Divakar Dec 22 '14 at 15:17
% To read all the rows and columns
T = dlmread('file1.txt',' ');

% To read specific rows and columns
% Range is given as [R1 C1 R2 C2]:
% R1 - First row to read
% C1 - First column to read
% R2 - Last row to read
% C2 - Last column to read
% First row or column index is 0

% Following is the code to read rows 3, 4 and 5 
T = dlmread('file1.txt',' ',[2 0 4 2]);

By default, dlmread returns the values as double.

To get integer values:

A = uint8(T(:,1));
B = uint8(T(:,2));
C = uint64(T(:,3));
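
Whether the detour through double is safe depends on the size of the values. A minimal check, assuming the largest value is the last sample line shown in the question (the variable names here are only illustrative):

```matlab
% flintmax is 2^53: up to this value, every integer is exactly representable as a double.
limit = flintmax;

sample = 205260543966542;               % last sample value shown in the question
isSafe = sample < limit;                % true means the uint64 conversion loses nothing

% Beyond the limit, neighbouring integers collapse onto the same double.
collides = (2^53 + 1) == 2^53;
```

If real data can exceed `flintmax`, reading the column directly as a 64-bit integer type avoids the rounding.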

Hope this helps :)

arccoder
  • Do you have any idea whether dlmread is faster than fscanf (the code I gave above)? That would be great. And, as I mentioned, the third entry is long-double. – Mario Krenn Dec 21 '14 at 23:31
  • dlmread is not faster than fscanf, but it is useful for reading any N lines, not only the first N. BTW, I did not find a long-double data type, but double reads the data correctly in the example file you gave above. – arccoder Dec 22 '14 at 00:27
  • OK, so my question was about a faster solution than my code above. (Reading N lines can maybe also be achieved with `frewind` or some other way of moving the file pointer.) – Mario Krenn Dec 22 '14 at 00:43
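
One possible way to read N lines that do not start at the top of the file is textscan's 'HeaderLines' option, which skips a given number of lines before parsing. The sketch below is only an illustration under assumed values: the temporary file, its contents, and the `skip` and `N` variables are all made up for the example.

```matlab
% Build a five-line sample file in the same A[space][space]B[space]C layout.
fname = [tempname '.txt'];              % hypothetical temporary file
fid = fopen(fname, 'w');
% fprintf walks the matrix column by column, so each column becomes one line.
fprintf(fid, '%d  %d %d\n', [1 0 1 3 3; 1 1 1 1 1; 11 12 13 14 15]);
fclose(fid);

skip = 2;                               % lines to skip before reading
N = 2;                                  % lines to read after the skipped ones
fid = fopen(fname, 'r');
C = textscan(fid, '%d %d %d64', N, 'HeaderLines', skip);
fclose(fid);
delete(fname);

rows = C{3};                            % third column of lines 3 and 4
```

The skipped lines still have to be scanned for their line breaks, so this is a convenience rather than a guaranteed speed-up.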