I have a .txt file that has been generated from SQL-2005 (in ANSI format). I have tried textscan and fscanf. The entire txt file has only numeric data.

Online resources suggest that fscanf is FASTER than textscan but I found it otherwise.

  • Textscan was much faster than fscanf

I want to try this with fread as well but I do not know how to import data using fread. Can you please suggest/comment? Thanks.

fName     = 'Test.txt';   % From SQL in ANSI format, 5 million rows, 5 cols
Numofrows = 1000000;      % rows to read per call (1 million)
Numcols   = 5;

% textscan: returns one cell per column
fid = fopen(fName, 'r');
C   = textscan(fid, '%f %f %f %f %f', Numofrows);
C   = cell2mat(C);
fclose(fid);

% fscanf: returns one column vector with the values in file (row) order
fid = fopen(fName, 'r');
[C, Count] = fscanf(fid, '%f %f %f %f %f', Numofrows * Numcols);
C = reshape(C, Numcols, Count / Numcols).';   % one file row per matrix row
fclose(fid);
Andrey Rubshtein
Maddy

2 Answers

Ideally, you would get your data into a binary format and then use fread to read the double-precision numbers in directly. I would expect fread to be a lot faster in that case: string-to-number conversions are expensive, and a raw binary format will usually result in a smaller file as well.
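A minimal sketch of the binary route, assuming the data can be written out as raw doubles in row order. The file name `Test.bin`, the small sizes, and the random stand-in matrix are placeholders for illustration, not part of the original question.

```matlab
% Hypothetical file name and sizes, for illustration only.
binName   = 'Test.bin';
Numofrows = 1000;
Numcols   = 5;

% Stand-in data; in practice this would be the table exported from SQL.
A = rand(Numofrows, Numcols);

% MATLAB stores matrices column-major, so transpose before writing
% to get one table row after another on disk.
fid = fopen(binName, 'w');
fwrite(fid, A.', 'double');
fclose(fid);

% Read the first Numofrows rows back. Each column of the fread result
% holds one table row, so transpose again afterwards.
fid = fopen(binName, 'r');
B = fread(fid, [Numcols, Numofrows], 'double');
B = B.';
fclose(fid);
```

Calling fread again on the same open file identifier with the same size argument would return the next block of rows, which is one way to process a large file in pieces.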

Otherwise, you can read characters using fread and then run a string-to-number conversion on the incoming data (sscanf seems to be the best). The only trick is that each batch you read has to end on a line break; otherwise the string-to-number conversion is likely to give unpredictable results. You can do that by first reading a large batch of characters and then either backing up until you reach a line break or reading additional characters until you find the end of the line. I have found this to be slightly faster than either textscan or fscanf ... but our numbers do not match for other reasons; I'm not sure what to believe.
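A rough sketch of that batching idea, using the "read additional characters until the end of the line" variant. It assumes whitespace-separated values with exactly `Numcols` numbers on every line; the demo file name, the tiny `chunkSize`, and the sample data are made up so that the example runs on its own. This is not the code from the linked answer.

```matlab
% Build a small demo file (hypothetical name and contents).
fName   = 'demo_numbers.txt';
Numcols = 5;
A = reshape(1:50, Numcols, 10).';           % 10 rows, 5 columns
fid = fopen(fName, 'w');
fprintf(fid, '%d %d %d %d %d\n', A.');      % one table row per line
fclose(fid);

chunkSize = 16;   % characters per batch; tiny here to exercise the logic,
                  % something like 1e6 or more would suit a real file

fid   = fopen(fName, 'r');
parts = {};
while ~feof(fid)
    txt = fread(fid, chunkSize, '*char').'; % raw characters as a row vector
    if isempty(txt)
        break;
    end
    rest = fgetl(fid);                      % finish the line the batch stopped in
    if ischar(rest)                         % fgetl returns -1 at end of file
        txt = [txt, rest, sprintf('\n')];
    end
    vals = sscanf(txt, '%f');               % string-to-number conversion
    parts{end+1} = reshape(vals, Numcols, []).'; %#ok<AGROW>
end
fclose(fid);

C = vertcat(parts{:});                      % all rows, in file order
```

Each pass through the loop yields a whole number of rows, so the body could process `parts{end}` right away instead of collecting everything into `C`.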

Example code of the second method is included in a previous answer (including a lot of overlap with this question), as well as some timing results. https://stackoverflow.com/a/9441839/931379.

Community
Pursuit
  • I tried implementing a solution by reading the data in as a string and then converting it to numbers (http://stackoverflow.com/questions/8841490/how-to-use-matlab-fread-to-read-a-txt-file). The problem is that it doesn't let me read just, say, 100 rows at a time, because of the *char conversion. – Maddy Mar 03 '12 at 06:40
  • textscan was taking 14 seconds for an operation that took almost 30 seconds using fscanf. The data is random numbers: 5 million rows, 5 cols, with 1 million rows being read at any time. textscan was reading the input as a cell array, so I started investigating fscanf, but found slower performance. – Maddy Mar 03 '12 at 06:42
  • Answer updated as I remember my file-reading basics. Sample code that uses fread to read a round number of lines is included in the previous answer. It's a bit hacky, but slightly faster than textscan or fscanf for the case I tested. – Pursuit Mar 03 '12 at 07:15

There is another option that you did not list: load

   L = load(fName);

It is very simple, and it will figure out the format automatically for you. It does have some limitations: every line must contain the same number of values.
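A small self-contained check of how load behaves on a numeric text file. The file name `demo_load.txt` and the sample matrix are made up for illustration.

```matlab
% Write a small numeric text file (hypothetical name and contents).
fName = 'demo_load.txt';
A = magic(4);
fid = fopen(fName, 'w');
fprintf(fid, '%d %d %d %d\n', A.');   % one matrix row per line
fclose(fid);

% load treats a file without a .mat extension as ASCII data
% and returns its contents as a single double matrix.
L = load(fName);
```

Note that this call reads the whole file in one go, so by itself it does not cover reading 1 million rows at a time.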

Andrey Rubshtein