
I wrote a program that takes a set of .csv files and stacks the 3rd column of each file into the corresponding slice along the 3rd dimension of a 512x512xNumberOfFiles cell array. The code goes like this:

[filenames,filepath] = uigetfile('*.csv','Opening the data files','','Multiselect','on');
filenames = fullfile(filepath,filenames);
NumFiles = numel(filenames);

Pixel = cell(512,512,NumFiles);          % one 512x512 slice per file

count = 0;
num_pixels = size(Pixel,1)*size(Pixel,2);
for k = 1:NumFiles
    fid = fopen(char(filenames(k)));
    % each line is: Row, Column, Intensity (Row and Column are 0-based)
    C = textscan(fid, '%d, %d, %d','HeaderLines',1);
    % linear index into slice k; the +1 converts the 0-based file indices
    Pixel(count + sub2ind(size(Pixel),C{1}+1,C{2}+1)) = num2cell(C{3});
    count = count + num_pixels;
    fclose(fid);
end

The textscan call here takes approximately 0.5 +/- 0.03 s per file I open (each file holds 262144 (512x512) data points), and the sub2ind call takes approximately 0.2 +/- 0.01 s per file.
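
For reference, timings like these can be reproduced with tic/toc (a minimal sketch; sample.csv is a hypothetical stand-in for one of the data files, and Pixel is the preallocated cell array from above):

fid = fopen('sample.csv');  % hypothetical stand-in for one data file
tic;
C = textscan(fid, '%d, %d, %d','HeaderLines',1);
toc                         % ~0.5 s for a 262144-line file
tic;
idx = sub2ind(size(Pixel),C{1}+1,C{2}+1);
toc                         % ~0.2 s
fclose(fid);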

Is there any way to decrease this time, or is this already close to the optimal way to run the code? I'll be working with approximately 1000 files each time, so waiting 8-9 minutes just to load the data seems a bit excessive (considering I haven't yet used it for anything else).

Any tips?

Marc-Olivier

Vissenbot
  • Let me ask you, do you have 512x512 elements in all of those files and do all of them have the same row and column pattern for the first two columns of the comma-delimited text files? – Divakar May 14 '14 at 18:20
  • Yes, they are all in the same pattern, and they all have 512x512 elements. It's all in the format `Row, Column, Intensity`. – Vissenbot May 14 '14 at 18:23
  • Is there any particular reason why you're using a cell array? – sco1 May 14 '14 at 18:24
  • Not really. I thought it looked natural to put it in a cell array considering I had 512x512x1000 elements, but maybe I'm misunderstanding something. – Vissenbot May 14 '14 at 18:42
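
Based on the format described in the comments above, each file presumably starts along these lines (values hypothetical):

Row, Column, Intensity
0, 0, 1023
0, 1, 998
0, 2, 1101

The 0-based Row/Column indices are why the code adds 1 before calling sub2ind.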

1 Answer


Hoping this results in some improvement while still keeping textscan. Also, make sure the values look good.

Code

[filenames,filepath] = uigetfile('*.csv','Opening the data files',...
    '','Multiselect','on');
filenames = fullfile(filepath,filenames);
NumFiles = numel(filenames);

PixelDouble = NaN(512*512,NumFiles);     % one column of intensities per file
for k = 1:NumFiles
    fid = fopen(char(filenames(k)));
    C = textscan(fid, '%d, %d, %d','HeaderLines',1);
    PixelDouble(:,k) = C{3};             % keep only the third (intensity) column
    fclose(fid);
end
% reshape each column into a 512x512 slice, then swap the first two
% dimensions so Pixel(row+1,col+1,k) matches the file's Row/Column order
Pixel = num2cell(permute(reshape(PixelDouble,512,512,[]),[2 1 3]));

I must encourage you to follow this question - Fastest Matlab file reading? - and its answers.
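
Also, if (as the comments under the question suggest) you don't strictly need a cell array, you could skip num2cell and keep a plain numeric 3D array. A sketch along the same lines, skipping the first two columns with %*d and reading the intensities directly as doubles:

PixelDouble = NaN(512*512,NumFiles);
for k = 1:NumFiles
    fid = fopen(char(filenames(k)));
    % %*d skips the Row and Column fields; %f reads Intensity as double
    C = textscan(fid, '%*d, %*d, %f','HeaderLines',1);
    PixelDouble(:,k) = C{1};
    fclose(fid);
end
Pixel3D = permute(reshape(PixelDouble,512,512,[]),[2 1 3]);  % 512x512xNumFiles double

A numeric array is both smaller in memory and faster to index than a cell array of scalars.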

Divakar
  • You're saying I could save a lot of time using `sscanf` instead of `textscan`? As I see it, it seems to be approximately the same for one or the other... – Vissenbot May 14 '14 at 18:51
  • Well it looks like `textscan` is quite fast. Just curious how much improvement you would see though! – Divakar May 14 '14 at 19:10
  • Your solution cut the sub2ind part, evidently. Since I do not need `C{1}` and `C{2}`, I just typed `C = textscan(fid, '%*d, %*d, %d','HeaderLines',1);` instead, but it saved only a few milliseconds (still a few seconds when you have a thousand files!). Thanks for the help! – Vissenbot May 14 '14 at 19:43