
I have a large number of CSV files to process. From each file I only want selected columns, and I want to load all the files from a certain folder and then output them as one combined file. Here is my code, which runs with errors. Could anyone help me solve this problem?

data_directory = 'C:\Users\...\data';
numfiles = 17;
for n = 1:numfiles
    filepath = [data_directory,'data_', num2str(n),'_output.csv'];
    fid = fopen (filepath, 'rt');
    wanted_columns= [2 3 4 5 10 11 12 13 14 15 16 17 35 36 41 42 44 45 59 61];
    format = [];
    columns = 109;
    for i = 1 : columns;
        if any (i == wanted_columns)
            format = [format '%s'];
        else
            format = [format '%*s'];
        end
    end
    data = textscan(fid, format, 'Delimiter',',','HeaderLines',1);
    fclose(fid);
end
Daniel
Jackie
  • What are the errors that you get? – ThijsW Feb 26 '13 at 00:08
  • @ThijsW The errors are: ??? Error using ==> textscan Invalid file identifier. Use fopen to generate a valid file identifier. Error in ==> data_import_fail at 16 data = textscan(fid, format, 'Delimiter',',','HeaderLines',1); – Jackie Feb 26 '13 at 01:22

2 Answers


I think you should check whether the file was opened correctly. The error message suggests that it was not. If so, check whether the file path is correct.

[fid, msg] = fopen(filepath, 'rt');   % msg holds the system error text on failure
if fid == -1
    error('Failed to open file %s: %s', filepath, msg);
end

If the error is thrown here, you know that there was a problem with 'fopen'.

Of course I don't know which files are on your computer, but I assume the '...' in the path appears only in your question on SO, not in your actual MATLAB file. Could it be that you repeat the word 'data', while the actual file name contains 'data' only once? Your code currently produces file names like 'C:\Users\...\datadata_1_output.csv'. Maybe 'data' should be removed from either data_directory or filepath = ...?
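A minimal sketch of one way to avoid this kind of path mistake, using MATLAB's fullfile (which inserts the folder separator) and sprintf (which builds the file name in one place). This is not the asker's code; tempdir stands in for the real data folder, and the three dummy files are created only so the sketch runs on its own.

```matlab
% Assumptions (demo only): tempdir stands in for your data folder, and
% dummy files named data_<n>_output.csv are created so the sketch runs.
data_directory = tempdir;
numfiles = 3;

for n = 1:numfiles
    fid = fopen(fullfile(data_directory, sprintf('data_%d_output.csv', n)), 'wt');
    fprintf(fid, 'a,b\n1,2\n');   % tiny header + one data row
    fclose(fid);
end

opened = cell(1, numfiles);   % record which paths were opened successfully
for n = 1:numfiles
    % fullfile adds the separator; sprintf writes the 'data_' prefix once.
    filepath = fullfile(data_directory, sprintf('data_%d_output.csv', n));
    [fid, msg] = fopen(filepath, 'rt');
    if fid == -1
        error('Could not open %s: %s', filepath, msg);
    end
    % ... textscan(fid, ...) would go here ...
    fclose(fid);
    opened{n} = filepath;
end
```

Printing opened{1} is a quick way to confirm the path looks the way you expect before debugging textscan.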

ThijsW
  • Hi, this is very helpful. I changed all the repeated words that may cause confusion and it works now. Thank you very much!!! --Jackie – Jackie Feb 26 '13 at 03:56
  • Could I ask one more question, about how to go on and output my dataset as one file? Thanks for your help. – Jackie Feb 26 '13 at 04:16
  • Didn't see your comment before. Ask away. Probably best to create a new question on this website, to keep things nice and tidy. – ThijsW Feb 26 '13 at 07:29
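For the follow-up about writing everything out as one file, here is a minimal sketch, not code from this thread. It reads the wanted columns from each file with textscan ('%s' reads a column, '%*s' skips it), stacks the rows, and writes them with fprintf. Assumptions, all hypothetical: tempdir stands in for the data folder, two tiny 5-column dummy files are generated, every field is read as text, the header row is not copied to the output, and the output name combined_output.csv is made up.

```matlab
% Hypothetical setup (demo only): small 5-column files in tempdir.
data_directory = tempdir;
numfiles = 2;
n_columns = 5;
wanted_columns = [2 4];

for n = 1:numfiles
    fid = fopen(fullfile(data_directory, sprintf('data_%d_output.csv', n)), 'wt');
    fprintf(fid, 'c1,c2,c3,c4,c5\n');               % header line
    fprintf(fid, '%d,%d,%d,%d,%d\n', n*10 + (1:5)); % one data row
    fclose(fid);
end

% Format string: '%s' for wanted columns, '%*s' (skip) for the rest.
fcell = repmat({'%*s'}, 1, n_columns);
fcell(wanted_columns) = {'%s'};
formatstr = [fcell{:}];

combined = {};   % rows from all files, one cell per field
for n = 1:numfiles
    filepath = fullfile(data_directory, sprintf('data_%d_output.csv', n));
    [fid, msg] = fopen(filepath, 'rt');
    if fid == -1
        error('Could not open %s: %s', filepath, msg);
    end
    data = textscan(fid, formatstr, 'Delimiter', ',', 'HeaderLines', 1);
    fclose(fid);
    combined = [combined; [data{:}]];   %#ok<AGROW> append this file's rows
end

% Write all collected rows, comma-separated, to a single output file.
outpath = fullfile(data_directory, 'combined_output.csv');
fid = fopen(outpath, 'wt');
for r = 1:size(combined, 1)
    fprintf(fid, '%s\n', strjoin(combined(r, :), ','));
end
fclose(fid);
```

Reading everything as text keeps the sketch simple; if the columns are numeric, swapping '%s' for '%f' and writing with dlmwrite or a numeric fprintf format would avoid the string round trip.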

Here is another way to set up the format string, in a vectorized manner:

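n_columns = 109;                          % total number of columns in each file
fcell = repmat({'%*s '},1,n_columns);     % '%*s' skips a column by default
fcell(wanted_columns) = {'%s '};          % '%s' reads the wanted columns
formatstr = [fcell{:}];                   % join into one format string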

Note that format is a built-in function in MATLAB, so it is better not to use it as a variable name.
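A small sketch to check that the loop from the question and the vectorized version build the same format string, using the column values from the question. One assumption: the specifiers here have no trailing space (unlike '%*s ' above) so the two strings can be compared character for character, and the loop uses fmt_loop instead of format to avoid shadowing the built-in.

```matlab
% Column values taken from the question.
wanted_columns = [2 3 4 5 10 11 12 13 14 15 16 17 35 36 41 42 44 45 59 61];
n_columns = 109;

% Loop version, as in the question but without shadowing FORMAT.
fmt_loop = '';
for i = 1:n_columns
    if any(i == wanted_columns)
        fmt_loop = [fmt_loop '%s'];    %#ok<AGROW> read this column
    else
        fmt_loop = [fmt_loop '%*s'];   %#ok<AGROW> skip this column
    end
end

% Vectorized version: "skip" everywhere, then "read" at the wanted positions.
fcell = repmat({'%*s'}, 1, n_columns);
fcell(wanted_columns) = {'%s'};
fmt_vec = [fcell{:}];

same = isequal(fmt_loop, fmt_vec)   % true when both strings match
```

The vectorized form avoids growing a string inside a loop and makes the "default skip, then overwrite" intent explicit.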

yuk