I'm writing a MATLAB script which begins by reading a space-delimited .log file into a cell array. The column headers in the file are all strings, but data types throughout the file are mixed, so for simplicity I've been treating every value as a string for now.

This is what I have so far, and it works just fine with small files.

fileID = fopen('file');
ImportData = {}; % empty cell array to append rows to

while ~feof(fileID)
    tLines = fgetl(fileID); % reads line into string
    raw = strsplit(tLines, ' '); %splits line into array for that line
    ImportData = cat(1, ImportData, raw); %adds line to rest of array
end

fclose(fileID);

However, the actual files this script will need to read are very unwieldy (30,000+ rows, 200+ columns), and I'm finding this procedure very slow for them. I've done some research and I'm sure that vectorization is the answer, but I'm very unfamiliar with this area.

What are the ways in which I could alter this procedure to dramatically increase speed?

EDIT: Column types are inconsistent, so the importdata function doesn't work. The file has a .log extension, so the readtable function doesn't work. Ideally, a faster way of using textscan would be perfect.

  • Is the number of columns fixed? – nkjt Jun 29 '15 at 10:26
  • If the number of columns (and, of course, the type of each column) is the same for every row, you could use any of MATLAB's built-in file reader functions, for example [importdata](http://se.mathworks.com/help/matlab/ref/importdata.html), [xlsread](http://se.mathworks.com/help/matlab/ref/xlsread.html) or any other function that works. You could also use the first line as a template and design an `fscanf` call. Further, it is possible to use the [textscan](http://se.mathworks.com/help/matlab/ref/textscan.html) method, where you specify the delimiter. It is hard to give advice without knowing the data format. – patrik Jun 29 '15 at 11:53
  • @CarlWitthoft I think that `ImportData` is a variable. The code provided looks like C-style code. The suggestion is to use MATLAB's extensive libraries instead. – patrik Jun 30 '15 at 05:04
  • @patrik 'ImportData' is a variable here. The data types aren't consistent throughout the file - some columns are strings and some are numeric. That's why to the best of my knowledge the 'importdata' function wouldn't work. 'textscan' is what I've been using, but my problem is its inefficiency for large files. – BayesianRegret Jun 30 '15 at 06:20
  • Possible duplicate of [Fastest Matlab file reading?](http://stackoverflow.com/questions/9440592/fastest-matlab-file-reading) – rst Jun 30 '15 at 06:50

1 Answer

readtable(filename,'FileType','text','Delimiter',' ')

should work fine. The ".log" file extension is irrelevant as long as your file is delimited with ' '. You can also specify a format string if you know the column formats in advance, which can make the operation a lot quicker. If you don't specify a format, each column is returned as numeric if the entire column is numeric, or as a cell array of strings if it is mixed.
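As a rough illustration of the format-string idea, the sketch below reads every column as text, which matches the "treat everything as a string" approach in the question. The sample file, its contents, and the variable names (`filename`, `nCols`, `fmt`, `T`, `C`) are made up for the example, and the column count is assumed to be recoverable from the header line. It is one possible way to do it, not a tuned solution.

```matlab
% Write a tiny space-delimited sample file so the sketch is self-contained.
% (Hypothetical data; in practice filename would point at the real .log file.)
filename = [tempname '.log'];
fid = fopen(filename, 'w');
fprintf(fid, 'id name value\n');
fprintf(fid, '1 alpha 3.5\n');
fprintf(fid, '2 beta n/a\n');
fclose(fid);

% Read only the header line to find out how many columns there are.
fid = fopen(filename, 'r');
headerLine = fgetl(fid);
fclose(fid);
nCols = numel(strsplit(strtrim(headerLine), ' '));

% One %s per column, so every field is read as text.
fmt = repmat('%s', 1, nCols);

% Read the whole file in one call; the header row becomes the variable names.
T = readtable(filename, 'FileType', 'text', 'Delimiter', ' ', 'Format', fmt);

% Convert to a cell array of strings if that is more convenient downstream.
C = table2cell(T);

% Remove the temporary sample file.
delete(filename);
```

Reading everything as `%s` avoids per-column type detection. If some columns are known to be numeric, replacing their `%s` with `%f` in `fmt` should return them as numeric columns instead.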

PHB