I'm writing a MATLAB script that begins by reading a space-delimited .log file into a cell array. The column headers in the file are all strings, but the data types throughout the file are mixed, so for simplicity I've been treating every value as a string for now.
This is what I have so far, and it works just fine with small files.
fileID = fopen('file', 'r');
ImportData = {}; % empty cell array to append each parsed line onto
while ~feof(fileID)
    tLines = fgetl(fileID);                % read one line as a character vector
    raw = strsplit(tLines, ' ');           % split the line into a 1-by-N cell array of fields
    ImportData = cat(1, ImportData, raw);  % append this line's fields as a new row
end
fclose(fileID);
However, the actual files this script will need to read are very unwieldy (30,000+ rows, 200+ columns), and this procedure is far too slow for them. From the research I've done I believe vectorization is the answer, but I'm very unfamiliar with this area.
What are the ways in which I could alter this procedure to dramatically increase speed?
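For reference, this is the kind of "read everything at once" approach I've been imagining, though I don't know whether it's the right direction. It assumes every row has the same number of space-delimited fields ('file' is just a placeholder name):

txt = fileread('file');                        % read the whole file into one char vector
txt = strtrim(txt);                            % drop trailing newline(s)
lines = strsplit(txt, '\n');                   % one cell element per line (might need to handle '\r' for Windows files)
rows = cellfun(@(L) strsplit(L, ' '), lines, 'UniformOutput', false);
ImportData = vertcat(rows{:});                 % rows-by-columns cell array of char vectors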
EDIT: Column types are inconsistent, so the importdata function doesn't work. The file has a .log extension, so the readtable function doesn't work. Ideally a faster method of using textscan would be perfect.
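Something along these lines is what I'm picturing for textscan, assuming the header row tells me the column count and that reading every field with %s is acceptable:

fileID = fopen('file', 'r');
header = strsplit(fgetl(fileID), ' ');                   % column names from the first line
fmt = repmat('%s', 1, numel(header));                    % one string conversion per column
cols = textscan(fileID, fmt, 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
fclose(fileID);
ImportData = [header; horzcat(cols{:})];                 % header row on top, data below

I'm not sure whether this is the fastest way to set up the format string, or whether there are textscan options that would make a bigger difference for files this size.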