I'm trying to import data from a text file using textscan. The data has two delimiters (colon and space). I'd like to import the data into a matrix that will have 137 columns. Below are two lines of the data, showing the format it is in.
2 id:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 18:19.673353...134:1 135:0 136:2
9 id:2 1:4 2:3 3:1 4:5 5:3 6:4 7:2 8:0 9:0 10:1 11:16 12:42 13:0 14:7 15:167 16:5.7 17:1 18:3...134:2 135:6 136:3
There are 50 lines like this, so in the end I would like a 50 x 136 matrix. I'd like to grab the value after the colon and before the space, starting with 1 (1:3 and 1:4) and going to 136 (136:2 and 136:3). Below is the code I'm trying. I've been trying to tweak some code I found while doing some research. I've been reading the documentation on repmat, and it seems like this will only produce a 1 x 136 matrix.
fid = fopen('./train.txt','r');
fmt = ['%f' repmat('%*f:%f', 1, 136)];
c = textscan(fid, fmt, 'CollectOutput', 1)
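To make the problem easier to reproduce without the file, here is a cut-down sketch of the same idea run on a string instead of train.txt. It is only an illustration under a few assumptions: the sample text is shortened to three features per line (nFeat is a made-up name and would be 136 for the real data), and the format string matches the literal id: field explicitly, which my fmt above does not do. As far as I understand it, repmat only builds the format for one line, and textscan then reapplies that format to every line.

```matlab
% Two shortened sample lines (3 features each instead of 136); illustrative only
txt = sprintf('2 id:1 1:3 2:3 3:0\n9 id:2 1:4 2:3 3:1');

nFeat = 3;  % hypothetical name; would be 136 for the real file

% repmat builds the format for ONE line: label, literal "id:" plus its value,
% then nFeat "index:value" pairs where the index is skipped with %*f
fmt = ['%f id:%f' repmat(' %*f:%f', 1, nFeat)];

% textscan reapplies fmt line after line; CollectOutput merges the
% numeric columns into a single matrix inside c{1}
c = textscan(txt, fmt, 'CollectOutput', 1);

M = c{1};                 % one row per line: [label, id, feature values]
features = M(:, 3:end);   % only the values after each colon
```

If this behaves the way I expect, features has one row per input line and nFeat columns, so the real file would give 50 x 136.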
Thanks in advance and any help is greatly appreciated.