
I'm trying to import data from a text file using textscan. The data uses two delimiters (colon and space). I'd like to import the data into a matrix that will have 137 columns. Below are two lines of the data, showing its format.

2 id:1 1:3 2:3 3:0 4:0 5:3 6:1 7:1 8:0 9:0 10:1 11:156 12:4 13:0 14:7 15:167 16:6.931275 17:22.076928 18:19.673353...134:1 135:0 136:2
9 id:2 1:4 2:3 3:1 4:5 5:3 6:4 7:2 8:0 9:0 10:1 11:16 12:42 13:0 14:7 15:167 16:5.7 17:1 18:3...134:2 135:6 136:3

There are 50 lines like this, so in the end I would like a 50 x 136 matrix. I'd like to grab the value after each colon and before the following space, starting with index 1 (1:3 and 1:4) and going to 136 (136:2 and 136:3). Below is the code I'm trying. I've been trying to tweak some code I found while doing some research. I've been reading the documentation for repmat, and it seems like this will only produce a 1 x 136 matrix.

fid = fopen('./train.txt','r');
fmt = ['%f' repmat('%*f:%f', 1, 136)];
c = textscan(fid, fmt, 'CollectOutput', 1)

Thanks in advance and any help is greatly appreciated.

user2743

1 Answer


With a small modification to your fmt I think this works:

fmt = ['%f %s' repmat('%*d:%f', 1, 136)]

I read the id field as a string (%s) and the number before each colon as an integer (%*d), although the latter doesn't seem to be necessary. Because your textscan call sets 'CollectOutput', use c{1} to get the first number in each row and c{3} to access the matrix of the other values; c{2} holds the id strings.
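On the repmat concern: repmat here only builds the format string (a single row of characters). The number of rows in the result comes from textscan applying that format to every line until the end of the file. Here is a minimal sketch to try this out, assuming a small stand-in file made from the first three fields of your two sample lines. The temporary file and the variable names nCols, labels, ids and X are mine, purely for illustration. For the real train.txt, set nCols to 136 and open './train.txt' instead.

```matlab
% Hypothetical demo file: the first three index:value fields of the two
% sample lines from the question, written to a temporary file.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2 id:1 1:3 2:3 3:0\n9 id:2 1:4 2:3 3:1\n');
fclose(fid);

nCols = 3;                                    % assumption: 136 for the real data
fmt = ['%f %s' repmat('%*d:%f', 1, nCols)];   % repmat only repeats the format text

fid = fopen(fname, 'r');
c = textscan(fid, fmt, 'CollectOutput', 1);   % reads every line with the same format
fclose(fid);
delete(fname);                                % clean up the demo file

labels = c{1};   % leading number of each line
ids    = c{2};   % cell array holding the 'id:n' strings
X      = c{3};   % one row per line, one column per index:value pair
disp(size(X))
```

With the full file, X should come out with one row per line, so 50 rows for your data.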

David