
I have a tab-separated file like this:

refseq  gene    symb    locus_id        chr     strand  start   end     cds_start       cds_end status  chrm
ENST00000456328.2       ENST00000456328.2       DDX11L1 00000456328     chr1    1       11868   14409   14409   14409   Reviewed        1
ENST00000515242.2       ENST00000515242.2       DDX11L1 00000515242     chr1    1       11871   14412   14412   14412   Reviewed        1
ENST00000518655.2       ENST00000518655.2       DDX11L1 00000518655     chr1    1       11873   14409   14409   14409   Reviewed        1
ENST00000450305.2       ENST00000450305.2       DDX11L1 00000450305     chr1    1       12009   13670   13670   13670   Reviewed        1
ENST00000438504.2       ENST00000438504.2       WASH7P  00000438504     chr1    0       14362   29370   29370   29370   Reviewed        1

I'd like to read it into MATLAB as a struct array, with one element per row and one field per column.

I tried to do it like this:

fid = fopen('gencode.v19.pseudogene_gistic.txt');
headers = textscan(fid,'%s%s%s%s%s%s%s%s%s%s%s%s',1,'delimiter','\t')
data = textscan(fid,'%s%s%s%d%s%d%d%d%d%d%s%d','delimiter','\t')
fclose(fid);
cdata = struct('refseq',data{1}, 'gene',data{2}, 'symb',data{3}, 'locus_id',data{4}, 'chr',data{5}, 'strand',data{6}, 'start',transpose(data{7}), 'end',data{8}, 'cds_start',data{9}, 'cds_end',data{10}, 'status',data{11}, 'chrn',data{12});

However, this returns a structure that contains odd-looking cells, and the numeric fields behave differently from the text fields. Note: I want a 1x17149 struct rather than a 17149x1 struct.

Could anyone help? Thanks.


Suever
tsznxyz

1 Answer


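The issue is that textscan returns a numeric array for each numeric column (%d) and a cell array of strings for each text column (%s). struct treats a cell array argument as one value per struct element, but copies a numeric array whole into every element, so the two kinds of field behave differently. You need to convert one form to the other so that they are the same.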
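To illustrate the difference, here is a minimal sketch using a hypothetical two-row example (the variable names names, starts, s1 and s2 are made up for this illustration and are not part of the original code). It shows how struct handles a numeric array versus the same values wrapped with num2cell.

```matlab
%// Hypothetical two-row stand-in for two textscan columns
names  = {'DDX11L1'; 'WASH7P'};   %// cell array, as textscan returns for %s
starts = int32([11868; 14362]);   %// numeric array, as textscan returns for %d

%// Numeric array passed directly: every element receives the whole column
s1 = struct('symb', names, 'start', starts);
disp(size(s1));            %// struct array is 2x1
disp(size(s1(1).start));   %// field holds the full 2x1 column

%// Wrapped with num2cell: each element receives its own scalar
s2 = struct('symb', names, 'start', num2cell(starts));
disp(size(s2(1).start));   %// field holds a 1x1 scalar
```

With the num2cell version, s2(2).start is the single value 14362 rather than the whole column.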

Here is some code that works on the data you've shown:

%// Open the file (file name taken from the question)
fid = fopen('gencode.v19.pseudogene_gistic.txt');

%// Load the headers
headers = textscan(fid,'%s%s%s%s%s%s%s%s%s%s%s%s', 1);

%// Load the data
data = textscan(fid,'%s%s%s%d%s%d%d%d%d%d%s%d');
fclose(fid);

%// Find which columns are plain numeric arrays rather than cell arrays
isarray = ~cellfun(@iscell, data);

%// Wrap each numeric value in its own cell so every column is a cell array
data(isarray) = cellfun(@num2cell, data(isarray), 'UniformOutput', false);

%// Create a 1xN structure array using the headers as field names
cdata = cell2struct(cat(2, data{:}).', [headers{:}]).';
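The same conversion can be tried without the file. The sketch below uses a hypothetical in-memory stand-in for the textscan output, with two columns and two rows. Here headers is already a flat cell array of names, so the [headers{:}] step is not needed, and the dimension argument to cell2struct is written out explicitly.

```matlab
%// Hypothetical stand-in for the textscan output (not read from the file)
headers = {'symb', 'start'};
data = {{'DDX11L1'; 'WASH7P'}, int32([11868; 14362])};

%// Wrap numeric columns so every column is a cell array
isarray = ~cellfun(@iscell, data);
data(isarray) = cellfun(@num2cell, data(isarray), 'UniformOutput', false);

%// cat gives a rows-by-fields cell; transpose so fields run along dim 1,
%// then transpose the resulting Nx1 struct array into 1xN
cdata = cell2struct(cat(2, data{:}).', headers, 1).';
disp(size(cdata));
```

Each element then holds one row, for example cdata(2).symb and cdata(2).start.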
Suever