I'm working on an assignment where I have to read a tab-delimited text file and produce a MATLAB structure as output.

The contents of the file look like this (it is a bit messy, but you get the picture). The actual file contains 500 genes (the rows starting at Analyte 1) and 204 samples (the columns starting at A2).

#1.2                                    
500 204                             
Name        Desc        A2  B2  C2  D2  E2  F2  G2  H2
Analyte 1   Analyte 1   978 903 1060    786 736 649 657 733.5
Analyte 2   Analyte 2   995 921 995.5   840 864.5   757 739 852
Analyte 3   Analyte 3   1445.5  1556.5  1579    1147.5  1249    1069.5  1048    1235
Analyte 4   Analyte 4   1550    1371    1449    1127    1196    1337    1167    1359
Analyte 5   Analyte 5   2074    1776    1960    1653    1544    1464    1338    1706
Analyte 6   Analyte 6   2667    2416.5  2601    2257    2258    2144    2173.5  2348
Analyte 7   Analyte 7   3381.5  3013.5  3353    3099.5  2763    2692    2774    2995

My code is as follows:

fid = fopen('gene_expr_500x204.gct', 'r');%Open the given file

% Skip the first line and determine the number of rows and number of samples
dims = textscan(fid, '%d', 2, 'HeaderLines', 1);
ncols = dims{1}(2);

% Now read the variable names
varnames = textscan(fid, '%s', 2 + ncols);
varnames = varnames{1};

% Now create the format spec for your data (2 strings and the rest floats)
spec = ['%s%s', repmat('%f', [1 ncols])];

% Read in all of the data using this custom format specifier. The delimiter will be a tab
data = textscan(fid, spec, 'Delimiter', '\t');

% Place the data into a struct where the variable names are the fieldnames
ge = data{3:ncols+2}
S = struct('gn', data{1}, 'gd', data{2}, 'sid', {varnames});

The part about ge is my current attempt, but it's not really working. Any help would be greatly appreciated, thank you in advance!

Suever
embryo3699

1 Answer

A struct field can hold any datatype including a multi-dimensional array or matrix.

Your issue is that data{3:ncols+2} creates a comma-separated list. Since you only have one output on the left side of the assignment, ge receives only the first element of that list (the first sample column) and the rest are discarded. You need to use cat to concatenate all of the columns into one big matrix.

ge = cat(2, data{3:end});

% Or you can do this implicitly with []
% ge = [data{3:end}];
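
To see the comma-separated list behavior in isolation, here is a minimal sketch using a small made-up cell array. The name c and its values are hypothetical stand-ins for data{3:end}, not taken from the actual file.

```matlab
% Hypothetical stand-in for data(3:end): three numeric columns of equal height
c = {[1; 2], [3; 4], [5; 6]};

% A comma-separated list assigned to a single output keeps only one element
first = c{1:3};          % same as first = c{1}

% Concatenating the list horizontally yields one column per cell
ge = cat(2, c{1:3});     % 2-by-3 numeric matrix
ge_alt = [c{1:3}];       % equivalent shorthand using []

disp(size(ge))
disp(isequal(ge, ge_alt))
```

Both forms only work when every cell holds a column of the same height, which should be the case for textscan output read with a single format spec.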

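Then you can pass this value to the struct constructor. Note the use of data(1) and data(2) rather than data{1} and data{2}: when struct receives a cell array as a field value, it creates a struct array with one element per cell, so the cell arrays of names have to be wrapped in another cell to be stored whole in a single field.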

S = struct('gn', data(1), 'gd', data(2), 'sid', {varnames}, 'ge', ge);
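
As a rough illustration of why the wrapping matters, the sketch below compares the two calling styles on toy data. The variables names and ge here are hypothetical stand-ins for the textscan output, not the real file contents.

```matlab
% Hypothetical stand-ins for the textscan outputs
names = {'Analyte 1'; 'Analyte 2'};   % plays the role of data{1}
ge = [978 903; 995 921];              % plays the role of the expression matrix

% Passing the cell array directly creates one struct element per cell
A = struct('gn', names);

% Wrapping it in another cell stores the whole cell array in a single field
B = struct('gn', {names}, 'ge', ge);

disp(size(A))      % size of the struct array built from the bare cell array
disp(size(B))      % size of the struct built from the wrapped cell array
disp(class(B.gn))
```

Numeric matrices such as ge need no wrapping, since struct only expands cell array inputs.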
Suever