How do I import a file with arbitrary format into MATLAB?

Question

I'm trying to import a large number of datasheets into MATLAB, but the sheets are in an odd format that the common functions (e.g. load, importdata) cannot read. I've attempted to read them in with textscan, but have not been very successful. The data structure I'm trying to import is below:

#DATE Thu Oct 13 19:42:07 EDT 2016
#PATIENT_ID                                                                                                                             
#FILE   REGION  MODEL   vB  COV K1  COV k2  COV k3  COV k4  COV Vs  COV Vt  COV K1/k2   COV k3/k4   COV Flux    COV DOF SumSquared  ChiSquare   AIC SC  MSC R2  Sy.x    Runs test   AUC
#UNITS          1-Jan   %   ml/ccm/min  %   1/min   %   1/min   %   1/min   %   ml/ccm  %   ml/ccm  %   ml/ccm  %   1-Jan   %   ml/ccm/min  %   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan   1-Jan
/autofs/eris/jmhgp/users/DanA/New_CLBP_recons/Dynamic_SUV_TACs/2TC_modeling/2TC_modeling_blood-brain_TAC/AIF_masamune/ALS/ALS-701-001_L3Exp_metcor_SUV.bld  thalamus    2 Tissue Compartments   0.05    --- 0.094663815 7.652798652 0.108092863 34.14303115 0.02227185  124.8691372 0.02043199  149.0508534 0.954624576 65.64535418 1.830388363 40.04703162 0.875763787 28.63844043 1.090048014 64.4379452  0.016172615 83.22681328 24  1.352503833 13.50199461 77.56322881 81.89204686 0.298863801 0.442659038 0.237390662 1   5538.027175

There are four separate blocks of this data stacked on top of each other (I've only shown the first "block" here). Ideally, I'd like to extract each element (separated with a space) of the last three columns to a separate cell. So far, I've tried something like:

fid = fopen('myFile.arb');
tmp = textscan(fid, '%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s', 'delimiter', '\n');
fclose(fid);

This quite clearly does not work. I get all the data, but as long character strings that are difficult to manipulate. I suppose I don't quite understand how textscan extracts data, and I'm probably not thinking about this problem the right way. Any help or pointers would be greatly appreciated. Thanks!

Dan

Why are you specifying `'\n'` as the delimiter when your delimiter is a space? `'\n'` is a newline character. You're also missing two `'%s'` characters. I'd recommend setting `'MultipleDelimsAsOne'` to `true` in the `textscan` call since you have data separated by more than one space. And using properly delimited data... — sco1, Oct 14 '16 at 18:54
Is it possible to re-output the data in a properly-delimited format? — Ian Riley, Oct 14 '16 at 20:36
Thanks for the information, @excaza. I followed your suggestion, adding the missing `'%s'` character and using `'MultipleDelimsAsOne'` instead of `'\n'`. This output the data into cells in a space-delimited fashion, which wasn't exactly the proper format (some single elements in the original sheet contain multiple space-delimited words), but it was consistent, and it was certainly easy enough to grab information from this output and import it into a new array. — Daniel Albrecht, Oct 19 '16 at 12:52
There really isn't a way to get around the issue with words without changing the delimiter. Having space separated words as a single field in a space delimited document is just inherently flawed. — sco1, Oct 19 '16 at 14:29
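
For reference, the approach discussed in the comments above could look roughly like the sketch below. It is only an illustration under some assumptions: the file is taken to be space-delimited with runs of spaces between fields, header lines are taken to start with `#`, and the sample file, its path, and the variable names (`fname`, `txt`, `tok`, `lastThree`) are hypothetical. Because the last three columns are plain numbers, counting tokens from the end of each line sidesteps the multi-word fields (such as the model name) near the start of the line.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained.
% Replace fname with the path to a real file.
fname = [tempname '.arb'];
fid = fopen(fname, 'w');
fprintf(fid, '#DATE Thu Oct 13 19:42:07 EDT 2016\n');
fprintf(fid, '#FILE   REGION  MODEL   vB  MSC R2  AUC\n');
fprintf(fid, '/hypothetical/a.bld  thalamus    2 Tissue Compartments   0.05    0.237390662 1   5538.027175\n');
fprintf(fid, '\n');
fprintf(fid, '#DATE Thu Oct 13 19:42:07 EDT 2016\n');
fprintf(fid, '/hypothetical/b.bld  thalamus    2 Tissue Compartments   0.05    0.5 0   100.25\n');
fclose(fid);

% Read the file line by line, skipping blank lines and '#' header lines.
lastThree = zeros(0, 3);
fid = fopen(fname, 'r');
txt = fgetl(fid);
while ischar(txt)
    txt = strtrim(txt);
    if ~isempty(txt) && txt(1) ~= '#'
        % Split on spaces, treating consecutive spaces as one delimiter.
        tok = textscan(txt, '%s', 'Delimiter', ' ', 'MultipleDelimsAsOne', true);
        tok = tok{1};
        % Keep the last three tokens of the row as numbers.
        lastThree(end+1, :) = str2double(tok(end-2:end)).';
    end
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Each data row ends up as one row of `lastThree`, regardless of which block it came from. If the original files turn out to be tab-delimited (tabs are often rendered as spaces when pasted), splitting each line with `strsplit(txt, '\t')` would keep multi-word fields intact instead.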
