I'm reading fixed-width data (9 characters per field) from a text file using textscan. textscan fails at a line containing the string:

'   9574865.0E+10  '

I would like to read two numbers from this:

957486 5.0E+10

The problem can be replicated like this:

dat = textscan('   9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

The following error is returned:

Error using textscan
Mismatch between file and format string.
Trouble reading floating point number from file (row 1u, field 2u) ==> E+10

Surprisingly, if we add a minus sign, we don't get an error, but we do get a wrong result:

dat = textscan('  -9574865.0E+10  ','%9f %9f','Delimiter','','CollectOutput',true,'ReturnOnError',false);

Now dat{1} is:

    -9574865           0

Obviously, I need both cases to work. My current workaround is to add commas between the fields and use commas as a delimiter in textscan, but that's slow and not a nice solution. Is there any way I can read this string correctly using textscan or another built-in (for performance reasons) MATLAB function?

user1719360

2 Answers

0

I suspect textscan first trims leading whitespace and only then applies the format string. I think so because if you change your format string from

'%9f%9f'

to

'%6f%9f'

your one-liner suddenly works. Also, if you try

'%9s%9s'

you'll see that the first string has its leading whitespace removed (and therefore takes in 3 characters too many), while for some reason the last string keeps its trailing whitespace.

Obviously, this means you'd have to know exactly how many digits there are in both numbers. I'm guessing this is not desirable.
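
To make the intended split concrete, here is what the line looks like when it is sliced purely by position, without textscan. This is only a minimal sketch for a single line whose length is a multiple of the field width; `str`, `width`, `fields` and `vals` are illustrative names, and `str2double` is used for clarity, not for speed.

```matlab
% Sample line from the question: two 9-character fields.
str = '   9574865.0E+10  ';
width = 9;

% reshape fills column by column, so each column holds one field;
% transpose so that each row is one field.
fields = reshape(str, width, []).';

% Convert each field to a number; strtrim removes the padding blanks.
vals = str2double(strtrim(cellstr(fields)));
```

Here `fields` has the rows `'   957486'` and `'5.0E+10  '`, which are the two numbers the question wants.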

A workaround could be something like the following:

% Split string on the "dot"
dat = textscan(<your data>,'%9s%9s',...
    'Delimiter'     , '.',...
    'CollectOutput' , true,...
    'ReturnOnError' , false);

% Correct the strings; move the last digit of the first string to the 
% front of the second string, and put the dot back
dat = cellfun(@(x,y) str2double({y(1:end-1),  [y(end) '.' x]}),  dat{1}(:,2), dat{1}(:,1), 'UniformOutput', false);

% Cast to regular array
dat  = cat(1, dat{:})
Rody Oldenhuis
  • Yes, it first trims, that exactly is my problem. %6f is not a solution, I need to convert all first 9 characters to a number. There are other lines where all 9 characters are used. – user1719360 Jun 19 '13 at 09:48
  • @user1719360: See my latest edit. Can you give that a try on your data? – Rody Oldenhuis Jun 19 '13 at 09:49
  • This works on my string, but each 9 character field can contain any string that will evaluate to a valid number. Will this work in all possible cases? – user1719360 Jun 19 '13 at 09:58
  • @user1719360: The split is done with the "dot" as the delimiter, and only one character is moved from array to array in the correction. So it is pretty specific; it only works on cases where the second number has format `[0-9].[0-9]E[+-][0-9]*` – Rody Oldenhuis Jun 19 '13 at 10:25
0

I had a similar problem and solved it by calling textscan twice, which proved to be much faster than cellfun or str2double and works with any input that MATLAB's '%f' can interpret.

In your case I would first call textscan with only string specifiers (%s) and 'Whitespace' set to '' so that the field widths are applied to the untrimmed line.

data = '   9574865.0E+10  ';
tmp = textscan(data, '%9s %9s', 'Whitespace', '');

Now you need to join the fields with a delimiter that can't occur in your data, for example ;, and append one more at the end:

tmp = [char(join([tmp{:}],';',2)) ';'];

Now you can apply the numeric format to your data by calling textscan again with that delimiter:

result = textscan(tmp, '%f %f', 'Delimiter', ';', 'CollectOutput', true);
format shortE
result{:}

ans =

9.5749e+05   5.0000e+10
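
If all lines are already held in a char matrix with equal-width rows, the separator can also be inserted with plain array operations before a single numeric parse. The following is a sketch under those assumptions, not a drop-in replacement for the two textscan calls: it swaps the ; delimiter for a space and uses sscanf for the parse, so it assumes that no field is entirely blank. `txt`, `width`, `fields`, `padded` and `vals` are illustrative names.

```matlab
% Two sample lines, each holding two 9-character fields (18 characters per row).
txt = ['   9574865.0E+10  ';
       '  -9574865.0E+10  '];
width = 9;
[nLines, nChars] = size(txt);
nFields = nChars / width;

% One column per field, in reading order (line 1 field 1, line 1 field 2, ...).
fields = reshape(txt.', width, []);

% Append a space below every column so adjacent fields cannot run together.
padded = [fields; repmat(' ', 1, nLines * nFields)];

% Parse all whitespace-separated numbers at once, then restore the line layout.
vals = sscanf(padded(:).', '%f');
vals = reshape(vals, nFields, nLines).';
```

Each row of `vals` then corresponds to one input line and each column to one field. I have not timed this variant against the two-pass textscan approach.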

Comparing the speed of this approach with str2double:

n = 50000;
data = repmat('   9574865.0E+10  ', n, 1);
% Approach 1 with str2double
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
result1 = str2double([tmp{:}]);
toc

Elapsed time is 2.435376 seconds.

% Approach 2 with double textscan
tic
tmp = textscan(data', '%9s %9s', 'Whitespace', '');
tmp = [char(join([tmp{:}],';',2)) char(59)*ones(n,1)]; % char(59) is just ';'
result2 = cell2mat(textscan(tmp', '%f %f', 'Delimiter', ';', 'CollectOutput', true));
toc

Elapsed time is 0.098833 seconds.
JMC