I am trying to import data from a text file into MATLAB. The file has the following structure:

** Porosity
**
*POR *ALL
0.1500 0.0900 2*0.1300 0.1400 4*0.1500 0.2200 2*0.1500 0.0500
0.0900 0.1400 5*0.1500 0.2300 0.2600 0.0800 0.1500 0.1500 0.2400 0.1700
[...]

The header obviously has to be ignored. A space is the delimiter, while * indicates that a value is repeated: the integer before the * gives the number of repetitions, so 2*0.1300 stands for 0.1300 0.1300.

Unfortunately, the number of entries per line varies. Ideally, I want to store all values in one array like this:

por = [0.1500 0.0900 0.1300 0.1300 0.1400 0.1500 0.1500 0.1500 0.1500 0.1500 0.2200 0.1500 0.1500 ...]

Can this be solved with the textscan command somehow? The file is rather large with some hundred thousand values, so I need a quick solution ;) Help is greatly appreciated!

Hoki
  • Do you have any code you've found or tried so far that we may be able to help with? This will help get a speedy, specific solution. – Jimmy Smith Oct 01 '14 at 18:08
  • Yes, you can use textscan. I would read it in using space as a delimiter, and then use regexp to find all the repeated numbers. I could put up some pseudo code, but it would be better if you posted some code from your own attempts for us to help with. – Trogdor Oct 01 '14 at 19:18
  • I had a slow solution for another file, but in that case only the multipliers '2*' and '4*' occurred and the values made up a nicely defined matrix with a consistent number of entries per row. I loaded the file with Excel first, saved it as an .xlsx and loaded it with xlsread. My code would then run for- and if-loops to do the job. This was a rather laborious and inefficient solution to the problem. Also, now I cannot predict the multipliers, so I cannot adapt my code to the new file. I am not familiar with the textscan command, but browsing through other posts I figured it might be a solution. – Daniel Düsentrieb Oct 01 '14 at 19:18
  • start with `fid = fopen('file'); data = textscan(fid,'%s','delimiter',' '); fclose(fid)`. then you will have to examine each cell in `data` for `*` and build your numerical array – Trogdor Oct 01 '14 at 20:31

1 Answer

A straightforward way (I have not used MATLAB for a long time, so it might not be the best solution):

fid = fopen('temp.txt');
data = textscan(fid, '%s', 'delimiter', ' ');
fclose(fid);

out = convert_cells(data);

And the function:

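function out = convert_cells(cells)
  % Expand tokens of the form 'N*value' into N copies of value.
  % Tokens that do not parse as numbers (e.g. the header) are skipped.
  out = [];
  for i = 1 : numel(cells{1})
     tmp = strsplit(cells{1}{i}, '*');
     num1 = str2double(tmp{1});
     if numel(tmp) == 2 && ~isnan(num1)
         % 'N*value': num1 is the repeat count, num2 the value
         num2 = str2double(tmp{2});
         if ~isnan(num2)
             out = [out repmat(num2, 1, num1)];
         end
     elseif numel(tmp) == 1 && ~isnan(num1)
         % plain value without a multiplier
         out(end + 1) = num1;
     end
  end
end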
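
Since the loop above grows `out` one piece at a time, it may get slow for several hundred thousand values. A possible loop-free variant is sketched below. It is only a sketch under assumptions: the function name `read_repeated_values` is made up for illustration, the header is assumed to be exactly the three lines shown in the question, every remaining token is assumed to be either `value` or `N*value`, and `repelem` is assumed to be available (it is not in older MATLAB releases).

```matlab
function por = read_repeated_values(filename)
  % Hypothetical helper: read whitespace-separated tokens and expand 'N*value'.
  fid = fopen(filename);
  closer = onCleanup(@() fclose(fid));   % close the file even if an error occurs

  % Assumption: the header is exactly 3 lines, as in the sample file.
  data = textscan(fid, '%s', 'HeaderLines', 3);
  tokens = data{1};                      % column cell array of strings

  % Default repeat count is 1; tokens containing '*' override it.
  counts = ones(numel(tokens), 1);
  values = tokens;
  hasStar = ~cellfun(@isempty, strfind(tokens, '*'));
  if any(hasStar)
    parts = regexp(tokens(hasStar), '\*', 'split');  % each entry: {'N', 'value'}
    parts = vertcat(parts{:});                       % n-by-2 cell array
    counts(hasStar) = str2double(parts(:, 1));
    values(hasStar) = parts(:, 2);
  end

  % Convert all value strings at once and repeat each by its count.
  por = repelem(str2double(values), counts).';       % row vector
end
```

Calling `por = read_repeated_values('temp.txt');` should then give the single row vector asked for in the question. If the header length differs or header-like lines appear further down the file, the `HeaderLines` value and the token assumptions would need adjusting.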
Cheery
  • This works like a charm! Thank you. Two questions though: Why do you check whether num1 is a number or not? To eliminate the header? Secondly, to ignore the header, I assume I can just start the for-loop at a higher i (in my case 6), right? – Daniel Düsentrieb Oct 02 '14 at 11:26
  • @DanielDüsentrieb Yes, it checks that we have a number there. You can skip, if you want. – Cheery Oct 02 '14 at 15:56