
I'm trying to read a ';'-delimited .txt file that has a date in the 'header' and a different number of columns after the 'header'. I put 'header' in quotes because it's really more of a parameter line.

The .txt looks like this (all lines after the first have the same number of columns):

15/07/2013;66;157 
DDD;3;1;0;1;1;1;-0.565
DDD;8;2;0;2;1;1;-0.345 
DDD;9;3;2;3;1;2;-0.643 
DDD;8;1;3;5;1;3;-0.025 
DDD;8;1;0;9;1;4;-0.411 
DDD;15;1;5;4;1;5;-0.09 
DDD;12;1;0;5;1;6;-0.445 
DDD;13;1;0;7;1;7;-0.064

I want to read it into a matrix that holds each value in its own cell, like:

matrix = 
[15/07/2013 66 157
 DDD 3 1 0 1 1 1 -0.565
 DDD 8 2 0 2 1 1 -0.345
 DDD 9 3 2 3 1 2 -0.643
...]

I've tried textscan, csvread and textread, and nothing works!

Thanks in advance!

Edit: Actually, I found a MUCH FASTER way to do this!

Luiz

1 Answer


In my experience, MATLAB does not let you mix strings and numbers in the same matrix, so you will need a cell array instead.
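To illustrate (together with the braces-versus-parentheses tip from the comments), here is a small sketch using one data line from the question, typed in by hand rather than read from the file:

```matlab
%# A numeric matrix cannot hold text, but a cell array can hold both.
row = {'DDD', 3, 1, 0, 1, 1, 1, -0.565};      %# one body line from the question
cls = cellfun(@class, row, 'UniformOutput', false);
%# cls is {'char', 'double', 'double', ...}

%# {} returns the contents of a cell; () returns a smaller cell array.
name = row{1};            %# 'DDD' as a char array
vals = [row{2:end}];      %# the seven numeric fields collected into a 1x7 double
sub  = row(2:3);          %# still a 1x2 cell, not numbers
```

Once the text column is set aside, the numeric part can be handled as an ordinary matrix.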

You can do this relatively easily with some simple parsing.

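fid = fopen('temp.txt','r');             %# open file for reading
if fid < 0, error('Could not open temp.txt'); end
count = 1;
content = {};
while true
    tline = fgetl(fid);                  %# read one line (newline removed)
    if ~ischar(tline), break; end        %# fgetl returns -1 at end of file
    tline = strtrim(tline);
    if isempty(tline), continue; end     %# skip blank lines, e.g. at the end
    parts = regexp(tline,';','split');   %# split the line on ';'
    for k = 1:numel(parts)
        %# convert a field only if the WHOLE field matches the number pattern
        temp = regexp(parts{k},'-?[0-9]*\.?[0-9]*(i|j)?','match');
        if numel(temp) >= 1 && strcmpi(temp{1},parts{k})
            parts{k} = str2double(parts{k});
        end
    end
    content{count} = parts;              %# one cell of fields per line
    count = count + 1;
end
fclose(fid);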

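numRows = size(content,2) - 1;      %# data lines, excluding the header
whole = cell(numRows,8);            %# preallocate the 8-column result
for r = 1:numRows
    for c = 1:8
        whole{r,c} = content{r+1}{c};
    end
end
content = {content{1},whole};       %# {header fields, N x 8 data block}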
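Because every data line becomes a 1x8 cell, the double loop can also be written with `vertcat`. A sketch, with a small `content` built by hand in the shape the reading loop produces (header cell first, then one cell per data line); `nums` is an extra, optional step for when the numeric columns are wanted as a plain double matrix:

```matlab
%# Hand-built stand-in for the reading loop's output: header plus two data lines.
content = {{'15/07/2013', 66, 157}, ...
           {'DDD', 3, 1, 0, 1, 1, 1, -0.565}, ...
           {'DDD', 8, 2, 0, 2, 1, 1, -0.345}};

header = content{1};
%# Stacking the 1x8 row cells vertically gives an N x 8 cell array.
whole = vertcat(content{2:end});
%# Columns 2..8 hold only scalars, so they convert to a double matrix.
nums = cell2mat(whole(:, 2:end));
```

This only works when every data line has the same number of fields; otherwise `vertcat` raises a dimension error.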

UPDATE

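I added some code to put all of the data below the header into a single cell array (`content` ends up as {header, data block}). I do not know whether you want the header in that 8-column array as well; if you do, run the following instead of the block above, directly after the reading loop (it needs the original one-cell-per-line `content`):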

numRows = size(content,2);          %# all lines, including the header
whole = cell(numRows,8);            %# the 3-field header row is padded with []
for r = 1:numRows
    for c = 1:min(numel(content{r}),8)
        whole{r,c} = content{r}{c};
    end
end
whole
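If the layout is fixed (three header fields, then one text field and seven numbers on every other line), `textscan` can also read the body directly once the header line has been consumed separately with `fgetl`. This is a sketch of a different approach from the loop above, assuming that fixed layout; it writes the first few sample lines to a temporary file so it runs on its own:

```matlab
%# Write part of the sample data to a temporary file (demo only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '15/07/2013;66;157\n');
fprintf(fid, 'DDD;3;1;0;1;1;1;-0.565\n');
fprintf(fid, 'DDD;8;2;0;2;1;1;-0.345\n');
fprintf(fid, 'DDD;9;3;2;3;1;2;-0.643\n');
fclose(fid);

fid = fopen(fname, 'r');
%# Read the parameter ("header") line on its own and split it on ';'.
hdr = regexp(strtrim(fgetl(fid)), ';', 'split');
hdr(2:3) = num2cell(str2double(hdr(2:3)));   %# keep the date as text
%# Read the remaining lines: one string field followed by seven numbers.
data = textscan(fid, '%s %f %f %f %f %f %f %f', 'Delimiter', ';');
fclose(fid);
delete(fname);

%# Text column plus numeric columns combined into one N x 8 cell array.
body = [data{1}, num2cell([data{2:end}])];
```

The format string is what makes `textscan` work here: it has to match the body lines exactly, which is why the header is read separately.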
MZimmerman6
  • Wow, I would never have thought of this! Testing it right now. Thanks for your reply! – Luiz Jul 15 '13 at 16:37
  • It should work; I tested it on my end by copying and pasting the text you posted above. Just remember, when accessing cells you need to use braces, `{}`, not parentheses, `()`. – MZimmerman6 Jul 15 '13 at 16:43
  • Thanks a lot, it worked. The problem is that the result is a CELL OF CELLS, and I need a cell with 8 columns and all the rows. Is there any way to transform it like that? – Luiz Jul 15 '13 at 16:43
  • Yes, like I said, after the 'header' I will always have the same number of columns (8, in this case) – Luiz Jul 15 '13 at 16:44
  • There is, however, an unknown number of rows, right? – MZimmerman6 Jul 15 '13 at 16:45
  • I could find the exact number, but I'd rather not have to; if it's helpful, though, I can check! – Luiz Jul 15 '13 at 16:47
  • One second, I am working on a solution. – MZimmerman6 Jul 15 '13 at 16:51
  • Take your time, and thanks a lot in advance, you are being very helpful! To find the number of rows you can look at the "count" variable: total_rows = count - 1. Is this helpful? – Luiz Jul 15 '13 at 16:53
  • @MZimmerman6 Your solution has a major performance issue since you haven't preallocated memory for the resulting cell array `content`. – Eitan T Jul 15 '13 at 16:56
  • Eitan T, I understand, but unfortunately we do not know the length of the file beforehand, so I cannot preallocate; hence the while loop, reading the content as a stream. – MZimmerman6 Jul 15 '13 at 16:57
  • One question: why does the code read the 8th column (number -0.545) as a 'char'? – Luiz Jul 15 '13 at 17:12
  • Just about to fix that now, actually. Please consider upvoting my answer too :) – MZimmerman6 Jul 15 '13 at 17:13
  • I would if I could :/ I can't because I don't have at least 15 reputation! That's a shame, because this was EXTREMELY helpful. – Luiz Jul 15 '13 at 17:15
  • The fix for the char problem was replacing the line `if isstrprop(parts{i},'digit')` with `if all(ismember(parts{i},'0123456789ij.-'))`, which checks that every character in the string is one that can appear in a number. – MZimmerman6 Jul 15 '13 at 17:17
  • Wow, you rock! As soon as I get my 15 reputation, I will come back here to upvote your answer, that's for sure! – Luiz Jul 15 '13 at 17:18
  • @MZimmerman6 I'd recommend reading the lines as strings using `textscan`, and then parsing them (e.g. with `regexp`). But whatever works, right? – Eitan T Jul 15 '13 at 17:22
  • Yeah, I just thought of that as well, because ismember will cause problems with poorly formatted numbers; I've updated the answer to use regular expressions instead. – MZimmerman6 Jul 15 '13 at 17:29
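Eitan T's suggestion (read whole lines with `textscan`, then parse them with `regexp`) might look roughly like the sketch below. It swaps the answer's numeric regular expression for a simpler test, `str2double`, which returns `NaN` for anything it cannot parse; that substitution is an assumption of this sketch, and it means a field containing the literal text `NaN` would stay a string. A small sample file is written first so the snippet runs standalone:

```matlab
%# Write a small sample file (demo only).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '15/07/2013;66;157\nDDD;3;1;0;1;1;1;-0.565\nDDD;8;2;0;2;1;1;-0.345\n');
fclose(fid);

%# Read each line as one string: newline is the only delimiter, no whitespace skipping.
fid = fopen(fname, 'r');
C = textscan(fid, '%s', 'Delimiter', '\n', 'Whitespace', '');
fclose(fid);
delete(fname);
allLines = strtrim(C{1});
allLines = allLines(~cellfun(@isempty, allLines));   %# drop blank lines

%# Split every line on ';' (one 1 x k cell of fields per line).
fields = cellfun(@(s) regexp(s, ';', 'split'), allLines, 'UniformOutput', false);

%# Replace each field that parses as a number with its numeric value.
for r = 1:numel(fields)
    vals = str2double(fields{r});
    isNum = ~isnan(vals);
    fields{r}(isNum) = num2cell(vals(isNum));
end
```

Since `fields` has the same cell-of-cells shape as `content` in the answer, the same header/data split applies afterwards.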