Since you said "Numerical matrix padded with zeros would be good", there is a solution using textscan that can give you that. The catch, however, is that you have to know the maximum number of elements a line can have (i.e. the longest line in your file).
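If you do not know the longest line in advance, one preliminary pass over the file can estimate it. This is only a minimal sketch: it assumes comma-delimited fields, and the temporary file `sampleFile` and its three lines are hypothetical stand-ins for your own file.

```matlab
% Hypothetical sample file (stand-in for your real data file)
sampleFile = [tempname '.txt'];
fid = fopen(sampleFile,'w');
fprintf(fid,'1,0,1,0,1\n1,0,1,0,1,0,1,0,1\n1,0,1\n');
fclose(fid);

% Single pass: count the comma-separated fields on each line, keep the maximum
fid = fopen(sampleFile,'r');
maxNumberOfElementPerLine = 0;
tline = fgetl(fid);
while ischar(tline)                      % fgetl returns -1 at end of file
    nFields = sum(tline == ',') + 1;     % number of fields = delimiters + 1
    maxNumberOfElementPerLine = max(maxNumberOfElementPerLine, nFields);
    tline = fgetl(fid);
end
fclose(fid);
delete(sampleFile);                      % remove the temporary sample file
```

For a very large file this extra pass costs one more read, so a safe overestimate may still be the cheaper choice.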
Provided you know that, a combination of the additional parameters for textscan allows you to read incomplete lines:
If you set the parameter 'EndOfLine','\r\n', the documentation explains:
If there are missing values and an end-of-line sequence at the end of
the last line in a file, then textscan returns empty values for those
fields. This ensures that individual cells in output cell array, C,
are the same size.
So with the example data in your question saved as differentRows.txt, the following code:
% be sure about this, better to overestimate than underestimate
maxNumberOfElementPerLine = 10 ;
% build a reading format which can accommodate the longest line
readFormat = repmat('%f',1,maxNumberOfElementPerLine) ;
fidcsv = fopen('differentRows.txt','r') ;
M = textscan( fidcsv , readFormat , Inf ,...
'delimiter',',',...
'EndOfLine','\r\n',...
'CollectOutput',true) ;
fclose(fidcsv) ;
M = cell2mat(M) ; % convert to numerical matrix
will return:
>> M
M =
1 0 1 0 1 NaN NaN NaN NaN NaN
1 0 1 0 1 0 1 0 1 NaN
1 0 1 NaN NaN NaN NaN NaN NaN NaN
1 0 1 NaN NaN NaN NaN NaN NaN NaN
1 0 1 0 1 NaN NaN NaN NaN NaN
0 1 0 1 0 1 0 1 0 NaN
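Since you asked for zero padding rather than NaN, the missing entries can be replaced after the import. A minimal sketch, using the first three rows of the output above as a hard-coded sample; `nValid` is a name introduced here for illustration only:

```matlab
% Sample rows copied from the output above (NaN marks the missing fields)
M = [1 0 1 0   1   NaN NaN NaN NaN NaN ;
     1 0 1 0   1   0   1   0   1   NaN ;
     1 0 1 NaN NaN NaN NaN NaN NaN NaN];

nValid = sum(~isnan(M), 2);   % number of real values on each row (before padding)
M(isnan(M)) = 0;              % pad the missing fields with zeros
```

Counting the valid entries before overwriting the NaN keeps the original line lengths available, because a padded 0 is otherwise indistinguishable from a real 0 in the data. Passing 'EmptyValue',0 to textscan should produce the zero-padded matrix directly, at the cost of losing that information.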
As an alternative, if it makes a significant speed difference, you could import your data as integers instead of doubles. The trouble with that is that NaN is not defined for integers, so you have 2 options:
- 1) Leave the empty entries at the default 0
Just replace the line which defines the format specifier with:
% build a reading format which can accommodate the longest line
readFormat = repmat('%d',1,maxNumberOfElementPerLine) ;
This will return:
>> M
M =
1 0 1 0 1 0 0 0 0 0
1 0 1 0 1 0 1 0 1 0
1 0 1 0 0 0 0 0 0 0
1 0 1 0 0 0 0 0 0 0
1 0 1 0 1 0 0 0 0 0
0 1 0 1 0 1 0 1 0 0
- 2) Replace the empty entries with a placeholder (e.g. 99)
Define a value which you are sure you'll never have in your original data (for quick identification of empty cells), then use the EmptyValue parameter of the textscan function:
readFormat = repmat('%d',1,maxNumberOfElementPerLine) ;
DefaultEmptyValue = 99 ; % placeholder for "empty values"
fidcsv = fopen('differentRows.txt','r') ;
M = textscan( fidcsv , readFormat , Inf ,...
'delimiter',',',...
'EndOfLine','\r\n',...
'CollectOutput',true,...
'EmptyValue',DefaultEmptyValue) ;
fclose(fidcsv) ;
M = cell2mat(M) ; % convert to numerical matrix
will yield:
>> M
M =
1 0 1 0 1 99 99 99 99 99
1 0 1 0 1 0 1 0 1 99
1 0 1 99 99 99 99 99 99 99
1 0 1 99 99 99 99 99 99 99
1 0 1 0 1 99 99 99 99 99
0 1 0 1 0 1 0 1 0 99
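Once the placeholder is in place, it can be used to recover the real length of each line or to drop columns that contain nothing but padding. A minimal sketch on a hard-coded sample (the first three rows above); `isEmptyField`, `rowLengths` and `firstRow` are names introduced here for illustration:

```matlab
DefaultEmptyValue = 99 ;  % same placeholder as in the import
M = int32([1 0 1 0  1  99 99 99 99 99 ;
           1 0 1 0  1  0  1  0  1  99 ;
           1 0 1 99 99 99 99 99 99 99]);

isEmptyField = (M == DefaultEmptyValue);   % logical mask of the padded cells
rowLengths   = sum(~isEmptyField, 2);      % real number of values on each line

% drop the columns which contain only padding (result of an overestimated maximum)
M(:, all(isEmptyField, 1)) = [];

firstRow = M(1, 1:rowLengths(1));          % first line without its padding
```

This only works if the placeholder really never occurs in the data, which is why the value has to be chosen carefully.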