10

I have a text file which contains binary data in the following manner:

00000000000000000000000000000000001011111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000011111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111000111110000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111111111110000000000000000000000000000000
00000000000000000000000000000000000000000000111111111111111111111111111111111111110000000011100000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111111100111110000000000000000000000000000000
00000000000000000000000000000000000111111111111111111111111111111111111111111111111111110111110000000000000000000000000000000
00000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000000001111111111111111111111111111111111111111111111000011100000000000000000000000000000000
00000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111000000000000000000000000000000000
00000000000000000000000000000000000000011111111111111111111111111111111111111111111110000011100000000000000000000000000000000
00000000000000000000000000000000000000000000011111111111111111111111111111111111100000000011100000000000000000000000000000000
00000000000000000000000000000000000000111111111111111111111111111111111111111111111111110111100000000000000000000000000000000

Please note that each 1 or 0 is an independent value, i.e. a line is a sequence of separate binary digits, not one decimal number. I need to find the column-wise sum of the file. There are 125 columns and 840946 rows in all.

I have tried textread, fscanf and a few other MATLAB commands, but they all read each row as a single decimal number and create an 840946x1 array. I want to create an 840946x125 matrix so that I can compute a column-wise sum.

Gunther Struyf
user1716595

2 Answers

6

You can use textread to do it. Read each line as a string, then process the strings with sscanf, one digit at a time:

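A = textread('data.txt', '%s');                    % one string per line of the file
ncols = size(A, 1);                                % number of lines in the file
nrows = size(A{1}, 2);                             % number of digits per line
A = reshape(sscanf([A{:}], '%1d'), nrows, ncols);  % each file line becomes one column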

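Note that A is now transposed relative to the file: it has 125 rows (one per file column) and 840946 columns (one per file line).

The column-wise sum of the file is therefore the sum along the second dimension of A:

colsum = sum(A, 2);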
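
To make the orientation concrete, here is a minimal sketch of the same steps on a tiny in-memory sample. The three 4-digit strings are hypothetical stand-ins for the cell array that textread returns, and the names M, nlines and ndigits are illustrative only. The sketch transposes the reshaped matrix back to the file's layout before summing, which is equivalent to summing along the second dimension of the untransposed matrix.

```matlab
% Hypothetical sample: stands in for A = textread('data.txt', '%s')
A = {'0110'; '0011'; '1110'};

nlines  = size(A, 1);     % number of lines (3 here)
ndigits = size(A{1}, 2);  % digits per line (4 here)

% [A{:}] joins all lines into one long string; '%1d' reads one digit at a time.
% reshape fills column by column, so each file line ends up as a COLUMN of M.
M = reshape(sscanf([A{:}], '%1d'), ndigits, nlines);

% Transpose to restore the file layout: one row per line, one column per digit.
M = M.';

% Sum down each column of the file.
colsum = sum(M, 1);       % 1-by-4 row vector: [1 2 3 1]
```

For the full file, the transpose can be skipped by calling sum with 2 as the dimension argument on the untransposed matrix, which avoids creating a second large array.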
Rody Oldenhuis
angainor
5

Here's a slightly hack-ish approach:

A = textread('data.txt', '%s');  

colsum = sum(cat(1,A{:})-'0')

Breakdown:

  1. textread will read each line of 0's and 1's as a single string. A will therefore be a cell-string, with each element equal to a string of length 125.
  2. cat(1,A{:}) will concatenate the cell string into a "normal" MATLAB character array of size 840946-by-125.
  3. Subtracting the character '0' (ASCII value 48) from a character array consisting of 0's and 1's returns their numeric values, because MATLAB converts characters to their ASCII codes in arithmetic. For example, '1'-'0' = 49-48 = 1, in the same way that 'a'-0 = 97 gives the ASCII value of lower-case 'a'.
  4. sum will finally sum over the columns of this array.
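
The breakdown above can be traced step by step on a small sample. This is only a sketch: the three 4-digit strings are hypothetical stand-ins for what textread returns, and the intermediate names C and D are introduced purely for illustration.

```matlab
% Hypothetical sample: stands in for A = textread('data.txt', '%s')
A = {'0110'; '0011'; '1110'};

% Step 2: stack the strings vertically into a 3-by-4 char array.
% This requires every line to have the same length.
C = cat(1, A{:});

% Step 3: char minus char is done on ASCII codes and yields doubles,
% so '0' (48) maps to 0 and '1' (49) maps to 1.
D = C - '0';

% Step 4: sum down each column.
colsum = sum(D, 1);   % [1 2 3 1]
```

Because the character array already has one row per file line, no transpose is needed here, unlike the reshape-based approach.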
Rody Oldenhuis