I have this structure in a text file named my_file.txt.

# Codelength = 3.74556 bits.
1:1:1:1 0.000218593 "v12978"
1:1:1:2 0.000153576 "v1666"
1:1:1:3 0.000149092 "v45"
1:1:1:4 0.000100329 "v4618"
1:1:1:5 5.1005e-005 "v5593"
1:1:1:6 3.53112e-005 "v10214"
1:1:1:7 3.36297e-005 "v10389"
1:1:1:8 2.85852e-005 "v2273"
1:1:1:9 2.63433e-005 "v13253"
1:1:1:10 2.41013e-005 "v10109"
1:1:1:11 2.01778e-005 "v9204"
1:1:1:12 1.73753e-005 "v16508"
1:1:1:13 1.34519e-005 "v335"

This is a small part of the text file; the full file has more than 600,000 lines. I want an array with these properties:

First column : 1 1 1 1 1 1 1 ... (left values in txt file)
Second column : 1 1 1 1 1 1 1 ...
Third column : 1 1 1 1 1 1 1 ...
Fourth column : 1 2 3 4 5 6 ...
Fifth column : 0.000218593 0.000153576 0.000149092 0.000100329 ...

and a string array containing the rightmost items of each line ("v12978", "v1666" ...). How can I do this in MATLAB?

Eghbal
  • Preallocate a matrix of 600,000x5 elements. Read the file and use colon `:` and space as delimiters to split the lines. Convert each element to numeric and store it at the right index in the matrix. –  Nov 22 '16 at 14:15
  • @Sembei Norimaki. Thank you for your comment. Please add your sample code in answers. – Eghbal Nov 22 '16 at 14:16
  • I'm giving you a tip on how to do it. You are welcome to code it, and if it doesn't work, post your code and we will see why it doesn't work. –  Nov 22 '16 at 14:18

1 Answer

Suppose that textfile.txt is your data file, then

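fid = fopen('textfile.txt', 'r');
% Close the file on exit or error, but only if it is still open
oC = onCleanup(@() any(fopen('all')==fid) && fclose(fid) );

% Each line: four colon-separated integers, a float and a quoted string;
% 'Headerlines', 1 skips the "# Codelength" line
data = textscan(fid,...
                '%d:%d:%d:%d %f %q',...
                'Headerlines', 1);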

fclose(fid);

will give

data = 
    [13x1 int32]    [13x1 int32]    [13x1 int32]    [13x1 int32]    [13x1 double]    {13x1 cell}

That already fits your description of the desired output format.

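Now, you could go on and concatenate the numbers into a single array, but take care: when MATLAB concatenates integer and double arrays, the result takes the integer class, so the fifth column would be rounded to int32. Cast everything to double first: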

numbers = cellfun(@double, data(1:end-1), 'UniformOutput', false);            
numbers = [numbers{:}];

but well, that all depends on your specific use case.
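
To see why the cast to double matters, here is a tiny illustration of the concatenation rule (the variable name mixed is only for this example):

```matlab
% Concatenating an int32 with a double yields int32,
% so the small fractional value is rounded away.
mixed = [int32(1), 0.000218593];
disp(class(mixed))   % int32
disp(mixed(2))       % 0
```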

You might want to split the reading/processing into chunks of, say, 10,000 lines, because reading 600k lines all at once can use a lot of RAM. The textscan documentation describes how to read a file block by block.
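
A minimal sketch of such chunked reading, assuming the same textfile.txt and a single header line. The names blockSize, numbers and labels are illustrative, and I read the first four fields with %f here (rather than %d) so that every numeric column is already double:

```matlab
fid = fopen('textfile.txt', 'r');
assert(fid ~= -1, 'Could not open textfile.txt');
oC = onCleanup(@() fclose(fid));   % close the file when oC is cleared

fgetl(fid);                        % skip the "# Codelength" header line once

blockSize = 10000;                 % lines per chunk (assumed value)
numbers   = zeros(0, 5);           % numeric columns, all double
labels    = {};                    % quoted strings from the last column

while ~feof(fid)
    % Apply the format at most blockSize times, i.e. read blockSize lines
    block = textscan(fid, '%f:%f:%f:%f %f %q', blockSize);
    if isempty(block{1})
        break;                     % nothing parsed; avoid looping forever
    end
    numbers = [numbers; [block{1:5}]];   %#ok<AGROW>
    labels  = [labels;  block{6}];       %#ok<AGROW>
end
```

Growing numbers and labels inside the loop is only for brevity. If the point is to save memory, you would more likely process each block inside the loop and keep just the results you need.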

Rody Oldenhuis