
I am working with huge rich text data files (.rtf). The data in each file consists of two columns of numbers laid out like a table. The numbers are either very large or very small, so they need to be read in with a high level of precision.

How do I assign the data of the first column to "A" and the second column to "B"? (Would these be vectors?) My problem at the moment is that the rich text formatting doesn't cooperate with an import into MATLAB, and converting the .rtf file to .txt (then importing) merges the data of both columns into a single column of alternating values.

Once I have "A", I need to compare a single specified value against the first column of data, find the closest value, and then return the corresponding value from the second column.

So say I had this sample of data in my file:

1.0E-5      78.29777
1.0625E-5   75.9674
1.125E-5    73.83424
1.1875E-5   71.87197
1.25E-5     70.05895
1.375E-5    66.8116
1.5E-5      63.9797
1.625E-5    61.48167

If my single specified value were 1.123E-5, the closest value in the first column is 1.125E-5, so the desired output is 73.83424.

How can I do this? I don't know where to start, as I am unfamiliar with the MATLAB data import syntax.

Thanks for all help in advance!!

Sterling Butters

2 Answers

You can use low-level I/O with regular expressions to read in your *.rtf file and get your data out without any conversion. Using your sample data, I made an *.rtf file and kludged together a clunky parser that gets the data out for you. If you open your *.rtf file in a text editor you'll notice (at least in mine) that it has 2 header lines:

{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}
{\*\generator Riched20 6.3.9600}\viewkind4\uc1 

Followed by a bit more header that's mixed in with your data (this could just be a WordPad quirk):

\pard\sa200\sl276\slmult1\f0\fs22\lang9 1.0E-5      78.29777\par

So we skip the first two lines, treat the third line differently, and then handle the rest:

fID = fopen('test.rtf', 'r'); % Open our data file

nheaders = 2; % Number of full header lines
npartialheaders = 1; % Number of header lines with your data mixed in

ii = 1;
mydata = [];
while ~feof(fID) % Loop until we reach the end of the file
    if ii <= nheaders
        % Do nothing
        tline = fgetl(fID); % Read in a line of data, discard it
        ii = ii + 1;
    else
        tline = fgetl(fID); % Read in a line of data
        out = regexp(tline, '([\s\d.E-])', 'match');

        if ~isempty(out) % Our regex found some data
            % The regexp returns every character in a cell, concatenate them
            % and split them along the spaces
            data_str = strsplit([out{:}], ' ');

            if ii > nheaders && ii <= (nheaders + npartialheaders)
                % Header is mixed with your data
                % We should only want the second and third matches
                data_num = str2double(data_str(2:3));
                mydata = [mydata; data_num];
            else
                % Just your data on these lines
                data_num = str2double(data_str(1:2));
                mydata = [mydata; data_num];
            end
        end

        ii = ii + 1;
    end
end

fclose(fID);

Which returns:

mydata =

    1.00000000000000e-05    78.2977700000000
    1.06250000000000e-05    75.9674000000000
    1.12500000000000e-05    73.8342400000000
    1.18750000000000e-05    71.8719700000000
    1.25000000000000e-05    70.0589500000000
    1.37500000000000e-05    66.8116000000000
    1.50000000000000e-05    63.9797000000000
    1.62500000000000e-05    61.4816700000000

Admittedly, this is ugly, inefficient code. I'm sure there can be a lot of changes made to make it more robust and efficient, but it should help get you started.
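As one possible tightening, here is a minimal sketch that reads the whole file at once and pulls out every pair of numbers that is followed by \par. It assumes the file looks like the WordPad output shown above (two whitespace-separated numbers per row, each row ending in \par); files with real RTF table markup would need a different pattern. The temporary file and the variable names num and pattern are only there for illustration.

```matlab
% Build a small test file that mimics the WordPad output shown above
% (illustrative only; a real file may use different RTF markup).
rtfLines = { ...
    '{\rtf1\ansi\ansicpg1252\deff0\nouicompat\deflang1033{\fonttbl{\f0\fnil\fcharset0 Calibri;}}', ...
    '{\*\generator Riched20 6.3.9600}\viewkind4\uc1 ', ...
    '\pard\sa200\sl276\slmult1\f0\fs22\lang9 1.0E-5      78.29777\par', ...
    '1.0625E-5   75.9674\par', ...
    '1.125E-5    73.83424\par', ...
    '}'};
fname = [tempname '.rtf'];
fID = fopen(fname, 'w');
fprintf(fID, '%s\n', rtfLines{:});
fclose(fID);

% Read the entire file into one character vector
txt = fileread(fname);

% A number: optional sign, digits, optional fraction, optional exponent
num = '[-+]?\d+\.?\d*(?:[eE][-+]?\d+)?';

% Two numbers separated by whitespace, followed by a literal \par
pattern = ['(' num ')\s+(' num ')\s*\\par'];
tokens = regexp(txt, pattern, 'tokens');

% Each token is a 1x2 cell of strings; stack them and convert to double
mydata = str2double(vertcat(tokens{:}));

delete(fname); % Clean up the temporary file
```

Because the pattern requires the trailing \par, the formatting codes on the partial header line (\sa200, \fs22 and so on) are skipped without any special-case line counting.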

Now that you have your data I think you can work at figuring out your second part. If you haven't already, take a look at MATLAB's matrix indexing documentation. As a hint for one implementation, take a look at the outputs for min and think about what you could do subtracting a constant from a vector.

% What is this doing? It's a mystery!
[~, matchidx] = min(abs(mydata(:,1) - querypoint));
disp(mydata(matchidx, 2))
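Applied to the sample data from the question, the hint works out roughly as follows. This is only a sketch: the matrix is typed in by hand rather than read from a file, and the name result is introduced here for illustration. It also answers the "would these be vectors?" part of the question, since A and B end up as column vectors.

```matlab
% Sample data from the question: column 1 is the lookup key, column 2 the value
mydata = [1.0E-5     78.29777
          1.0625E-5  75.9674
          1.125E-5   73.83424
          1.1875E-5  71.87197
          1.25E-5    70.05895
          1.375E-5   66.8116
          1.5E-5     63.9797
          1.625E-5   61.48167];

A = mydata(:, 1); % First column as a column vector
B = mydata(:, 2); % Second column as a column vector

querypoint = 1.123E-5; % The single specified value

% Distance from the query to every entry of A; min returns the index of the smallest
[~, matchidx] = min(abs(A - querypoint));

result = B(matchidx); % Value in B paired with the closest entry of A
disp(result)
```

Here matchidx is 3, because 1.125E-5 is the closest entry in A, so result is 73.83424. If two entries are equally close, min returns the first one.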

sco1

Here's what I would do: copy the contents into Excel or a Google spreadsheet, then save it as a .csv file. From there it's easy:

T = readtable('path/to/my/data.csv');

T is now a table that holds your numbers as double-precision values.

A = T{:, 1}; % column 1

B = T{:, 2}; % column 2
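One thing to watch for: if the exported .csv has no header row, it may be safer to tell readtable so explicitly, since depending on the MATLAB version the first data row could otherwise be taken as column names. A minimal sketch, assuming a headerless two-column file (the temporary file below is just a stand-in for the spreadsheet export):

```matlab
% Write a small headerless CSV to a temporary file
% (stand-in for the file exported from the spreadsheet)
fname = [tempname '.csv'];
fID = fopen(fname, 'w');
fprintf(fID, '%s\n', '1.0E-5,78.29777', '1.0625E-5,75.9674', '1.125E-5,73.83424');
fclose(fID);

% Treat every row as data rather than reading the first row as variable names
T = readtable(fname, 'ReadVariableNames', false);

A = T{:, 1}; % column 1 as a numeric column vector
B = T{:, 2}; % column 2 as a numeric column vector

delete(fname); % Clean up the temporary file
```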

Good luck!