
I have data for a large number of group IDs, and each group ID has anywhere from 4 to 30 observations. I would like to solve a (linear or nonlinear, depending on the approach) system of equations in MATLAB using this data: three equations in three unknowns, with the known variables loaded from the dataset. I need observations 2 through 4 of a group to solve the system, but I would also like to move on to the next set of 3 observations (if it exists) to see how the solutions change, and record those results as well. What is the best way to accomplish this? I have a standard idea of how to solve the system using fsolve, but what is the best way to loop through group IDs with varying numbers of observations?

Here is some sample code I wrote while thinking about this:

%% Load Data
Data = readtable('dataset.csv');   % full dataset

% Define variables (main data)
groupID = Data{:,1};   % group identifier
Known1  = Data{:,7};
Known2  = Data{:,8};
Known3  = Data{:,9};
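% I assume something like this would give me the number of observations per
% group (each group should have somewhere between 4 and 30):
[G, ids] = findgroups(groupID);
obsPerGroup = splitapply(@numel, groupID, G);   % count of rows in each group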


%%%%% Function %%%%%

% Unknowns: f(1) = A, f(2) = B, f(3) = C.
% Known1, Known2, Known3 here stand for the values from a single observation,
% and D2, D3, D4 are known constants.
fun = @(f) [f(1)^2 + f(2)*Known3 - 2*f(3)*Known1 + 1/Known2 - D2;
            f(1) + (f(2)^2)*Known3 - f(3)*Known1 + 1/Known2 - D3;
            f(1) - f(2)*Known3 + f(3)^2*Known1 + 1/Known2 - D4];

% Define the initial guess for the solution
f0 = [0; 0; 0];

% Solve the nonlinear system of equations
f = fsolve(fun, f0)
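
% A sketch of how I think the knowns could be passed in per solve
% (k1, k2, k3 and d2, d3, d4 are just placeholder names for the values
% used in one call to fsolve):
systemFun = @(f, k1, k2, k3, d2, d3, d4) ...
    [f(1)^2 + f(2)*k3 - 2*f(3)*k1 + 1/k2 - d2;
     f(1) + (f(2)^2)*k3 - f(3)*k1 + 1/k2 - d3;
     f(1) - f(2)*k3 + f(3)^2*k1 + 1/k2 - d4];
% For example, for a single observation i:
%   fun_i = @(f) systemFun(f, Known1(i), Known2(i), Known3(i), D2, D3, D4);
%   sol_i = fsolve(fun_i, f0);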

%%%% Create Loop %%%%%

% Number of observations to use at a time
numObservations = 3;

% Group currently being processed
g = 1;

% Maximum number of groups
maxGroups = 100;

% Start each solve from the previous solution (initially the original guess)
x = f0;

% Loop through the groups of data
while g <= maxGroups
    % Load the data for the current group
    % (loadData is a placeholder; this is the part I am unsure how to write)
    data = loadData(g, numObservations);

    % Update the solution using the new data
    % (fun would need to be rebuilt here from the loaded data)
    x = fsolve(fun, x);

    % Print the updated solution
    disp(x);

    % Move on to the next group of data
    g = g + 1;
end
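
For the loadData placeholder, the rough idea I have (assuming the group IDs are numeric; the function name and signature are just my own sketch) is something like this:

function data = loadData(T, id, startObs, numObservations)
    % Return one window of observations for group id, covering observation
    % startObs through startObs + numObservations - 1 within that group.
    rows = find(T{:,1} == id);                                % rows belonging to this group
    window = rows(startObs : startObs + numObservations - 1); % e.g. observations 2 through 4
    data = T(window, :);                                      % sub-table for just that window
end

With this version the call in the loop would become something like data = loadData(Data, g, 2, numObservations) for the first window, then the starting observation would move forward by 3 as long as at least 3 observations remain in the group, and the loop would run over unique(Data{:,1}) rather than 1:maxGroups since the IDs are not necessarily 1 through 100.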

What are the pitfalls with writing the code like this, and how can I improve it?

  • 'fsolve' might not find a solution, so consider adding a check in your loop. You might not be able to process a whole dataset if that function throws an error. – iohans Jan 19 '23 at 04:07
  • How large is the dataset? The code would probably be really slow using the "for". Consider the Parallel Computing Toolbox and parfor to tackle really large datasets. – iohans Jan 19 '23 at 04:11
  • Right now there are about 900K observations. I will add a check in my loop, and look into the parallel computing toolbox. Should I look into anything that would help account for the fact that I do not have a balanced set of observations per group? – Joshua Scott Jan 20 '23 at 15:53

0 Answers