Linear regression with two datasets which are weighted differently. How to minimize the error?

Question

I am inexperienced with minimization functions and could not find a solution to my problem. From what I understand, I need a function to minimize, but the function that I wrote raises an error when I pass it to the optimizer.

Problem description: I have two datasets, one of which is weighted more heavily ('k', 70%) than the other ('g', 30%). Their weighted contributions are added together to give predicted values. Besides these I also have the 'true' (measured) values. I want to optimise the parameters via linear regression so that the predictions fit the true values as closely as possible (minimize the difference). To do this I wrote the following function:

def Q_calc(c1, c2, c3, c4, c5, c6, k1=k1, k2=k2, k3=k3, g1=g1, g2=g2, g3=g3):
    # data    
    m = data['measurements']    

    df = pd.DataFrame(index=np.arange(1, len(data) + 1))
    df['error'] = 0
    for i in range(len(data)):
        df['error'].iloc[i] = ((0.7*(c1*k1[i] + c2*k2[i] + c3*k3[i]) + 0.3*(c4*g1[i] + c5*g2[i] + c6*g3[i])) - m.iloc[i])**2
    
    sum_error = np.sqrt(df['error'].sum())
    return sum_error

where all k and g values are columns of a pd.DataFrame of length 438, and m is a single column in the same DataFrame.

What I have tried: I want to minimize the error returned by Q_calc by adjusting the variables c1-c6. I've tried the following:

import numpy as np
import pandas as pd
from scipy.optimize import minimize 

x0 = [1.05, 0.57, 0.12, 1.28, 0.21, 0.00]
b = (0, np.inf)
bounds = [b, b, b, b, b, b]

k1 = data['k1'].values
k2 = data['k2'].values
k3 = data['k3'].values
g1 = data['g1'].values
g2 = data['g2'].values
g3 = data['g3'].values

minimize(Q_calc, x0, args=(k1, k2, k3, g1, g2, g3), bounds=bounds)

But it gives me the error: "operands could not be broadcast together with shapes (6,) (438,)"
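
A likely source of the (6,) and (438,) shapes: scipy.optimize.minimize calls the objective as fun(x, *args), where x is one 1-D array holding all parameters. With the call above, c1 would then receive the whole six-element array and c2 through c6 would receive 438-element data arrays. The sketch below uses a hypothetical probe function (not part of the original code) and small made-up arrays to record the shapes the objective is actually called with.

```python
import numpy as np
from scipy.optimize import minimize

seen = []  # shapes observed on each call of the objective

def probe(x, a, b):
    # Hypothetical objective: records the shapes of its arguments,
    # then returns a simple scalar so that minimize can proceed.
    seen.append((np.shape(x), np.shape(a), np.shape(b)))
    return float(np.sum(x ** 2))

x0 = np.array([1.0, 2.0, 3.0])           # three parameters
extra = (np.zeros(5), np.ones(5))         # two made-up data arrays
minimize(probe, x0, args=extra)

# The first positional argument is the full parameter vector,
# not a single scalar parameter.
print(seen[0])
```

So an objective meant for minimize would take the parameter vector as its first argument and unpack c1-c6 from it inside the function.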

Something must be wrong in my code. I want a single set of constants that leads to the minimized sum of all my errors. How should I approach this?
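
One possible approach, as a minimal sketch rather than a definitive fix: let the objective take all six constants as a single array in its first argument and pass the data through args. The names q_calc, K, G, w_k and w_g are illustrative assumptions, and synthetic random data stand in for the real DataFrame; with the real data, K and G could presumably be built from data[['k1', 'k2', 'k3']].to_numpy() and data[['g1', 'g2', 'g3']].to_numpy().

```python
import numpy as np
from scipy.optimize import minimize

def q_calc(c, K, G, m, w_k=0.7, w_g=0.3):
    # c is the full parameter vector [c1, ..., c6] supplied by minimize.
    # K and G are (n, 3) arrays; m is the (n,) array of measurements.
    pred = w_k * (K @ c[:3]) + w_g * (G @ c[3:])
    return np.sqrt(np.sum((pred - m) ** 2))

# Synthetic stand-in data (assumption: the real data have the same layout).
rng = np.random.default_rng(0)
n = 438
K = rng.uniform(0.0, 1.0, size=(n, 3))
G = rng.uniform(0.0, 1.0, size=(n, 3))
c_demo = np.array([1.0, 0.5, 0.2, 1.2, 0.3, 0.1])
m = 0.7 * (K @ c_demo[:3]) + 0.3 * (G @ c_demo[3:])
m = m + rng.normal(0.0, 0.05, size=n)    # noise, so the fit is not exact

x0 = np.array([1.05, 0.57, 0.12, 1.28, 0.21, 0.00])
bounds = [(0, None)] * 6                  # each constant constrained to >= 0

res = minimize(q_calc, x0, args=(K, G, m), bounds=bounds)
print(res.x, res.fun)
```

Working on whole arrays also removes the per-row loop and the temporary DataFrame, which should make each evaluation of the objective considerably cheaper.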

Which line gives you the error? It's trying to tell you that a (6,) array and a (438,) array can't be used in one operation. — Pranav Hosangadi, Aug 03 '20 at 16:42
Thanks for looking into this! It's the following line: df['error'].iloc[i] =.... etc. However, if I call the function Q_calc directly and slightly adjust it so that it returns the df as well, nothing goes wrong. Therefore I think that it's the minimize call in which I'm doing something weird. — Burbs, Aug 03 '20 at 16:51
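
Because the model is linear in c1-c6, the problem could also be treated as a bounded linear least-squares fit instead of a general minimization. This swaps in a different technique, scipy.optimize.lsq_linear, which minimizes the sum of squared residuals; that has the same minimizer as its square root. The sketch below assumes the 0.7 and 0.3 weights can be folded into the design matrix and uses synthetic, noise-free data; the names A, K, G and c_true are illustrative only.

```python
import numpy as np
from scipy.optimize import lsq_linear

# Synthetic stand-in data (assumption: 438 rows, three 'k' and three 'g' columns).
rng = np.random.default_rng(1)
n = 438
K = rng.uniform(0.0, 1.0, size=(n, 3))
G = rng.uniform(0.0, 1.0, size=(n, 3))
c_true = np.array([1.0, 0.5, 0.2, 1.2, 0.3, 0.1])

# Fold the dataset weights into the columns so that pred = A @ c.
A = np.hstack([0.7 * K, 0.3 * G])
m = A @ c_true                            # noise-free measurements for the demo

# Bounded linear least squares with every constant constrained to >= 0.
fit = lsq_linear(A, m, bounds=(0.0, np.inf))
c_hat = fit.x
print(c_hat)
```

With noise-free data this recovers the constants used to generate m; with real measurements it would return the non-negative constants with the smallest squared error.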

