
I have a function which takes several parameters - a, b, c, d, e - and returns the computed value of z.

I also have the ground-truth value of z, and I would like to compute the optimal values of a, b, c, d, e that minimize the error between the ground-truth z and the value of z approximated by the function.

I have lots of data: the computed value of z and the ground-truth z for many different input values of a, b, c, d, e.

Any suggestions on how to find the optimal values of a, b, c, d, and e? I was thinking of something like gradient descent or another optimization method, but I am not sure how to proceed.

z1 = function(a1, b1, c1, d1, e1) and error1 = z1 - z1' where z1' is the ground truth

z2 = function(a2, b2, c2, d2, e2) and error2 = z2 - z2' where z2' is the ground truth

...

zn = function(an, bn, cn, dn, en) and errorn = zn - zn' where zn' is the ground truth
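The setup above can be sketched in a few lines of Python. Note that `function` here is purely hypothetical (a weighted sum of its inputs, standing in for whatever the real function is); the point is only to show the sum-of-squared-errors objective that an optimizer would minimize:

```python
# Hypothetical stand-in for the real function: a weighted sum of the inputs.
# w holds the parameters we want to optimize.
def function(inputs, w):
    # inputs = (a, b, c, d, e); w = one weight per input
    return sum(wi * xi for wi, xi in zip(w, inputs))

def sum_squared_error(data, w):
    # data = list of ((a, b, c, d, e), z_truth) pairs;
    # this is the objective an optimizer (e.g. gradient descent) would minimize
    return sum((function(x, w) - z_true) ** 2 for x, z_true in data)

# Toy data chosen so that w = (2, 3, 0, 0, 0) fits perfectly
data = [((1, 0, 0, 0, 0), 2.0), ((0, 1, 0, 0, 0), 3.0)]
w_good = (2.0, 3.0, 0.0, 0.0, 0.0)
print(sum_squared_error(data, w_good))  # 0.0 on this toy data
```

Any gradient-based or black-box optimizer can then be pointed at `sum_squared_error` to search for the best parameters.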

Thanks a lot for your help in advance.

Ash

1 Answer

What you really want to do is estimate the value of y, given the values of x1, x2, x3, x4, and x5.

You can use something as simple as linear regression to predict the value of y from the other parameters.

Now, as you probably guessed, the equation to solve for y will look something like this:

y = b0 + b1 * x1 + b2 * x2 + ..., where the goal is to find a set of coefficients that predict your y correctly, given x1..x5. This optimization can be done with stochastic gradient descent.

Here is some simple Python code to achieve this:

# Make a prediction with coefficients
def predict(row, coefficients):
    # row holds the inputs followed by the target; the target (last value) is skipped
    yhat = coefficients[0]
    for i in range(len(row)-1):
        yhat += coefficients[i + 1] * row[i]
    return yhat

# Estimate linear regression coefficients using stochastic gradient descent
def coefficients_sgd(train, l_rate, n_epoch):
    # One coefficient per input column, plus the intercept coef[0]
    coef = [0.0 for i in range(len(train[0]))]
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            yhat = predict(row, coef)
            error = yhat - row[-1]  # row[-1] is the target value
            sum_error += error**2
            # Step the intercept and each coefficient against the gradient
            coef[0] = coef[0] - l_rate * error
            for i in range(len(row)-1):
                coef[i + 1] = coef[i + 1] - l_rate * error * row[i]
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))
    return coef

For testing, let's feed very simple data into the function and see the results.

# Calculate coefficients on a toy dataset; each row is [x, y]
dataset = [[1, 1], [2, 3], [4, 3], [3, 2], [5, 5]]
l_rate = 0.001
n_epoch = 50
coef = coefficients_sgd(dataset, l_rate, n_epoch)
print(coef)

This is the output:

>epoch=45, lrate=0.001, error=2.650
>epoch=46, lrate=0.001, error=2.627
>epoch=47, lrate=0.001, error=2.607
>epoch=48, lrate=0.001, error=2.589
>epoch=49, lrate=0.001, error=2.573
[0.22998234937311363, 0.8017220304137576]
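As a sanity check on the SGD result (this comparison is not part of the original answer), the same one-variable fit can be solved in closed form with `numpy.linalg.lstsq`; with more epochs and a tuned learning rate, the SGD coefficients above should approach these values:

```python
import numpy as np

# Same toy dataset: each row is [x, y]
dataset = [[1, 1], [2, 3], [4, 3], [3, 2], [5, 5]]
X = np.array([[1.0, row[0]] for row in dataset])  # column of 1s for the intercept b0
y = np.array([row[1] for row in dataset], dtype=float)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # exact least-squares [b0, b1], approximately [0.4, 0.8]
```

Note that the SGD slope (0.80...) already matches the exact solution closely; the intercept is still converging.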

Saurabh
  • I am looking for the optimal values of x1, x2, x3, x4, and x5 for this data - not the coefficients - I have edited the title to make things clearer – Ash Oct 24 '18 at 11:40
  • What if x1..x5 are all random? How can you find optimal x1..x5 to predict z? I am sure you have mixed up some assumptions. – Saurabh Oct 24 '18 at 11:49
  • I am supplying x1 to x5 to the function to get z - they are not random. – Ash Oct 24 '18 at 14:20