
I'm running a nonlinear least squares using the minpack.lm package.

However, for each group in the data I would like to optimize (minimize) the fitting parameters, similar to what Python's lmfit minimize function does.

The minimize() function is a wrapper around Minimizer for running an optimization problem. It takes an objective function (the function that calculates the array to be minimized), a Parameters object, and several optional arguments.
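For clarity, here is a minimal sketch of the lmfit workflow I have in mind (the model and the parameter names a and b are purely illustrative, not my actual fitting function):

import numpy as np
from lmfit import minimize, Parameters

# objective function: returns the residual array that lmfit minimizes in the least-squares sense
def residual(params, x, data):
    a = params['a'].value
    b = params['b'].value
    model = a * x + b        # illustrative model only
    return model - data

# Parameters object holding the starting values
params = Parameters()
params.add('a', value=1.0)
params.add('b', value=0.0)

# some made-up data
x = np.linspace(0, 10, 50)
data = 2.0 * x + 1.0 + np.random.normal(0, 0.1, x.size)

result = minimize(residual, params, args=(x, data))
result.params.pretty_print()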

The reason I need this is that I want to optimize the fitting function, based on the fitting parameters obtained per group, in order to find global fitting parameters that can fit both groups in the data.

Here is my current approach for fitting by group:

df <- data.frame(y = c(replicate(2, c(rnorm(10, 0.18, 0.01), rnorm(10, 0.17, 0.01))),
                       c(replicate(2, c(rnorm(10, 0.27, 0.01), rnorm(10, 0.26, 0.01))))),
                 DVD = c(replicate(4, c(rnorm(10, 60, 2), rnorm(10, 80, 2)))),
                 gr = rep(seq(1, 2), each = 40),
                 logic = rep(c(1, 0), each = 40))

The fitting equation for these groups is:

fitt <- function(data) {
  nlsLM(y ~ pi * label2 * (DVD/2 + U1)^2,
        data = data, start = c(label2 = 1, U1 = 4), trace = TRUE,
        control = nls.lm.control(maxiter = 130))
}

library(minpack.lm)
library(plyr)  # will help to fit in groups

fit <- dlply(df, c('gr'), .fun = fitt)  # fit separately within each group 'gr'

> fit
$`1`
Nonlinear regression model
  model: y ~ pi * label2 * (DVD/2 + U1)^2
   data: data
   label2        U1 
2.005e-05 1.630e+03 
$`2`
label2      U1 
 2.654 -35.104   

I need to know whether there is a function that optimizes the sum of squares to get the best fit for both groups at once. One might say the separate fits above already give the best parameters in the residual sum-of-squares sense, but I know that lmfit's minimizer can fit both groups with shared (global) parameters, and I haven't found a similar example of how to do this in R.

[figure: example data for both groups with their fitting lines]

PS: I made up the numbers and fitting lines.

Alexander
  • I think `?optim` is equivalent to Python's `minimize`. The author of optim recommends the `optimx` package as making good improvements. – Gregor Thomas Jun 20 '17 at 20:43
  • @Gregor I see. I am sorry for being annoying. I am just stuck and really need help on this issue! – Alexander Jun 20 '17 at 20:43
  • @Gregor I have never heard of `optimx`. I had a quick look on Stack Overflow, but it seems there is no optimx method for grouped data. – Alexander Jun 20 '17 at 20:54
  • Both optim and optimx minimize an arbitrary function. I think you need to clarify what you mean by "optimizes both of these fitting parameters together" - are there certain parameters you want to be the same between groups? How do you want your results to be different than fitting the groups separately? I think if you specify your model correctly you can probably do what you want with `nls`... – Gregor Thomas Jun 20 '17 at 20:57
  • @Gregor I wish nls could do that, so I wouldn't be desperately looking for another solution. The thing is that when I do this fitting with Python I get decent fitting lines for both groups with a single set of fitting coefficients. I guess 'optimizes both of these fitting parameters together' was the wrong statement; I have edited the OP! – Alexander Jun 20 '17 at 21:09
  • Maybe you want an intercept that varies by group? I'm not familiar with `nlsLM`, but my guess is if you make `gr` a `factor` and add it to your equation, `nlsLM(y~pi*label2*(DVD/2+U1)^2 + int * gr, data=data,start=c(label2=1,U1=4, int = 0.1), ...` maybe that will do what I think you want. – Gregor Thomas Jun 20 '17 at 21:21

1 Answer


I am not sure about R, but least squares with shared parameters is usually simple to implement.

A simple Python example looks like this:

import matplotlib
matplotlib.use('Qt4Agg')
from matplotlib import pyplot as plt

from random import random
from scipy import optimize
import numpy as np

# just for my normally distributed errors (Box-Muller transform)
def boxmuller(x0,sigma):
    u1=random()
    u2=random()
    ll=np.sqrt(-2*np.log(u1))
    z0=ll*np.cos(2*np.pi*u2)
    z1=ll*np.sin(2*np.pi*u2)
    return sigma*z0+x0, sigma*z1+x0

#some non-linear function
def f0(x,a,b,c,s=0.05):
    return a*np.sqrt(x**2+b**2)-np.log(c**2+x)+boxmuller(0,s)[0]

# residual function for least squares takes two data sets. 
# not necessarily same length
# two of three parameters are common
def residuals(parameters,l1,l2,dataPoints):
    a,b,c1,c2 = parameters
    set1=dataPoints[:l1]
    set2=dataPoints[-l2:]
    distance1 = [(a*np.sqrt(x**2+b**2)-np.log(c1**2+x))-y for x,y in set1]
    distance2 = [(a*np.sqrt(x**2+b**2)-np.log(c2**2+x))-y for x,y in set2]
    res = distance1+distance2
    return res

xList0=np.linspace(0,8,50)
#some xy data
xList1=np.linspace(0,7,25)
data1=np.array([f0(x,1.2,2.3,.33) for x in xList1])
#more xy data using different third parameter
xList2=np.linspace(0.1,7.5,28)
data2=np.array([f0(x,1.2,2.3,.77) for x in xList2])
alldata=np.array(list(zip(xList1,data1))+list(zip(xList2,data2)))

# rough estimates
estimate = [1, 1, 1, .1]
#fitting; providing second length is actually redundant
bestFitValues, ier= optimize.leastsq(residuals, estimate,args=(len(data1),len(data2),alldata))
print(bestFitValues)


fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(xList1, data1)
ax.scatter(xList2, data2)
ax.plot(xList0,[f0(x,bestFitValues[0],bestFitValues[1],bestFitValues[2] ,s=0) for x in xList0])
ax.plot(xList0,[f0(x,bestFitValues[0],bestFitValues[1],bestFitValues[3] ,s=0) for x in xList0])


plt.show()

#output
>> [ 1.19841984  2.31591587  0.34936418  0.7998094 ]

[figure: fit with one nonlinear parameter in common]

If required, you can even implement the minimization yourself. If your parameter space is reasonably well behaved, i.e. has an approximately parabolic minimum, a simple Nelder-Mead method works quite well.
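As a minimal sketch (assuming the residuals function, estimate, data1, data2 and alldata from the script above are still in scope), the shared-parameter objective can be summed to a scalar and handed to scipy's Nelder-Mead:

# hand-rolled scalar objective: sum of squared residuals over both data sets
def chi2(parameters, l1, l2, dataPoints):
    r = np.array(residuals(parameters, l1, l2, dataPoints))
    return np.sum(r**2)

nmResult = optimize.minimize(chi2, estimate,
                             args=(len(data1), len(data2), alldata),
                             method='Nelder-Mead')
print(nmResult.x)

The resulting values should be close to the leastsq result above, just obtained without using any gradient information.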

mikuszefski