I have a dataframe with 3 columns: Y, X1, X2. I want to find the parameter estimates b1 and b2 by minimizing the sum of squares according to:

Objective function: minimize the sum of squares (Y - (b1*X1 + b2*X2))^2
Constraints: 0 < b1 < 2, 0 < b2 < 1
Initial guesses: b1=b2=0.5
Technique: Newton-Raphson

I know that I can use

scipy.optimize.minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None, bounds=None, constraints=(), tol=None, callback=None, options=None)

but I can't see how to pass in the columns from the dataframe, since none of the examples I found while searching use dataframe columns.

I would be very grateful for any help.

Damask_Rose
  • scipy isn't pandas-aware. Therefore, you'd extract the columns, e.g., `scipy.optimize.minimize(fun, mydf['numeric_column'], args=())` – Paul H Feb 26 '19 at 17:06
  • Thanks very much for this, but where you've got "mydf['numeric_column']" corresponds to where I should input the initial guess(es), i.e. b1=b2=0.5 which are not in the dataframe. – Damask_Rose Feb 26 '19 at 17:18

1 Answer

This could serve as a starting point. As long as your objective function returns a scalar, `minimize` can work with it. Pass the dataframe through the `args` keyword as a tuple; it is forwarded to the objective function after the parameter vector. See the documentation of the `minimize` function to decide which method to use.

EDIT: I changed the code based on the description in your comment.

import numpy as np
import scipy.optimize as opt
import pandas as pd

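def main(df):
    x0 = [0.5, 0.5]  # initial guesses for b1 and b2
    # args is expected to be a tuple; (df) without the trailing comma is just df.
    # BFGS ignores bounds, so use a method that supports them, e.g. L-BFGS-B.
    res = opt.minimize(fun=obj, x0=np.array(x0), args=(df,), method="L-BFGS-B", bounds=[(0,2),(0,1)])
    return res

def obj(x, df):
    # x holds the current parameter values [b1, b2]; df is passed in via args
    sumSquares = np.sum((df["Y"] - (x[0]*df["X1"] + x[1]*df["X2"]))**2)
    return sumSquares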

df = pd.DataFrame({"Y":np.random.rand(100),
                   "X1":np.random.rand(100),
                   "X2":np.random.rand(100)})
print(main(df))
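
A variation on the same idea, in case it is useful: since scipy only needs numbers, you can also pull the columns out as NumPy arrays once and pass those through `args`. The sketch below assumes the column names `Y`, `X1`, `X2` from the question; the function names `sum_squares` and `sum_squares_grad` and the random data are made up for illustration. It also supplies the analytic gradient through `jac`, which is optional. Note that this uses L-BFGS-B, a quasi-Newton method that supports bounds, not a pure Newton-Raphson iteration.

```python
import numpy as np
import pandas as pd
import scipy.optimize as opt

def sum_squares(b, y, X):
    """Sum of squared residuals for the parameter vector b = [b1, b2]."""
    r = y - X @ b
    return r @ r

def sum_squares_grad(b, y, X):
    """Analytic gradient of the sum of squares: -2 * X^T (y - X b)."""
    r = y - X @ b
    return -2.0 * (X.T @ r)

# Illustrative random data standing in for the real dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"Y": rng.random(100),
                   "X1": rng.random(100),
                   "X2": rng.random(100)})

# Extract plain arrays once, so the objective never touches pandas
y = df["Y"].to_numpy()
X = df[["X1", "X2"]].to_numpy()

res = opt.minimize(sum_squares, x0=np.array([0.5, 0.5]), args=(y, X),
                   jac=sum_squares_grad, method="L-BFGS-B",
                   bounds=[(0, 2), (0, 1)])
print(res.x, res.fun)
```

The bounds passed to `minimize` are inclusive, so a solution can land exactly on 0 or on the upper limit; if the strict inequalities in the question matter, that case needs separate handling.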
f.wue
  • Thanks very much for this, but it corresponds to what I've found in my searches to date and I can't relate the obj(x) function to my dataframe described above. The dataframe just has the 3 columns described above with each one containing numerical values. – Damask_Rose Feb 26 '19 at 17:20
  • Thanks very very much. That makes sense and works perfect on my dataframe. Much appreciated. – Damask_Rose Feb 26 '19 at 20:40
  • Happy to help! If the answer solves your problem, feel free to accept it:) – f.wue Feb 26 '19 at 21:26
  • Thanks @f.wue for sharing this example. I tried it with my data set but I am getting this message in the results - : 'Desired error not necessarily achieved due to precision loss.' Any idea how to handle this? – Anurag Sharma Jul 03 '21 at 11:34
  • No, but maybe check this question out: https://stackoverflow.com/questions/24767191/scipy-is-not-optimizing-and-returns-desired-error-not-necessarily-achieved-due – f.wue Jul 26 '21 at 08:22
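
Not a diagnosis of the precision-loss message above, but one alternative worth trying for this particular model: because Y - (b1*X1 + b2*X2) is linear in b1 and b2, the problem is a bounded linear least-squares problem, and `scipy.optimize.lsq_linear` solves that class directly without needing an initial guess. The sketch below assumes the column names from the question and uses random data purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import lsq_linear

# Illustrative random data standing in for the real dataframe
rng = np.random.default_rng(0)
df = pd.DataFrame({"Y": rng.random(100),
                   "X1": rng.random(100),
                   "X2": rng.random(100)})

A = df[["X1", "X2"]].to_numpy()  # design matrix, one column per parameter
y = df["Y"].to_numpy()

# bounds are given as (lower bounds, upper bounds): 0 <= b1 <= 2, 0 <= b2 <= 1
res = lsq_linear(A, y, bounds=([0, 0], [2, 1]))

print("b1, b2:", res.x)
print("status:", res.status, res.message)
# lsq_linear reports cost = 0.5 * ||A x - y||^2, so double it for the sum of squares
print("sum of squares:", 2 * res.cost)
```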