I want to implement MLE (Maximum likelihood estimation) with gekko
package in python. Suppose that we have a DataFrame
that contains two columns: ['Loss', 'Target'] and it length is equal to 500.
First we have to import packages that we need:
from gekko import GEKKO
import numpy as np
import pandas as pd
Then we simply create the DataFrame
like this:
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame
It going to be look like this:
Some components of [‘Target’] column should be calculated with a formula that I wrote it right down below in the picture(and the rest of them remains zero. I explained more in continue, please keep reading) so you can see it perfectly. Two main elements of formula are ‘Kasi’ and ‘Betaa’. I want to find best value for them that maximize sum of My_DataFrame[‘Target’]
. So you got the idea and what is going to happen!
Now let me show you how I wrote the code for this purpose. First I define my objective function:
def obj_function(Array):
"""
[Purpose]:
+ it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
[Parameters]:
+ This function gets Array that contains 'Kasi' and 'Betaa'.
Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
[returns]:
+ returns a pandas.series.
actually it returns new components of My_DataFrame["Target"]
"""
# in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
# in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
return My_DataFrame["Target"]
if you got confused what's happening in for loop
in obj_function
function, check picture below, it contains a brief example! and if not, skip this part :
Then just we need to go through optimization. I use gekko
package for this purpose. Note that I want to find best values of ‘Kasi’ and ‘Betaa’ so I have two main variables and I don’t have any kind of constraints!
So let’s get started:
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(np.sum(obj_function(x)))
And then when I want to solve the optimization with qw.solve()
:
qw.solve()
But i got this error:
Exception: This steady-state IMODE only allows scalar values.
How can I fix this problem? (Complete script gathered in next section for the purpose of convenience)
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
def obj_function(Array):
"""
[Purpose]:
+ it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
[Parameters]:
+ This function gets Array that contains 'Kasi' and 'Betaa'.
Array[0] represents 'Kasi' and Array[1] represents 'Betaa'
[returns]:
+ returns a pandas.series.
actually it returns new components of My_DataFrame["Target"]
"""
# in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
# in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item.
for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*( 1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))
return My_DataFrame["Target"]
# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()
# now i want to specify my variables ('Kasi' and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'
qw.Maximize(qw.sum(obj_function(x)))
proposed potential script is here:
from gekko import GEKKO
import numpy as np
import pandas as pd
My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# i'll put link of "Losses.xlsx" file in the end of my explaination
# so you can download it from my google drive.
loss = My_DataFrame["Loss"]
def obj_function(x):
k,b = x
target = []
for iloss in loss:
if iloss>160:
t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
target.append(t)
return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
x[i].value = xi
# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10; b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])
python output:
objective function = -1155.4861315885942
b = 500.0
k = 0.1
note that in python output b
is representing "Betaa" and k
is representing "Kasi".
output seems abit strange, so i decide to test it! for this purpose I used Microsoft Excel Solver!
(i put the link of excel file at the end of my explaination so you can check it out yourself if
you want.) as you can see in picture bellow, optimization by excel has been done and optimal solution
has been found successfully (see picture bellow for optimization result).
excel output:
objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246
as you can see there is huge difference between python output
and excel output
and seems that excel is performing pretty well! so i guess problem still stands and proposed python script is not performing well...
Implementation_in_Excel.xls
file of Optimization by Microsoft excel application is available here.(also you can see the optimization options in Data tab --> Analysis --> Slover.)
data that used for optimization in excel and python are same and it's available here (it's pretty simple and contains 501 rows and 1 column).
*if you can't download the files, let me know then I'll update them.