MLE application with gekko in python

Question

I want to implement MLE (maximum likelihood estimation) with the gekko package in Python. Suppose we have a DataFrame with two columns, ['Loss', 'Target'], and 500 rows.
First we import the packages we need:

from gekko import GEKKO
import numpy as np
import pandas as pd

Then we simply create the DataFrame like this:

My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)
My_DataFrame

It looks like this:

Some components of the ['Target'] column should be calculated with the formula shown in the picture below; the rest remain zero (I explain more below, so please keep reading). The two main elements of the formula are 'Kasi' and 'Betaa'. I want to find the values of them that maximize the sum of My_DataFrame['Target']. That is the idea.

Now let me show you how I wrote the code for this purpose. First I define my objective function:

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
        Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
        actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item. 
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*(  1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]

If the for loop in obj_function is confusing, the picture below contains a brief example; otherwise, skip this part:

Then we just need to run the optimization. I use the gekko package for this. Note that I want to find the best values of 'Kasi' and 'Betaa', so I have two variables and no constraints of any kind. So let's get started:

# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi'  and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd , value = [0.7 , 20])
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(np.sum(obj_function(x)))

And then when I want to solve the optimization with qw.solve():

qw.solve()

But I got this error:

Exception: This steady-state IMODE only allows scalar values.

How can I fix this problem? (The complete script is gathered below for convenience.)

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.DataFrame({"Loss":np.linspace(-555.795 , 477.841 , 500) , "Target":0.0})
My_DataFrame = My_DataFrame.sort_values(by=["Loss"] , ascending=False).reset_index(drop=True)

def obj_function(Array):
    """
    [Purpose]:
        + it will calculate each component of My_DataFrame["Target"] column! then i can maximize sum(My_DataFrame["Target"]) and find best 'Kasi' and 'Betaa' for it!
    
    [Parameters]:
        + This function gets Array that contains 'Kasi' and 'Betaa'.
        Array[0] represents 'Kasi' and Array[1] represents 'Betaa'

    [returns]:
        + returns a pandas.series.
        actually it returns new components of My_DataFrame["Target"]
    """
    # in following code if you don't know what is `qw`, just look at the next code cell right after this cell (I mean next section).
    # in following code np.where(My_DataFrame["Loss"] == item)[0][0] is telling me the row's index of item. 
    for item in My_DataFrame[My_DataFrame["Loss"]>160]['Loss']:
        My_DataFrame.iloc[np.where(My_DataFrame["Loss"] == item)[0][0] , 1] = qw.log10((1/Array[1])*(  1 + (Array[0]*(item-160)/Array[1])**( (-1/Array[0]) - 1 )))

    return My_DataFrame["Target"]



# i have 2 variables : 'Kasi' and 'Betaa', so I put nd=2
nd = 2
qw = GEKKO()

# now i want to specify my variables ('Kasi'  and 'Betaa') with initial values --> Kasi = 0.7 and Betaa = 20.0
x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
# So i guess now x[0] represents 'Kasi' and x[1] represents 'Betaa'

qw.Maximize(qw.sum(obj_function(x)))

The proposed script is here:

from gekko import GEKKO
import numpy as np
import pandas as pd


My_DataFrame = pd.read_excel("[<FILE_PATH_IN_YOUR_MACHINE>]\\Losses.xlsx")
# I'll put a link to the "Losses.xlsx" file at the end of my explanation
# so you can download it from my Google Drive.


loss = My_DataFrame["Loss"]
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target


qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
   
# bounds
k,b = x
k.lower=0.1; k.upper=0.8
b.lower=10;  b.upper=500
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

python output:

objective function = -1155.4861315885942
b = 500.0
k = 0.1

Note that in the Python output, b represents "Betaa" and k represents "Kasi".
The output seems a bit strange, so I decided to test it. For this purpose I used Microsoft Excel Solver.
(I put the link to the Excel file at the end of my explanation so you can check it yourself if you want.) As you can see in the picture below, the optimization in Excel completed and an optimal solution was found successfully (see the picture below for the optimization result).

excel output:

objective function = -108.21
Betaa = 32.53161
Kasi = 0.436246

As you can see, there is a huge difference between the Python output and the Excel output, and Excel seems to be performing well. So I guess the problem still stands and the proposed Python script is not performing well.
The Implementation_in_Excel.xls file with the Microsoft Excel optimization is available here. (You can also see the optimization options under Data tab --> Analysis --> Solver.)
The data used for the optimization in Excel and Python is the same and is available here (it is simple and contains 501 rows and 1 column).
*If you can't download the files, let me know and I'll update them.

score 2 · Answer 1 · answered Aug 06 '21 at 08:47

qw.Maximize() only sets the objective of the optimization; you still need to call solve() on your model.

alexis


Yes, thanks. But I got an error, so I'll update the question. Thank you! – Shayan Aug 06 '21 at 08:52
I think you need to construct your objective function differently... – alexis Aug 06 '21 at 09:03
Is there any chance someone here could help me with that? – Shayan Aug 06 '21 at 09:12

John Hedengren · Accepted Answer · 2021-08-23T17:18:09.040

The initialization applies the whole list [0.7, 20] to each variable. Initialize each variable's value with a scalar instead, such as:

x = qw.Array(qw.Var , nd)
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi

Another issue is that gekko needs to use special functions to perform automatic differentiation for the solvers. For the objective function, switch to the gekko version of summation as:

qw.Maximize(qw.sum(obj_function(x)))

If loss is computed by changing the values of x, then the objective function contains logical expressions that need special treatment to be solved with gradient-based solvers. Try using the if3() function for a conditional statement, or else slack variables (preferred). The objective function is evaluated once to build symbolic expressions, which are then compiled to byte-code and solved with one of the solvers. The symbolic expressions are found in the gk_model0.apm file in the model's run directory (qw.path for this model).
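To illustrate the "evaluated once" point, here is a toy sketch in plain Python. It is not how gekko is implemented internally; the `Sym` class and `objective_terms` function are hypothetical names made up for this example. It only shows the general idea: arithmetic on a placeholder object records an expression, and a Python `if` that compares plain data is decided while the expression is being built, so only the surviving terms would ever reach a solver.

```python
class Sym:
    """Toy placeholder variable: multiplication records text, not numbers."""

    def __init__(self, text):
        self.text = text

    def __mul__(self, other):
        return Sym(f"({self.text}*{_text(other)})")

    def __rmul__(self, other):
        return Sym(f"({_text(other)}*{self.text})")


def _text(value):
    # Render either another Sym or a plain Python number.
    return value.text if isinstance(value, Sym) else repr(value)


calls = 0  # counts how often the objective builder runs


def objective_terms(k, data, threshold):
    """Build one symbolic term per data point above the threshold."""
    global calls
    calls += 1
    # `y > threshold` compares plain numbers, so the filter is applied
    # here, at build time, and never becomes part of the expression.
    return [k * (y - threshold) for y in data if y > threshold]


k = Sym("k")
terms = objective_terms(k, data=[100, 170, 200], threshold=160)
expressions = [t.text for t in terms]
print(expressions)  # ['(k*10)', '(k*40)']
print(calls)        # 1
```

A condition that depends on the decision variables themselves cannot be resolved this way, which is where if3() or slack variables come in.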

Response to Edit

Thanks for posting an edit with the complete code. Here is a potential solution:

from gekko import GEKKO
import numpy as np
import pandas as pd

loss = np.linspace(-555.795 , 477.841 , 500)
def obj_function(x):
    k,b = x
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log((1/b)*(1+(k*(iloss-160)/b)**((-1/k)-1)))
            target.append(t)
    return target
qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)
# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi
# bounds
k,b = x
k.lower=0.6; k.upper=0.8
b.lower=10;  b.upper=30
qw.Maximize(qw.sum(obj_function(x)))
qw.options.SOLVER = 1
qw.solve()
print('k = ',k.value[0])
print('b = ',b.value[0])

The solver reaches bounds at the solution. The bounds may need to be widened so that arbitrary limits are not the solution.
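A quick way to check whether an optimum is pinned to a bound, without involving a solver, is a coarse grid search over the same box. The sketch below is plain Python and swaps the solver for brute-force evaluation; it uses the synthetic loss values and the objective exactly as written in the script above. The grid resolution and the helper names are assumptions for illustration only.

```python
import math

# Synthetic data from the script above: 500 evenly spaced losses.
n = 500
lo, hi = -555.795, 477.841
loss = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
excess = [y - 160 for y in loss if y > 160]


def objective(k, b):
    # Objective as written in the script above (sum of log terms).
    q = (-1 / k) - 1
    return sum(math.log((1 / b) * (1 + (k * e / b) ** q)) for e in excess)


# Hypothetical grid covering the same box as the bounds above.
k_values = [0.6 + 0.02 * j for j in range(11)]  # 0.6 ... 0.8
b_values = [10.0 + 2.0 * j for j in range(11)]  # 10 ... 30

# Pick the grid point with the largest objective value.
best_value, best_k, best_b = max(
    (objective(k, b), k, b) for k in k_values for b in b_values
)

# True if either coordinate of the best point lies on an edge of the box.
on_bound = (
    best_k in (k_values[0], k_values[-1])
    or best_b in (b_values[0], b_values[-1])
)
print("best k =", best_k, "best b =", best_b, "on a bound:", on_bound)
```

If the best grid point sits on an edge of the box, the bounds rather than the objective are deciding the answer, and it is worth widening them or re-checking the objective.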

Update

Here is a final solution. The objective function in the earlier code had a problem that needed to be fixed. Here is the corrected script:

from gekko import GEKKO
import numpy as np
import pandas as pd

My_DataFrame = pd.read_excel("<FILE_PATH_IN_YOUR_MACHINE>\\Losses.xlsx")
loss = My_DataFrame["Loss"]

def obj_function(x):
    k,b = x
    q = ((-1/k)-1)
    target = []

    for iloss in loss:
        if iloss>160:
            t = qw.log(1/b) + q* ( qw.log(b+k*(iloss-160)) - qw.log(b))
            target.append(t)
    return target

qw = GEKKO(remote=False)
nd = 2
x = qw.Array(qw.Var,nd)

# initial values --> Kasi = 0.7 and Betaa = 20.0
for i,xi in enumerate([0.7, 20]):
   x[i].value = xi

qw.Maximize(qw.sum(obj_function(x)))
qw.solve()
print('Kasi = ',x[0].value)
print('Betaa = ',x[1].value)

Output:

 The final value of the objective function is  108.20609317143486
 
 ---------------------------------------------------
 Solver         :  IPOPT (v3.12)
 Solution time  :  0.031200000000000006 sec
 Objective      :  108.20609317143486
 Successful solution
 ---------------------------------------------------
 

Kasi =  [0.436245842]
Betaa =  [32.531632983]

Results are close to the optimization result from Microsoft Excel.
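The change in the objective comes down to where the exponent applies. In the earlier scripts the power (-1/k)-1 is applied only to k*(loss-160)/b, while the final script's log form corresponds to applying it to the whole (1 + k*(loss-160)/b). The plain-Python sketch below compares the two for a single data point; the function names and the input values are illustrative assumptions, not fitted results.

```python
import math


def term_original(k, b, excess):
    # Earlier scripts: the exponent applies to k*excess/b only.
    return math.log((1 / b) * (1 + (k * excess / b) ** ((-1 / k) - 1)))


def term_corrected(k, b, excess):
    # Final script: log(1/b) + q*(log(b + k*excess) - log(b)).
    q = (-1 / k) - 1
    return math.log(1 / b) + q * (math.log(b + k * excess) - math.log(b))


def term_direct(k, b, excess):
    # Same quantity as term_corrected, written without splitting the log:
    # the exponent applies to the whole (1 + k*excess/b).
    return math.log((1 / b) * (1 + k * excess / b) ** ((-1 / k) - 1))


k, b, excess = 0.5, 20.0, 40.0  # illustrative values only

print("original :", term_original(k, b, excess))
print("corrected:", term_corrected(k, b, excess))
print("direct   :", term_direct(k, b, excess))
```

The corrected and direct forms agree, while the original differs. Splitting the logarithm as in the final script also keeps large powers out of the expression.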

Thanks, dear Professor. In the maximization part, I think you mean `qw.Maximize(qw.sum(obj_function(x)))`. You used `m.sum(...)` and I don't know what `m` is, but I guess you mean `qw` in my case. I used your initialization step but got this error: **"TypeError: x must be a python list of GEKKO parameters, variables, or expressions"**, and it refers to the `qw.Maximize(qw.sum(obj_function(x)))` part of the code! — Shayan, Aug 18 '21 at 15:16
You are correct - the gekko model is called `qw`. Could you post a complete script so that we can verify the fix? The objective function needs to return a Gekko expression, not just values. — John Hedengren, Aug 18 '21 at 15:29
Sure! I updated the question and added the integrated script to the end of it. I'd appreciate it if you could check it out. — Shayan, Aug 18 '21 at 15:35
Thanks, dear Professor, for your kind help! I tried to verify the results of this optimization, so I ran it in the **MICROSOFT EXCEL** *solver* and the result was significantly different! I updated the question and explained how I verified it. I hope you read my new explanation, and I would appreciate it very much if you could help me with it. Thanks. — Shayan, Aug 22 '21 at 08:24
Now it's done, dear Professor. I edited your answer. Please check it out. Again, thank you very much. — Shayan, Aug 22 '21 at 21:36

pu239 · Answer 3 · 2021-08-06T08:34:10.467

1

If I see correctly, My_DataFrame is defined in the global scope.
The problem is that obj_function tries to access it (successful) and then modify its value (fails). This is because you can't modify global variables from a local scope by default.

Fix:

At the beginning of the obj_function, add a line:

def obj_function(Array):
    # comments
    global My_DataFrame
    for item .... # remains same

This should fix your problem.

Additional Note:

If you just wanted to access My_DataFrame, it would work without any errors and you don't need to add the global keyword
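A side note on Python scoping: `global` is only required when a function rebinds a global name. Modifying the object in place, as `My_DataFrame.iloc[...] = ...` does, works without it, which fits the comment below that adding `global` did not change the behavior. A minimal sketch with hypothetical names (`records`, `counter`) standing in for the DataFrame:

```python
records = {"Target": [0.0, 0.0, 0.0]}  # module-level (global) object
counter = 0                            # module-level (global) name


def mutate_in_place():
    # No `global` needed: the name `records` is only read, and the object
    # it refers to is modified in place (like My_DataFrame.iloc[...] = ...).
    records["Target"][0] = 1.5


def rebind_without_global():
    # Assignment creates a new local name; the global `counter` is untouched.
    counter = 10
    return counter


def rebind_with_global():
    # `global` makes the assignment rebind the module-level name.
    global counter
    counter = 10


mutate_in_place()
rebind_without_global()
after_local_rebind = counter
rebind_with_global()
after_global_rebind = counter

print(records["Target"])    # [1.5, 0.0, 0.0]
print(after_local_rebind)   # 0
print(after_global_rebind)  # 10
```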

Also, just wanted to appreciate the effort you put into this. There's a proper explanation of what you want to do, relevant background information, an excellent diagram (Whiteboard is pretty great too), and even a minimal working example. This should be how all SO questions are, it would make everyone's lives easier

edited Aug 06 '21 at 08:34

answered Aug 06 '21 at 08:29

pu239


I followed your suggestion, but unfortunately the problem is not solved yet and it just returns the initial values :( – Shayan Aug 06 '21 at 08:33
Thanks! Thanks for trying to help me and your kind attitude ♥ – Shayan Aug 06 '21 at 08:43
@Shayan maybe [this](https://stackoverflow.com/questions/59103401/python-gekko-minlp-optimization-of-energy-system-how-to-build-intermediates-tha) can help? – pu239 Aug 06 '21 at 16:54
Yeah, I saw that a few hours ago and tried `x = qw.Array(qw.Var , nd , value = 1/nd)` with `nd=2`, and then I got this exception: `Exception: @error: Solution Not Found`. Anyway, I appreciate your attempt to help me. It's really encouraging for me. – Shayan Aug 06 '21 at 17:50
I think that is a separate error covered by many posts, like [this one](https://stackoverflow.com/questions/56942615/how-to-fix-solution-not-found-error-in-python-gekko-optimal-control-code). Google the error to find more results – pu239 Aug 06 '21 at 18:03
Thanks, I hadn't seen that! Surely I'll try to find a way to solve the issue! And thanks. – Shayan Aug 06 '21 at 18:08
