
I'm used to using Excel for this kind of problem but I'm trying my hand at Python for now.

Basically I have two arrays: one holds constants, and the other's values come from a user-defined function.

This is the function, simple enough.

import scipy.stats as sp

def calculate_probability(spread, std_dev):
    return sp.norm.sf(0.5, spread, std_dev)

I have two arrays of data, one with entries that run through the calculate_probability function (these are the spreads), and the other a set of constants called expected_probabilities.

spreads = [10.5, 9.5, 10, 8.5]

expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]

The below function is what I am seeking to optimise.

import numpy as np
def calculate_mse(std_dev):
    spread_inputs = np.array(spreads)
    model_probabilities = calculate_probability(spread_inputs,std_dev)
    subtracted_vector = np.subtract(model_probabilities,expected_probabilities)
    vector_powered = np.power(subtracted_vector,2)
    mse_sum = np.sum(vector_powered)
    return mse_sum/len(spreads)

I would like to find the value of std_dev for which calculate_mse returns a value as close to zero as possible. This is very easy in Excel using Solver, but I am not sure how to do it in Python. What is the best way?

EDIT: I've changed my calculate_mse function so that it takes only a standard deviation as the parameter to be optimised. I've tried to expose Andrew's answer through an API using Flask, but I've run into some issues:

class Minimize(Resource):

    std_dev_guess = 12.0  # might have a better guess than zeros
    result = minimize(calculate_mse, std_dev_guess)

    def get(self):
        return {'data': result},200

api.add_resource(Minimize,'/minimize')

This is the error:

NameError: name 'result' is not defined

I guess something is wrong with the input?

clattenburg cake
  • You need to change {'data': result},200 to {'data': self.result},200. Code-wise, I have a couple of suggestions; see the edit. – Andrew Holmgren Jul 10 '20 at 15:41
  • Thanks so much. I've managed to get a successful optimization with that! I'll look over your code so I can clear up the boilerplate, but thanks so much again. – clattenburg cake Jul 10 '20 at 16:33
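
The NameError above comes from Python's scoping rules: a name assigned in a class body becomes a class attribute and is not visible as a bare name inside the class's methods. A minimal sketch, using a hypothetical class Demo that is not part of the original code:

```python
class Demo:
    # Names assigned in the class body become class attributes,
    # not variables that methods can see as bare names.
    result = 42

    def broken(self):
        # Bare name lookup skips the class body, so this raises NameError
        # (assuming no global called `result` exists).
        return result

    def fixed(self):
        # Attribute lookup on the instance finds the class attribute.
        return self.result


demo = Demo()
print(demo.fixed())

try:
    demo.broken()
except NameError as exc:
    print("NameError:", exc)
```

This is why {'data': self.result} works where {'data': result} does not.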

1 Answer

I'd suggest using SciPy's optimization library, scipy.optimize. It gives you a couple of options; the easiest from your current setup is the minimize function. minimize itself supports many methods, from the Nelder-Mead simplex method to BFGS (the default for unconstrained problems) and COBYLA. https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.minimize.html

from scipy.optimize import minimize

# calculate_mse takes a single parameter (std_dev), so the initial guess is a single value
std_dev_guess = 12.0  # must be positive, since it is used as a standard deviation
result = minimize(calculate_mse, std_dev_guess)
print(result.x)  # fitted std_dev

Give it a shot and if you have extra questions I can edit the answer and elaborate as needed.
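
Since only one parameter is being fitted, scipy.optimize.minimize_scalar may be another option. Here is a minimal, self-contained sketch using the data from the question. The search bounds (1, 50) are my own assumption, not something from the question, so adjust them to whatever range is plausible for your data:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

spreads = np.array([10.5, 9.5, 10, 8.5])
expected_probabilities = np.array([0.8091, 0.7785, 0.7708, 0.7692])


def calculate_mse(std_dev):
    # Survival function at 0.5 for normals centred on each spread
    model_probabilities = norm.sf(0.5, loc=spreads, scale=std_dev)
    # Mean squared error against the expected probabilities
    return np.mean((model_probabilities - expected_probabilities) ** 2)


# Assumed bounds: keep std_dev strictly positive and within a plausible range
fit = minimize_scalar(calculate_mse, bounds=(1.0, 50.0), method="bounded")
print(fit.x, fit.fun)  # fitted std_dev and the MSE at that value
```

Bounding the search keeps the optimizer away from non-positive standard deviations, for which the normal distribution is undefined.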

Here are a couple of suggestions to clean up your code.

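import numpy as np
import scipy.stats as sp
from flask_restful import Resource  # assumption: the question's Resource comes from Flask-RESTful
from scipy.optimize import minimize

class Minimize(Resource):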

    def _calculate_probability(self, spread, std_dev):
        return sp.norm.sf(0.5, spread, scale=std_dev)
  
    def _calculate_mse(self, std_dev):
        spread_inputs = np.array(self.spreads)
        model_probabilities = self._calculate_probability(spread_inputs, std_dev)
        mse = np.sum((model_probabilities - self.expected_probabilities)**2) / len(spread_inputs)
        print(mse)
        return mse

    def __init__(self, expected_probabilities, spreads, std_dev_guess):
        self.std_dev_guess = std_dev_guess
        self.spreads = spreads
        self.expected_probabilities = expected_probabilities
        self.result = None

    def solve(self):
        self.result = minimize(self._calculate_mse, self.std_dev_guess, method='BFGS')

    def get(self):
        return {'data': self.result}, 200

# run something like
spreads = [10.5, 9.5, 10, 8.5]
expected_probabilities = [0.8091, 0.7785, 0.7708, 0.7692]
minimizer = Minimize(expected_probabilities, spreads, 10.)
print(minimizer.get())  # 'data' is None since solve() hasn't been run yet; up to you how to handle this
minimizer.solve()
print(minimizer.get())
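
One thing to watch if get() is served over HTTP: the object returned by minimize holds NumPy arrays, which the standard json module cannot serialize, so returning it directly will likely fail. Below is a rough sketch of converting it to plain Python types first. The helper result_to_dict and the toy objective are hypothetical names of mine, used only for illustration:

```python
import json

import numpy as np
from scipy.optimize import minimize


def result_to_dict(result):
    # Hypothetical helper: keep only a few fields, converted to built-in types
    return {
        "std_dev": float(np.ravel(result.x)[0]),  # single fitted parameter
        "mse": float(result.fun),                 # objective value at the optimum
        "success": bool(result.success),
        "message": str(result.message),
    }


# Toy objective standing in for _calculate_mse; its minimum is at x = 3
toy_result = minimize(lambda x: (x[0] - 3.0) ** 2, x0=1.0, method="BFGS")
payload = result_to_dict(toy_result)
print(json.dumps(payload))
```

In the class above, get() could then return {'data': result_to_dict(self.result)} once solve() has been called.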

Andrew Holmgren
  • Hey @AndrewHolmgren, thanks for pointing me toward `scipy.optimize` - I've edited my mse function and the input to just be the value I'm trying to optimise but it still doesn't work. – clattenburg cake Jul 10 '20 at 12:46
  • One question @AndrewHolmgren: I've tried to access the result via an API, so I've added `api.add_resource(Minimize, '/minimize')` at the end of your code. I get a `TypeError: __init__() missing 3 required positional arguments: 'expected_probabilities', 'spreads', and 'std_dev_guess'`. How do I expose the results to an API endpoint? – clattenburg cake Jul 14 '20 at 05:23
  • @clattenburgcake You should be able to pass in parameters like this https://stackoverflow.com/a/33740849/8056248 or this https://stackoverflow.com/a/39418645/8056248 – Andrew Holmgren Jul 14 '20 at 14:42