5

Assume I have N lists (vectors) and I want to choose x of them 1<x<[N] (x is not predetermined) so I will get the maximum value of func(lists).

For example:

l1 = [3,4,7,-2]
l2 = [0.5,3,6,2.7]
l3 = [0,5,8,3.6]
mat = [l1, l2, l3]

result = maximize(func, mat)

def func(mat):
    # doing some math between lists. For Example:
    sum_list = list(mat[0])
    for li in mat[1:]:
        sum_list = map(operator.add, sum_list, li)
    accum_min_lst = []
    for i, val in enumerate(sum_list):
        x = sum_list[:i + 1]
        accum_min_lst.append(val - max(x))
    return min(accum_min_lst)

Possible results:

[l1], [l2], [l3], [l1,l2], [l1,l3], [l2,l3], [l1,l2,l3]  

If I'll write a naive solution and just run all the combinations it will take forever 2^N.

I'm trying to find a solution using cvxpy or maybe scipy.optimize.minimize but I find it hard to understand the kind of function I need to use for my problem, thought maybe I should try evolutionary algorithm to find an approximate answer, Or maybe I should use portfolio optimization instead.

ali_m
  • 71,714
  • 23
  • 223
  • 298
Ella Cohen
  • 1,375
  • 1
  • 10
  • 14
  • 2
    If nothing is known about `func()`, it is a combinatorial problem, as you already noted. An example for `func()` would helpful to see if more can be done ... – Dietrich Feb 04 '16 at 22:09
  • 1
    Whatever the nature of `func()`, this is a discrete optimization problem. `scipy.optimize.minimize` is for continuous problems, so it won't be much use here. `cvxpy` might be useful provided that your cost function is actually convex. If it isn't convex then you will need to use some sort of global optimization strategy such as simulated annealing. – ali_m Feb 06 '16 at 03:24
  • Hi @ali_m thank you for answering, I'm not sure if my func is convex or concave, I've seen the [advanced function section](http://www.cvxpy.org/en/latest/tutorial/functions/index.html?highlight=sum_entries) in cvxpy but I think my function is more complicated or maybe a combination of a few functions there. – Ella Cohen Feb 07 '16 at 17:31

2 Answers2

3

I chose to use my own version of Evolutionary algorithm Its just more intuitive for me, plus you can play with the population size, generations and the mutation probability:

from random import choice, random

def stack_overflow_example(self):
    def fitness(trial):
        trial_max = self.func(trial, mat)
        if trial_max > self.best_res:
            self.best_res = trial_max
            return trial_max
        else:
            return -sys.maxint

    def mutate(parent):
        mutation = []
        for p in parent:
            if random() < prob:
                mutation.append(choice([0, 1]))
            else:
                mutation.append(p)
        return mutation

    l1 = [3, 4, 7, -2]
    l2 = [0.5, 3, 6, 2.7]
    l3 = [0, 5, 8, 3.6]
    mat = [l1, l2, l3]

    max_num_of_loops = 1000
    prob = 0.075  # mutation probability
    gen_size = 10  # number of children in each generation
    self.bin_parent = [1] * len(mat)  # first parent all ones
    self.best_res = self.func(self.bin_parent, mat)  # starting point for comparison
    for _ in xrange(max_num_of_loops):
        backup_parent = self.bin_parent
        copies = (mutate(self.bin_parent) for _ in xrange(gen_size))
        self.bin_parent = max(copies, key=fitness)
        res = self.func(self.bin_parent, mat)
        if res >= self.best_res:
            self.best_res = res
            print (">> " + str(res))
        else:
            self.bin_parent = backup_parent
    print("Final result: " + str(self.best_res))
    print("Chosen lists:")
    chosen_lists = self.choose_strategies(self.bin_parent, mat)
    for i, li in enumerate(chosen_lists):
        print(">> list[{}] : values: {}".format(i, li))

def func(self, bin_list, mat):
    chosen_mat = self.bin_list_to_mat(bin_list, mat)
    if len(chosen_mat) == 0:
        return -sys.maxint
    # doing some math between lists:
    sum_list = list(chosen_mat[0])
    for li in chosen_mat[1:]:
        sum_list = map(operator.add, sum_list, li)
    accum_min_lst = []
    for i, val in enumerate(sum_list):
        x = sum_list[:i + 1]
        accum_min_lst.append(val - max(x))
    return min(accum_min_lst)

@staticmethod
def bin_list_to_mat(bin_list, mat):
    chosen_lists = []
    for i, stg in enumerate(mat):
        if bin_list[i] == 1:
            chosen_lists.append(stg)
    return chosen_lists

Hope it'll help someone :) cause it took me a while to find this solution.

Ella Cohen
  • 1,375
  • 1
  • 10
  • 14
  • 1
    Note that this is not guaranteed to find the optimal solution. – chthonicdaemon Feb 11 '16 at 07:54
  • I know, but you can add more generations and/or increase the population size, you can add all kind of algorithms that you can stop iterate if the answer didn't changed for x iterations or for more then x seconds – Ella Cohen Feb 13 '16 at 08:12
2

This can be formulated as a MILP and solved using any MILP solver, but I show the solution here using PuLP.

First, let's see what the answer is for the sample problem by doing all the combinations:

import itertools

allfuncs = sum([[func(combs) for combs in itertools.combinations(mat, r)] for r in range(1, 4)], [])

max(allfuncs)

The answer is -3.3

This solution gives the same answer and should scale to larger problems:

import pulp

prob = pulp.LpProblem("MaxFunc", pulp.LpMaximize)

allcols = range(0, len(l1))
allrows = range(0, len(mat))

# These will be our selected rows
rowselected = pulp.LpVariable.dicts('rowselected', allrows, cat=pulp.LpBinary)

# Calulate column sums (equivalent to sum_list in the example)
colsums = pulp.LpVariable.dicts('colsums', allcols, cat=pulp.LpContinuous)
for c in allcols:
    prob += colsums[c] == sum(mat[r][c]*rowselected[r] for r in allrows)

# This is our objective - maximimise this
maxvalue = pulp.LpVariable('maxvalue')
prob += maxvalue

# The tricky part - maximise subject to being less than each of these differences
# I'm relatively confident that all these constraints are equivalent
# to calculating the maximum and subtracting that
for c1 in allcols:
    for c2 in allcols[:c1]:
        prob += maxvalue <= colsums[c1] - colsums[c2]

# choose at least one row
prob += pulp.lpSum(rowselected) >= 1

prob.solve()

print(prob.objective.value())

for c in allrows:
    print(rowselected[c].value())
chthonicdaemon
  • 19,180
  • 2
  • 52
  • 66
  • Hi @chthonicdaemon thanks for answering, your answer is actually correct and working, do you think it'll give a solution in normal time (2~3 minutes) on ~100 lists? – Ella Cohen Feb 09 '16 at 16:54
  • And can you maybe explain what does prob+= do? Is there maybe a more comfortable way just to add a full function to prob so it'll be easier to change the functionality inside? – Ella Cohen Feb 09 '16 at 17:14
  • MILP still scales exponentially in the worst case, but a good solver is pretty much the fastest way to solve this kind of problem. There are other solvers that can be selected, even commercial ones like cplex, Wix is currently the fastest. – chthonicdaemon Feb 09 '16 at 19:17
  • Pulp allows one to construct the problem by adding equations to it, that's what the += is doing. I recommend gong through the pulp docs for more info. It seems you're not familiar with linear programming, so I would suggest reading up in that too. Because of the limits of linearity, you can't handle an arbitrary function in this manner, only ones which can be expressed with linear functions. – chthonicdaemon Feb 09 '16 at 19:20
  • Thanks for the explanation, my problem is that my function is more complicated than the one on this example and the solution you gave is really specific. I'll probably read more about linear programming, in the meanwhile I chose to use my own version of [Evolutionary algorithm](http://rosettacode.org/wiki/Evolutionary_algorithm#Python) Its just more intuitive for me. – Ella Cohen Feb 10 '16 at 08:43
  • Please note an important difference between approaches like evolutionary algorithms, GA and so on and MILP: With MILP you are guaranteed to find the optimal solution. With the other methods, you are not. If you are OK with finding just a "good enough" solution, these other methods are fine. Also note in general they are much slower than a proper branch and bound search. – chthonicdaemon Feb 11 '16 at 07:51