I can't even phrase the question, I need 3 closely equal number from a huge set of numbers

Question

I have an optimisation problem with:

5 variables a,b,c,d,e;
4 constraints;
5 objectives;
a list of "increment" instructions to choose from.

The constraints:

a >= x
b >= y
e > c
e > d

x and y are integer parameters.

The objectives:

maximize (c + d) * 2 + e
minimize a
minimize b
minimize e - c
minimize e - d

The instructions:

I have about 80-90 lines; the first line is the initialization, then each line consists on up to 4 sets of "increment" instructions. Solving the problem consists in choosing one set of instructions per line. Here are the first lines as an example:

{a = 0; b = 0; c = 0; d = 0; e = 0}

{b += 360} OR {b += 160; c += 160} OR {b += 160; d += 160} OR {b += 160; e += 160}
{a += 360} OR {a += 160; c += 160} OR {a += 160; d += 160} OR {a += 160; e += 160}
{c += 1697; d += 1697} OR {c += 1697; d += 1019; e += 678} OR {c += 1019; d += 1697; e += 678}

An example:

Say x = 1200, y = 170, and we have the following six lines of instructions:

{b += 360} OR {b += 160; c += 160} OR {b += 160; d += 160} OR {b += 160; e += 160}
{a += 360} OR {a += 160; c += 160} OR {a += 160; d += 160} OR {a += 160; e += 160}
{c += 1697; e += 1697} OR {c += 1697; e += 1019; d += 678} OR {c += 1019; e += 1697; d += 678}
{b += 360} OR {b += 160; c += 160} OR {b += 160; d += 160} OR {b += 160; e += 160}
{a += 360} OR {a += 160; c += 160} OR {a += 160; d += 160} OR {a += 160; e += 160}
{a += 1149; d += 939} OR {a += 1149; d += 939; e += 678} OR {a += 939; d += 678; e += 1149}

One possible solution in this example is to pick the first set of instructions from each line:

{b += 360},
{a += 360},
{c += 1697; e += 1697},
{b += 360},
{a += 360},
{a += 1149; d += 939}

Then we get these values:

a = 1869, b = 720, c = 1697, d = 939, e = 1697

With objectives:

(c + d) * 2 + e = 6969 (to be maximized)
a               = 1869 (to be minimized but >= 1200)
b               = 720  (to be minimised but >= 170)
e - c           = 0    (to be minimized but >= 0)
e - d           = 758  (to be minimized but >= 0)

But a better solution would be to pick these 6 sets of instructions:

{b += 160; d += 160},
{a += 160; d += 160},
{c += 1697; e += 1019; d += 678},
{b += 160; d += 160},
{a += 160; d += 160},
{a += 939; d += 678; e += 1149}

a = 1259, b = 320, c = 1697, d = 1996, e = 2168

(c + d) * 2 + e = 9554 (to be maximized)
a               = 1259 (to be minimized but >= 1200)
b               = 320  (to be minimised but >= 170)
e - c           = 471  (to be minimized but >= 0)
e - d           = 172  (to be minimized but >= 0)

I already tought about bruteforcing it, but with 80-90 lines of instructions it has about 876488338465357824 possible combinations, so that's not a valid way to do this.

I don't need this to be exactly perfect, a good approximation might suffice.

Any recommendation of tools to solve this problem is helpful, and any keyword to help me search for an appropriate algorithm and for similar problems is welcome.

Such problems can be tackled with [MiniZinc](https://www.minizinc.org/), a free tool for constraint programming. — Axel Kemper, Nov 16 '21 at 15:19
Are you try to *generate* {a, b, c, d, e} or are you trying to *select* the best fit from a pool/table of existing values? — RBarryYoung, Nov 16 '21 at 15:20
I think there is an ambiguity in the question. Are you trying to find one quintuplet (a,b,c,d,e) for every line, or are you trying to find one quintuplet (a,b,c,d,e) that fits all lines simultaneously? — Stef, Nov 16 '21 at 15:26
@tripleee No, this is a programming isssue. This question asks for an algorithm for a non-linear, nnon-continuous filter. Math would do little to help here. — Michaël Roy, Nov 16 '21 at 15:28
@RBarryYoung I want to select the best fit from a pool/table of existing values. — Ceritoxi, Nov 16 '21 at 15:29
@Ceritoxi Is is nt clear from the question whether these objects are completely defined, or may contain undefined values. — Michaël Roy, Nov 16 '21 at 15:30
Selecting from a pool definitely makes this a programming algorithm question, so I will not be voting to close. — RBarryYoung, Nov 16 '21 at 15:31
@Michaël Roy I don't understand what you mean. There may be objects where one of (or even more) the values are not defined (as shown in the example) — Ceritoxi, Nov 16 '21 at 15:35
@Ceritoxi Then a neural net may be the most sensible way to go. — Michaël Roy, Nov 16 '21 at 15:47
@Stef I want to find one quintuplet (a,b,c,d,e) that fits all lines simultaneously — Ceritoxi, Nov 16 '21 at 15:47
@Ceritoxi You are utlimately looking for correlations in a very noisy data set, aren't you? — Michaël Roy, Nov 16 '21 at 16:05
I have voted to reopen, however, be aware that the constraints, conditions, context and the prioritization of these is still very ambiguous and unclear, and I do have doubts that the question is answerable in its current form. — RBarryYoung, Nov 16 '21 at 22:28
I think most ambiguities, and difficulty we have understanding your question, come from the fact that your problem statement takes a lot of liberty with regard to the usual meaning of `=` and of `OR`. For instance, from `a = 360, a = 360 and a = 1149`, you deduce `a = 1869`. That's not how equality usually work, and it gets everyone confused. — Stef, Nov 17 '21 at 09:58
I strongly suggest choosing more appropriate symbols and vocabulary, so as to avoid misleading people who read your question. For instance, you could replace all those `=` with `+=`; and call `{b += 160; c += 160}` an *instruction* rather than an *object*. — Stef, Nov 17 '21 at 10:00
I made a big edit to your question to make it more clear. If you don't like the way I phrased your problem or if I misunderstood it, feel free to rollback the edit. — Stef, Nov 17 '21 at 10:34

Stef · Accepted Answer · 2021-11-17T14:12:47.177

A naive simulated annealing algorithm

Initialise N random candidate solutions by choosing random instructions from the list;
Loop:
For each solution in the pool, generate a few new candidates by randomly modifying a few instructions;
Eliminate candidates which do not satisfy the constraints;
Crop down the pool to N, randomly, using the objective function as weights so that good solutions are more likely to survive;
After a large number of iterations, halt and return the candidate with highest objective.

Note that your problem is a multi-objective problem. The algorithm above assume a single objective. There are many different ways to turn a multi-objective problem into a more-or-less-similar single-objective problem, and the choice of how to do that will result in different solutions.

For simplicity, I wrote a single-objective function as a weighted sum of the 5 objectives: the objective is now to maximize 10 * ((c+d)*2+e) - a - b - (e-c) - (e-d).

Another simple possibility would have been to turn some of the objectives into constraints, for instance:

objective minimize c - e into constraint e - c < 100;
objective minimize c - e into constraint e < 2 * c;
objective minimize a into constraint a < 2 * x.

You can try those changes by modifying coefficients params['objective'] and function satisfies_constraints in the code below.

Python code

from more_itertools import random_product
import random
from itertools import chain

raw_data = '''{b += 360} OR {b += 160; c += 160} OR {b += 160; d += 160} OR {b += 160; e += 160}
{a += 360} OR {a += 160; c += 160} OR {a += 160; d += 160} OR {a += 160; e += 160}
{c += 1697; e += 1697} OR {c += 1697; e += 1019; d += 678} OR {c += 1019; e += 1697; d += 678}
{b += 360} OR {b += 160; c += 160} OR {b += 160; d += 160} OR {b += 160; e += 160}
{a += 360} OR {a += 160; c += 160} OR {a += 160; d += 160} OR {a += 160; e += 160}
{a += 1149; d += 939} OR {a += 1149; d += 939; e += 678} OR {a += 939; d += 678; e += 1149}'''

# input: string "{a += 1149; d += 939}"
# output: list [1149, 0, 0, 939, 0]
def parse_instructionset(s):
    instructions_list = [instruction.split('+=') for instruction in s.strip()[1:-1].split(';')]
    instructions_dict = { k.strip(): int(v) for k,v in instructions_list }
    return [instructions_dict.get(k, 0) for k in 'abcde']

# output: list of lists of lists
# representing lines of disjonctions of instruction sets
def parse_data(raw_data):
    rows = [line.split('OR') for line in raw_data.split('\n')]
    return [[parse_instructionset(s) for s in row] for row in rows]

# for r in parse_data(raw_data):
#     print(r)
# [[0, 360, 0, 0, 0], [0, 160, 160, 0, 0], [0, 160, 0, 160, 0], [0, 160, 0, 0, 160]]
# [[360, 0, 0, 0, 0], [160, 0, 160, 0, 0], [160, 0, 0, 160, 0], [160, 0, 0, 0, 160]]
# [[0, 0, 1697, 0, 1697], [0, 0, 1697, 678, 1019], [0, 0, 1019, 678, 1697]]
# [[0, 360, 0, 0, 0], [0, 160, 160, 0, 0], [0, 160, 0, 160, 0], [0, 160, 0, 0, 160]]
# [[360, 0, 0, 0, 0], [160, 0, 160, 0, 0], [160, 0, 0, 160, 0], [160, 0, 0, 0, 160]]
# [[1149, 0, 0, 939, 0], [1149, 0, 0, 939, 678], [939, 0, 0, 678, 1149]]

# used a weighted sum to turn the multiobjective into one objective
params = {
    'objective': [-1, -1, 20+1, 20+1, 10-2], # 10 * ((c+d)*2+e) - a - b - (e - c) - (e - d)}
    'x': 1200, # lower bound for 'a'
    'y': 170, # lower bound for 'b'
    'poolsize': 50, # number of candidate solutions to keep at each iteration
    'nbupgrades': 5, # number of new solutions to generate from each candidate
    'distance': 2, # number of instruction sets to randomly modify to get a new solution
    'nbiter': 100 # number of iterations
}

# sum increments to get a,b,c,d,e from the chosen instruction sets
def get_abcde(solution):
    return [sum(increment[k] for increment in solution) for k in range(5)]

# return boolean to check that candidate is valid
def satisfies_constraints(abcde, x=params['x'], y=params['y']):
    a,b,c,d,e = abcde
    return a >= x and b >= y and e > c and e > d

# compute value of objective function for candidate
def get_objective(abcde, objective_coeffs=params['objective']):
    return sum(c*v for c,v in zip(objective_coeffs, abcde))

# populate pool with <pool_size> random candidates
def initialise_pool(data, pool_size=params['poolsize']):
    solutions = [random_product(*data) for _ in range(pool_size)]
    abcdes = [get_abcde(sol) for sol in solutions]
    return [(get_objective(abcde), abcde, sol) for abcde,sol in zip(abcdes, solutions)]

# build pool of new candidates from current pool of candidates
def upgrade_pool(pool, data, nb_upgrades=params['nbupgrades'], distance=params['distance']):
    # copy current candidates
    new_pool = list(pool)
    # add new candidates
    for _,abcde,solution in pool:
        for _ in range(nb_upgrades):
            for row_index in [random.randrange(len(data)) for _ in range(distance)]:
                new_instruction = random.choice(data[row_index])
                new_abcde = [[abcde[k] + new_instruction[k] - solution[row_index][k]] for k in range(5)]
                new_solution = list(chain(solution[:row_index], [new_instruction], solution[row_index+1:]))
            abcde = get_abcde(new_solution)
            if satisfies_constraints(abcde):
                new_pool.append((get_objective(abcde), abcde, new_solution))
    # crop down to <pool_size>
    new_pool = crop(new_pool, len(pool))
    return new_pool

# remove excess candidates
# candidates to keep are chosen randomly
# using value of objective as weight
# randomness is very important here, DO NOT simply keep the n candidates with highest objective
def crop(pool, n):
    return random.choices(pool, weights=[obj for obj,_,_ in pool], k=n)

def main_loop(data, nb_iter=params['nbiter'], pool=None):
    if not pool:
        pool = initialise_pool(data)
    for _ in range(nb_iter):
        pool = upgrade_pool(pool, data)
    return pool

if __name__ == '__main__':
    data = parse_data(raw_data)
    pool = main_loop(data)
    pool.sort(key=lambda triplet:triplet[0], reverse=True)

    print('Best 2 and worst 2:')
    for objective, abcde, _ in pool[:2] + pool[-2:]:
        print(objective, abcde)
    print()
    print('Best:')
    obj, abcde, sol = pool[0]
    print('objective={}'.format(obj))
    print('(c+d)*2+e=', (abcde[2]+abcde[3])*2+abcde[4])
    print('a,b,c,d,e={}'.format(abcde))
    print('increments=[')
    for increment in sol:
        print('  ', increment, ',')
    print(']')

Output

objective=93318
(c+d)*2+e= 9554
a,b,c,d,e=[1259, 320, 2017, 1676, 2168]
increments=[
   [0, 160, 0, 160, 0] ,
   [160, 0, 0, 160, 0] ,
   [0, 0, 1697, 678, 1019] ,
   [0, 160, 160, 0, 0] ,
   [160, 0, 160, 0, 0] ,
   [939, 0, 0, 678, 1149] ,
]

Sneftel · Answer 2 · 2021-11-17T11:38:20.350

What you have there is an example of the "multi-dimensional multiple choice knapsack problem"*. (Technically what you've described is also "multi-objective", but I suspect that you don't actually want a multi-objective answer, which would be in the form of a Pareto front rather than a single solution; you just haven't decided how to combine your objectives together yet.) This problem is, of course, NP-hard, and it would likely be impractical to use a pseudo-polynomial approach like dynamic programming, given the size and dimensionality of your input values.

So you'll have to make do with an approximate algorithm. A randomized approach like simulated annealing would probably work pretty well, though tabu search might be more effective for certain inputs.

*Technically it's not quite a KP because two of the constraints involve multiple variables, but that won't make a significant difference in what approaches are available to you.)

I can't even phrase the question, I need 3 closely equal number from a huge set of numbers

2 Answers2

A naive simulated annealing algorithm

Python code

Output