
I want to approximately solve the knapsack problem for big data sets using Python.

Right now, I am using this implementation, which works well for small examples like:

import numpy as np
import knapsack

weight = np.random.randint(10, size=10)
value = np.random.randint(10, size=10)
capacity = 5
knapsack.knapsack(weight, value).solve(capacity)

but when we scale it up to:

import numpy as np
import knapsack

weight = np.random.randint(10, size=1000)
value = np.random.randint(10, size=1000)
capacity = 500
knapsack.knapsack(weight, value).solve(capacity)

the program just gets stuck and eventually raises an error. Is there an implementation of the knapsack problem where I can say something like "compute for 10 seconds and return the best solution found so far"? Is this possible?
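To illustrate what I mean, a crude hand-rolled version of this idea might look like the following sketch (`anytime_knapsack` is a made-up name, not a library function): randomized greedy restarts that keep the best feasible solution found until a time budget expires.

```python
# Sketch of a hand-rolled "anytime" heuristic: randomized greedy
# restarts, keeping the best feasible solution until time runs out.
import time
import numpy as np

def anytime_knapsack(weight, value, capacity, budget=1.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng()
    n = len(weight)
    best_val, best_x = -1, None
    deadline = time.monotonic() + budget
    while time.monotonic() < deadline:
        x = np.zeros(n, dtype=int)
        total = 0
        for i in rng.permutation(n):        # random item order per restart
            if total + weight[i] <= capacity:
                x[i] = 1
                total += weight[i]
        val = int(x.dot(value))
        if val > best_val:
            best_val, best_x = val, x
    return best_val, best_x

weight = np.random.randint(10, size=1000)
value = np.random.randint(10, size=1000)
val, x = anytime_knapsack(weight, value, 500, budget=0.5)
print(val)
```

This gives no quality guarantee at all; it only demonstrates the "best solution found so far within a deadline" behaviour I'm after.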

HolyMonk
  • To my knowledge, there are no proper solvers for combinatorial optimisation in Python. Also, complex heuristics are usually employed to approximate a solution - I have looked at the source code of that package and it appears to be 6 lines of code that do anything - most likely they're trying to brute-force an NP-hard problem... that will choke fast. – roganjosh Jan 31 '18 at 23:02
  • Formulate it as a mixed-integer program. This will be hard to beat (especially for good approximations), even when using open-source solvers (and there are things like maxTime and mipGap supported). If you expect answers with benchmarks, add some *real* example (which you are trying to solve and struggle with) to your question (as your examples are solvable in polynomial time)! – sascha Jan 31 '18 at 23:17
  • Thanks for the responses, don't really get why I get downvoted.. Posing it as mixed integer and using those solution methods is a good idea thank you! – HolyMonk Jan 31 '18 at 23:28
  • Of course I can solve this problem by just looking at it, but it is just an example.. – HolyMonk Jan 31 '18 at 23:28
  • 1
    The point is that you should show us realistic data because that affects what's an appropriate solution. – Stefan Pochmann Jan 31 '18 at 23:29
  • 1
    It's one of the worst examples possible (which let's NP-hardness vanish)! Add a good one! Additionally you are forcing us to read out the exact definition of the problem in your external library used as you did not explain it. 0-1 knapsack vs. bounded vs. unbounded vs .... – sascha Jan 31 '18 at 23:29
  • @StefanPochmann : I adapted the example, but I am already quite pleased with the solution proposed by sascha. – HolyMonk Jan 31 '18 at 23:36

1 Answer


Here is a small prototype integer-programming approach for the 0-1 knapsack!

This code:

  • is not doing everything perfectly!
    • e.g. constraints vs. bounds (the latter are more efficient, but I was too lazy to check cylp again for that; I had problems with it in the past)
  • not much support for Windows!
    • Windows users: go for pulp, which ships the same solver (imho the best free open-source MIP solver); modelling looks quite different there, though!
  • no tuning!
    • observe: CoinOR's Cgl, which is used by the Cbc solver, supports extra knapsack cuts!
    • as the logs show: this example is too simple for them to have any effect!
  • bounded / unbounded knapsack versions are easily handled by just modifying the bounds
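For Windows users (or anyone preferring pulp): a sketch of the same model in pulp, which also exposes a time limit and gap on its bundled CBC solver. Note: the parameter names below (`timeLimit`, `msg`) are from recent pulp releases and may differ in older versions (e.g. `maxSeconds`, `fracGap`), so verify against your installed version!

```python
# Sketch only: the same 0-1 knapsack modelled in pulp / solved with CBC.
import numpy as np
import pulp

np.random.seed(1)
weight = np.random.randint(10, size=1000)
value = np.random.randint(10, size=1000)
capacity = 500

prob = pulp.LpProblem("knapsack", pulp.LpMaximize)
x = [pulp.LpVariable("x%d" % i, cat="Binary") for i in range(len(weight))]
prob += pulp.lpSum(int(v) * xi for v, xi in zip(value, x))               # objective
prob += pulp.lpSum(int(w) * xi for w, xi in zip(weight, x)) <= capacity  # capacity constraint

# stop after 10 seconds and return the incumbent (what the OP asked for)
status = prob.solve(pulp.PULP_CBC_CMD(timeLimit=10, msg=False))
print(pulp.LpStatus[status], pulp.value(prob.objective))
```

On this easy instance the time limit is never hit; CBC proves optimality almost instantly.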

The example here just solves the single problem defined by the OP, using a PRNG seed of 1; it takes 0.02 seconds. But that's not a scientific test! NP-hard problems are all about easy vs. hard instances (huge variance!), which is why data to check against is important! One can observe that there is no real integrality gap for this example.

Code

import numpy as np
import scipy.sparse as sp
from cylp.cy import CyClpSimplex
np.random.seed(1)

""" INSTANCE """
weight = np.random.randint(10, size=1000)
value = np.random.randint(10, size=1000)
capacity = 500

""" SOLVE """
n = weight.shape[0]
model = CyClpSimplex()
x = model.addVariable('x', n, isInt=True)
model.objective = -value                    # minimize the negated value = maximize value
model += sp.eye(n) * x >= np.zeros(n)       # could be improved (bounds instead of constraints)
model += sp.eye(n) * x <= np.ones(n)        # """
model += np.matrix(weight) * x <= capacity  # cylp somewhat outdated in terms of np-usage!
cbcModel = model.getCbcModel()              # Clp -> Cbc model / LP -> MIP
cbcModel.logLevel = True
status = cbcModel.solve()
x_sol = np.array(cbcModel.primalVariableSolution['x'].round()).astype(int)  # assumes a solution exists

print(x_sol)
print(x_sol.dot(weight))
print(x_sol.dot(value))

Output

Welcome to the CBC MILP Solver 
Version: 2.9.9 
Build Date: Jan 15 2018 

command line - ICbcModel -solve -quit (default strategy 1)
Continuous objective value is -1965.33 - 0.00 seconds
Cgl0004I processed model has 1 rows, 542 columns (542 integer (366 of which binary)) and 542 elements
Cutoff increment increased from 1e-05 to 0.9999
Cbc0038I Initial state - 1 integers unsatisfied sum - 0.333333
Cbc0038I Pass   1: suminf.    0.25000 (1) obj. -1965 iterations 1
Cbc0038I Solution found of -1965
Cbc0038I Branch and bound needed to clear up 1 general integers
Cbc0038I Full problem 1 rows 542 columns, reduced to 1 rows 128 columns
Cbc0038I Cleaned solution of -1965
Cbc0038I Before mini branch and bound, 540 integers at bound fixed and 0 continuous
Cbc0038I Mini branch and bound did not improve solution (0.02 seconds)
Cbc0038I After 0.02 seconds - Feasibility pump exiting with objective of -1965 - took 0.01 seconds
Cbc0012I Integer solution of -1965 found by feasibility pump after 0 iterations and 0 nodes (0.02 seconds)
Cbc0038I Full problem 1 rows 542 columns, reduced to 1 rows 2 columns
Cbc0001I Search completed - best objective -1965, took 0 iterations and 0 nodes (0.02 seconds)
Cbc0035I Maximum depth 0, 362 variables fixed on reduced cost
Cuts at root node changed objective from -1965.33 to -1965.33
Probing was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Gomory was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Knapsack was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
Clique was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
MixedIntegerRounding2 was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
FlowCover was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)
TwoMirCuts was tried 0 times and created 0 cuts of which 0 were active after adding rounds of cuts (0.000 seconds)

Result - Optimal solution found

Objective value:                -1965.00000000
Enumerated nodes:               0
Total iterations:               0
Time (CPU seconds):             0.02
Time (Wallclock seconds):       0.02

Total time (CPU seconds):       0.02   (Wallclock seconds):       0.02

[0 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 1 1 1 0 0 1 0 0
 0 1 0 0 0 0 0 1 0 1 1 1 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 1 1 0
 0 0 0 1 0 0 0 0 1 1 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1
 1 0 0 1 1 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 1 1 1 0 0 1 0 0 0 0
 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 1 1 0 0 0 1 1 1 1
 0 1 0 0 0 0 0 0 0 1 0 1 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 1 0
 1 1 0 0 1 0 1 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 0
 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 1 1 1 0 1 0 1 1 0
 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 0 1 0 1 1 0 0 1 0 1 0 0
 0 0 0 0 1 1 0 0 0 0 0 1 1 0 1 0 1 1 0 0 0 0 0 1 0 0 0 1 1 0 1 0 0 0 0 1 1
 0 0 1 0 0 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 1
 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 0 0 1 0 0
 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 1
 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 0 0
 1 1 1 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 1 0 0 0 0 1 0 1 0 0 0 0 0 1 0 1 0 0 0
 0 1 1 0 0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 0 0 0 0 0 0 1 0 1 1 0 0 1 0 0 0 1 1
 1 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 1 1 0 0 0 1 1 0 0 1 0 0
 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 1 0 0 1 1
 0 0 1 1 0 0 0 1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 1 0 0 1 1 0 0 0 0 1 1
 0 0 0 1 1 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 0 1 1 1 0 0 0 1 1 1 0 0 1 0 0 0 1
 1 1 1 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 0 0 0
 0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 1 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 0 0 0
 0 0 0 0 0 0 1 0 1 0 0 0 0 0 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 0 0 1 0 0 0 1 1
 0 1 0 0 0 0 0 1 0 1 1 0 1 0 1 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0
 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0 1 0 0 1 0 1 0 1 1 0 1 0 0 0 1 0 0 1 0 1 1 0
 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 1 1 0 1 0 0 0 1 1 1 0 0
 0 0 1 1 1 0 0 1 0 0 1 1 0 0 1 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 1 1
 0]
500
1965
sascha
  • Is 1965 the value sum you found? That's also what I get with naive greedy, in 0.0012 seconds. – Stefan Pochmann Feb 01 '18 at 00:02
  • Uh... ok... sorry I don't know numpy/scipy/cylp enough to understand whether that's what that value means. – Stefan Pochmann Feb 01 '18 at 00:11
  • You are right about the value. But the consequences of such observations are hard to evaluate for NP-hard problems (when is my method good, when is yours good; there surely is some merit for greedy in some scenarios). I warned about that. One obvious thing here: my code proves 1965 is the optimal solution as the method is sound and complete. Yours is not and in terms of approximation, yours is probably limited to a factor 2 (factor depends on the exact greedy-approach and there are very good greedy-like approximations much better; 2 is the naive one). – sascha Feb 01 '18 at 00:13
  • Did I claim optimality or anything like that? I just wanted some "verification" whether my result might be correct, and mentioned how I got it. Was working on an optimal one but now I'm not in the mood anymore, thanks a lot. Btw I don't think it's a huge coincidence that my greedy one did find the optimal value, with data generated like this I think it has a good chance to. – Stefan Pochmann Feb 01 '18 at 00:35
  • Nobody will stop you. For NP-hard things I typically favour using tools optimized for decades before approaching current scientific ideas (as the former is often fast to implement and often very good). With such a classic problem I would never go in blind and ignore the above by doing something from scratch (standing on the shoulders of giants), except if I want to experiment. But to be honest: doing that with SO questions is often unmotivating, as data-statistics are often incompletely specified. [His generators](http://www.diku.dk/~pisinger/codes.html) might be more interesting (years of research). – sascha Feb 01 '18 at 00:43
  • Well you somewhat stopped me. Not very motivating to be called dumb for trying to verify something. I did finish my optimal solver anyway, or rather fix its bug, and it takes about 0.0065 seconds on repl.it to find the answer 1965. Not computing how to achieve that value, though. Not clear whether the OP wants more. Not going blind and from scratch, either, I did use numpy and the obvious dp algorithm that's also the first described on the wikipedia article. – Stefan Pochmann Feb 01 '18 at 01:04
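As a footnote to the greedy/DP discussion in the comments: the textbook 0/1 knapsack DP (pseudo-polynomial, exact for integer weights) can be vectorized with numpy, and on the instance from the answer (seed 1) it reproduces the proven optimum of 1965. A sketch:

```python
# Sketch: classic 0/1 knapsack DP over capacities, vectorized with numpy.
import numpy as np

np.random.seed(1)
weight = np.random.randint(10, size=1000)
value = np.random.randint(10, size=1000)
capacity = 500

best = np.zeros(capacity + 1, dtype=int)   # best[c] = max value within capacity c
for w, v in zip(weight, value):
    if w == 0:
        best += v                          # zero-weight items are always worth taking
    else:
        # 0/1 update: the RHS is evaluated on the old table, so no item is reused
        best[w:] = np.maximum(best[w:], best[:-w] + v)
print(best[capacity])                      # → 1965 on this instance
```

This only computes the optimal value, not the item set; recovering the chosen items would require keeping the full n × capacity table (or back-pointers).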