6

I have a problem which is similar to the Knapsack problem, more specifically the multidimensional variation.

I have a bunch of objects which all have a cost, a value, and a category. I need to the Knapsack optimisation for value under a maximum cost, but also have a specific number of objects in each category.

I have successfully implemented in C++ the original knapsack algorithm, without paying attention to the categories.

When I tried to add the categories, I figured out that I could simply treat this as a multidimensional knapsack problem, which each categories being a weight of either 0 or 1 in a new dimension.

My main problem is that I do not only have a maximum, ex: 5 objects of type food, but also a minimum, since I need exactly 5 objects of type food.

And I can't figure out how to add a minimum into the algorithm.

Obviously, I can use a general case, where every dimension has a maximum and minimum, and optimise for total, since all my dimensions but one only have a range of 1, so this would end up optimising for value anyway. Furthermore, I can set the minimum for value to zero, to avoid having one dimension without a minimum, and it would still work.

I'm working in C++, but honestly even pseudo-code would be fine, I just need the algorithm.

Obviously I also need it to be fast, if possible as fast as the multidimensional variation.

Here is an example of the test case. As this is mostly an optimization problem, the instance is huge, but it should work on any instance size. The number of possible categories and number of category fields is fixed.

You have a backpack that can hold a maximum of 100 units of weight, and a list of 1000 objects, each object having a value, a weight and a type. You specifically need to bring exactly 10 objects of type food, 15 objects of type clothing and 5 Tools. Every object has a completely arbitrary (but greater than 0) value in dollars, and weight in units. I would need to find the optimal configuration for value respecting the maximum weight and the specific number of each type of items.

The list of objects will always contain at least one valid configuration, which means that it will always have at least enough objects of every type that will end up Under the maximum weight, so I don't have to plan for the "no answer" case. I just have to find the best answer for a (probably) huge number of available items.

Cœur
  • 37,241
  • 25
  • 195
  • 267
Kaito Kid
  • 983
  • 4
  • 15
  • 34
  • Could you please give rough estimate about the size of the input? How many categories, entries per category, etc. – Mann Apr 04 '17 at 12:43
  • My original solution had a bug (the order of an if else should have been reversed), and I've reposted a (more) correct solution. – jwimberley Apr 04 '17 at 14:39
  • @Mann The input could be huge, since this is mostly an optimization problem. On the other hand, there is only one category field, and only three categories in this field. Example: Backpack with weight capacity of 1000, and 100 000 objects that are either food, clothing or Tools, each of them has a weight between 1-15 and a value between 1-25. I need exactly 30 Tools, 30 clothes and 50 food items, and the highest value possible – Kaito Kid Apr 05 '17 at 14:03

2 Answers2

4

Knowing exactly how many items can be selected from each category is a big constraint. Consider the simplest case that there is one category. You much choose exactly N objects to maximize the value sum[v_i x_i] for a cost sum[w_i x_i] < W, where x_i is equal to 0 or 1 (following Wikipedia's notation). The new constraint is that sum[x_i] = N. This constraint can be included in the problem by adding another dimension in the dynamic programming, but explicitly checking whether or not a solution is valid and has exactly the number of required elements.

Vanilla knapsack problem

Here is a brief demonstration: Take as the starting point this solution to the standard 0/1 knapsack problem via memoization:

#include <cstdio>
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>

using uint = unsigned int;

template <typename T>
struct item {
    T value;
    uint weight;
};

template <typename T>
T knapSack(uint W, const std::vector< item<T> >& items) {
    
    std::map< std::pair<uint, uint>, T> cache;
    
    std::function<T(uint, uint)> recursion;
    recursion = [&] (uint n, uint w) {
        if (n == 0)
            return 0;
        auto it = cache.find(std::make_pair(n,w));
        if (it != cache.end())
            return it->second;
        T _v = items[n-1].value;
        uint _w = items[n-1].weight;
        T nextv;
        if (_w <= w)
            nextv = std::max(_v + recursion(n-1,w-_w),recursion(n-1,w));
        else
            nextv = recursion(n-1,w);
        cache.insert(std::make_pair(std::make_pair(n,w),nextv));
        return nextv;   
    };

    return recursion(items.size(),W);
}

My implementation here (with a recursive lambda function) emphasizes readability over optimality. A choice of objects with indices < N and sum of weights < W is either a choice of objects with indices < N-1 and sum of weights < W or the object at N-1 together with objects with indices < N-1 and sum of weights < W - w[N-1].

Knapsack problem with one category with fixed required number of objects

We can go on to add a new restriction that keeps track of the number of elements selected. We'll do this by noting that the new choice of objects at each recursive step has 0 or 1 more elements than the one before it, in the same way that it has the same or greater sum of weights -- i.e., a choice of K objects with indices < N and sum of weights < W is either a choice of K objects with indices < N-1 and sum of weights < W or the object at N-1 together with K-1 objects with indices < N-1 and sum of weights < W - w[N-1]. However, we'll also want to track violations -- for instance, we can't find K objects with indices < N when K>N. In such a case we should report that the maximum possible value is 0, because the choice is impossible, but we should mark this as "invalid" to differentiate this from the the trivial base case of the recursion. Moreover, any solution higher up the chain that attempts to use this as a sub-solution should also be marked as invalid. For this reason, we change the return type from a simple value to a pair of a value and boolean. As part of the base case, we mark all entries with K>N as having a maximum value of 0 but as invalid:

template <typename T>
std::pair<T,bool> knapSackConstrained(uint W, uint K, const std::vector< item<T> >& items) {
    
    std::map< std::tuple<uint, uint, uint>, std::pair<T,bool> > cache;
    
    std::function<std::pair<T, bool>(uint, uint, uint)> recursion;
    recursion = [&] (uint n, uint w, uint k) {
        if (k > n)
            return std::make_pair(0,false);
        if (n == 0 || k == 0)
            return std::make_pair(0,true);
        auto it = cache.find(std::make_tuple(n,w,k));
        if (it != cache.end())
            return it->second;
        T _v = items[n-1].value;
        uint _w = items[n-1].weight;
        T nextv;
        bool nextvalid = true;
        if (_w <= w) {
            auto take = recursion(n-1,w-_w,k-1);
            auto reject = recursion(n-1,w,k);
            if (take.second and reject.second) {
                nextv = std::max(_v + take.first,reject.first);
            } else if (take.second) {
                nextv = _v + take.first;
            } else if (reject.second) {
                nextv = reject.first;
            } else {
                nextv = 0;
                nextvalid = false;
            }   
        } else {
            std::tie(nextv,nextvalid) = recursion(n-1,w,k);
        }
        std::pair<T,bool> p = std::make_pair(nextv,nextvalid);
        cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
        return p;   
    };

    return recursion(items.size(),W,K);
}

Here is a simple main routine running this code and its output:

int main(int argc, char *argv[]) {
    std::vector< item<int> > items = {{60,10},{10,6},{10,6}};
    int  j = 13;
    std::cout << "Unconstrained: " << knapSack(j,items) << std::endl;
    for (uint k = 1; k <= items.size(); ++k) {
        auto p = knapSackConstrained(j,k,items);
        std::cout << "K = " << k << ": " << p.first;
        if (p.second)
            std::cout << std::endl;
        else
            std::cout << ", no valid solution" << std::endl;
    }
    return 0;
}

% OUTPUT %
Unconstrained: 60
K = 1: 60
K = 2: 20
K = 3: 0, no valid solution

Since the sum of the 3 weights is already greater than the threshold, the solution requiring all three is impossible.

Knapsack problem with multiple categories with fixed required number of objects

The above only partly addresses your problem, since you have multiple categories rather than simply one. However, I believe that this can be extended to be multi-dimensional without too much extra work. Indeed, I suspect the following code is the right strategy for the multi-dimensional case, modulo bugs -- it would require some good test cases for verification. The single argument K is replaced with a vector of category numbers, and the item struct is given a category field. The base case must consider each possible K>N case (for each category), and in addition the check that the (i-1)st weight is less than W must be extended to check that there is at least 1 item more required of the (i-1)st category.

#include <cstdio>
#include <iostream>
#include <vector>
#include <map>
#include <algorithm>

using uint = unsigned int;

template <typename T>
struct item {
    T value;
    uint weight;
    uint category;
};

template <typename T>
std::pair<T,bool> knapSack(uint W, const std::vector<uint>& K, const std::vector< item<T> >& items) {

    std::map< std::tuple<uint, uint, std::vector<uint> >, std::pair<T,bool> > cache;
    
    std::function<std::pair<T, bool>(uint, uint, std::vector<uint>)> recursion;
    recursion = [&] (uint n, uint w, std::vector<uint> k) {
        
        auto it = cache.find(std::make_tuple(n,w,k));
        if (it != cache.end())
            return it->second;
        
        std::vector<uint> ccount(K.size(),0);
        for (uint c = 0; c < K.size(); ++c) {
            for (uint i = 0; i < n; ++i) {
                if (items[i].category == c)
                    ++ccount[c];
            }
        }
        for (uint c = 0; c < k.size(); ++c) {
            if (k[c] > ccount[c]) {
                auto p = std::make_pair(0,false);
                cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                return p;
            }
        }

        uint sumk = 0; for (const auto& _k : k) sumk += _k;
        if (n == 0 || sumk == 0) {
            auto p = std::make_pair(0,true);
            cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
            return p;
        }

        T _v = items[n-1].value;
        uint _w = items[n-1].weight;
        uint _c = items[n-1].category;
        T nextv;
        bool nextvalid = true;
        if (_w <= w and k[_c] > 0) {
            std::vector<uint> subk = k;
            --subk[_c];
            auto take = recursion(n-1,w-_w,subk);
            auto reject = recursion(n-1,w,k);
            if (take.second and reject.second) {
                nextv = std::max(_v + take.first,reject.first);
            } else if (take.second) {
                nextv = _v + take.first;
            } else if (reject.second) {
                nextv = reject.first;
            } else {
                nextv = 0;
                nextvalid = false;
            }   
        } else {
            std::tie(nextv,nextvalid) = recursion(n-1,w,k);
        }
        std::pair<T,bool> p = std::make_pair(nextv,nextvalid);
        cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
        return p;   
    };

    return recursion(items.size(),W,K);
}
 
int main(int argc, char *argv[]) {
    std::vector< item<int> > items = {{60,10,0}, {100,20,1}, {120,30,0}, {140,35,1}, {145,40,0}, {180,45,1}, {160,50,1}, {170,55,0}};
    int  j = 145;
    for (uint k1 = 0; k1 <= items.size(); ++k1) {
        for (uint k2 = 0; k2 <= items.size(); ++k2) {
            auto p = knapSack(j,std::vector<uint>({k1,k2}),items);
            if (p.second)
                std::cout << "K0 = " << k1 << ", K1 = " << k2 << ": " << p.first << std::endl;
        }
    }
    return 0;
}

% OUTPUT (with comments) %

K0 = 0, K1 = 0: 0
K0 = 0, K1 = 1: 180 // e.g. {} from 0, {180} from 1
K0 = 0, K1 = 2: 340 // e.g. {} from 0, {160,180} from 1
K0 = 0, K1 = 3: 480 // e.g. {} from 0, {140,160,180} from 1
K0 = 1, K1 = 0: 170 // e.g. {170} from 0, {} from 1
K0 = 1, K1 = 1: 350 // e.g. {170} from 0, {180} from 1
K0 = 1, K1 = 2: 490 // e.g. {170} from 0, {140, 180} from 1
K0 = 1, K1 = 3: 565 // e.g. {145} from 0, {100, 140, 180} from 1
K0 = 2, K1 = 0: 315 // e.g. {145,170} from 0, {} from 1
K0 = 2, K1 = 1: 495 // e.g. {145,170} from 0, {180} from 1
K0 = 2, K1 = 2: 550 // e.g. {60,170} from 0, {140,180} from 1
K0 = 2, K1 = 3: 600 // e.g. {60,120} from 0, {100,140,180} from 1
K0 = 3, K1 = 0: 435 // e.g. {120,145,170} from 0, {} from 1
K0 = 3, K1 = 1: 535 // e.g. {120,145,170} from 0, {100} from 1
K0 = 3, K1 = 2: 605 // e.g. {60,120,145} from 0, {100,180} from 1
K0 = 4, K1 = 0: 495 // e.g. {60,120,145,170} from 0, {} from 1

For a given set of items with two categories, the output appears to be correct, though my manual check might have failed to spot some problems [an earlier version of this answer did indeed have some bugs]. All the cases not printed are those with no solution.

Returning the set of selected objects

If you want the function to return the set of selected objects, this is no obstacle in principle -- the code just gets messier. The most human-understandable thing to do is simply add an std::set<std::size_t> to the tuple of objects returned by recursion and by knapSack, and stored in the cache, representing the collection of indices of the chosen objects. Each time a new object is added, this set can be augmented. The resulting code involves lots of copying of sets of integers, and is probably far from optimal -- a better solution might involve a static boolean vector whose entries are toggled on and off. However, it works and makes sense, so here it is:

#include <cstdio>
#include <iostream>
#include <vector>
#include <map>
#include <set>
#include <algorithm>

using uint = unsigned int;

template <typename T>
struct item {
    T value;
    uint weight;
    uint category;
};

template <typename T>
std::tuple<T,bool,std::set<size_t> > knapSack(uint W, std::vector<uint> K, const std::vector< item<T> >& items) {

    std::map< std::tuple<uint, uint, std::vector<uint> >, std::tuple<T,bool,std::set<std::size_t> > > cache;
    
    std::function<std::tuple<T,bool,std::set<std::size_t> >(uint, uint, std::vector<uint>&)> recursion;

    recursion = [&] (uint n, uint w, std::vector<uint>& k) {
        
        auto it = cache.find(std::make_tuple(n,w,k));
        if (it != cache.end())
            return it->second;
        
        std::vector<uint> ccount(K.size(),0);
        for (uint i = 0; i < n; ++i) {
            ++ccount[items[i].category];
        }
        
        for (uint c = 0; c < k.size(); ++c) {
            if (k[c] > ccount[c]) {
                auto p = std::make_tuple(0,false,std::set<std::size_t>{});
                cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
                return p;
            }
        }
        uint sumk = 0; for (const auto& _k : k) sumk += _k;
        if (n == 0 || sumk == 0) {
            auto p = std::make_tuple(0,true,std::set<std::size_t>{});
            cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
            return p;
        }

        T _v = items[n-1].value;
        uint _w = items[n-1].weight;
        uint _c = items[n-1].category;
        T nextv;
        bool nextvalid = true;
        std::set<std::size_t> nextset;
        if (_w <= w and k[_c] > 0) {
            --k[_c];
            auto take = recursion(n-1,w-_w,k);
            ++k[_c];
            auto reject = recursion(n-1,w,k);
            T a = _v + std::get<0>(take);
            T b = std::get<0>(reject);
            if (std::get<1>(take) and std::get<1>(reject)) {
                nextv = std::max(a,b);
                if (a > b) {
                    nextset = std::get<2>(take);
                    nextset.insert(n-1);
                } else {
                    nextset = std::get<2>(reject);
                }
            } else if (std::get<1>(take)) {
                nextv = a;
                nextset = std::get<2>(take);
                nextset.insert(n-1);
            } else if (std::get<1>(reject)) {
                nextv = b;
                nextset = std::get<2>(reject);
            } else {
                nextv = 0;
                nextvalid = false;
                nextset = {};
            }   
        } else {
            std::tie(nextv,nextvalid,nextset) = recursion(n-1,w,k);
        }
        auto p = std::make_tuple(nextv,nextvalid,nextset);
        cache.insert(std::make_pair(std::make_tuple(n,w,k),p));
        return p;   
    };

    return recursion(items.size(),W,K);
}
 
int main(int argc, char *argv[]) {
    std::vector< item<int> > items = {{60,10,0}, {100,20,1}, {120,30,0}, {140,35,1}, {145,40,0}, {180,45,1}, {160,50,1}, {170,55,0}};
    int  j = 145;
    for (uint k1 = 0; k1 <= items.size(); ++k1) {
        for (uint k2 = 0; k2 <= items.size(); ++k2) {
            auto p = knapSack(j,std::vector<uint>({k1,k2}),items);
            if (std::get<1>(p)) {
                std::cout << "K0 = " << k1 << ", K1 = " << k2 << ": " << std::get<0>(p);
                std::cout << "; contents are {";
                for (const auto& index : std::get<2>(p))
                    std::cout << index << ", ";
                std::cout << "}" << std::endl;
            }
        }
    }
    return 0;
}

The output of this is

K0 = 0, K1 = 0: 0; contents are {}
K0 = 0, K1 = 1: 180; contents are {5, }
K0 = 0, K1 = 2: 340; contents are {5, 6, }
K0 = 0, K1 = 3: 480; contents are {3, 5, 6, }
K0 = 1, K1 = 0: 170; contents are {7, }
K0 = 1, K1 = 1: 350; contents are {5, 7, }
K0 = 1, K1 = 2: 490; contents are {3, 5, 7, }
K0 = 1, K1 = 3: 565; contents are {1, 3, 4, 5, }
K0 = 2, K1 = 0: 315; contents are {4, 7, }
K0 = 2, K1 = 1: 495; contents are {4, 5, 7, }
K0 = 2, K1 = 2: 550; contents are {0, 3, 5, 7, }
K0 = 2, K1 = 3: 600; contents are {0, 1, 2, 3, 5, }
K0 = 3, K1 = 0: 435; contents are {2, 4, 7, }
K0 = 3, K1 = 1: 535; contents are {1, 2, 4, 7, }
K0 = 3, K1 = 2: 605; contents are {0, 1, 2, 4, 5, }
K0 = 4, K1 = 0: 495; contents are {0, 2, 4, 7, }

Algorithmic complexity

This is not my forte, but I believe the runtime complexity is psuedo-polynomial, since the algorithm is very similar to the standard knapsack algorithm.

jwimberley
  • 1,696
  • 11
  • 24
  • This answer looks very promising and complete. I'll accept it and award the Bounty tomorrow after testing if it works. Thanks a log for your effort. – Kaito Kid Apr 05 '17 at 14:05
  • @KaitoKid You're welcome! Let me know if your tests pick up any problems. – jwimberley Apr 05 '17 at 14:42
  • Out of curiosity, what is the complexity of your code? – Kaito Kid Apr 06 '17 at 16:22
  • Also, I assume that "and" means "&&" and "uint" means "unsigned int" in your code, is that correct? – Kaito Kid Apr 06 '17 at 16:35
  • @KaitoKid Re: your second comment -- Yes to both; `and` is idiomatic C++, identical to `&&` (perhaps less commonly used though I prefer it), and I define `using uint = unsigned int;` in my code. – jwimberley Apr 06 '17 at 16:46
  • I am currently doing it myself, but just to inform you, your code doesn't give us the optimal configuration, just its value. – Kaito Kid Apr 06 '17 at 16:46
  • @KaitoKid Re: your first comment -- Oof, I don't know; I'm a scientist without a formal CS background so evaluating its complexity is not my forte. – jwimberley Apr 06 '17 at 16:47
  • it's fine, I'll calculate it on my end, I just figured that if you happened to already know it would be simpler to ask. – Kaito Kid Apr 06 '17 at 16:48
  • Re: your third comment - That's true. I was originally tracking the configuration by including an `std::vector taken` in the recursion. I took it out to remove visual clutter. If you'd like I could put it back in. – jwimberley Apr 06 '17 at 16:50
  • If you'd just tell me where to add items in it, that would save me some time haha – Kaito Kid Apr 06 '17 at 16:51
  • @KaitoKid I added an edit showing a straight-forward way to return the set of chosen objects. It involves lots of copying and storing `std::set` objects, though, and there's probably a better but similar method. – jwimberley Apr 06 '17 at 17:15
  • 1
    It works great (with a few modifications on my side for specific stuff I needed that was unrelated to the question asked here). Here you go, and thank you very much! – Kaito Kid Apr 06 '17 at 17:30
0

I do not actually have a direct answer to your question either it be Pseudo Code or an actual implementation of an algorithm in a specific language but what I can do here is give you a list of references that I think are related to the topic that may help guide you into developing a working algorithm:

Although many of these may not exactly be the Knapsack algorithmic problem; I think these topics may be related in a way to help with your overall achievement for your algorithm. I think these would be useful due to the fact that first and foremost the Knapsack problem is itself a variant of a Partitioning Algorithm that has many implementations and schemes. Also the use of parallel programming and multi threaded programming may also help on large datasets. I found these handful of books and whitepapers that I think would be of a great read.

Basically you have a Knapsack K that has a volume KV that needs to be subdivided into smaller volumes of {KV1, KV2, ... KVn} that holds different data types that each type has a value, weight and category or classification to it and the item's weight represents the portion of volume that it consumes. You also have the constraints that there is a [min, max] bounds with the limitation that you must have at least one of each category or classification. Then with these parameters as your base scenario you then want to maximize KV to contain as many elements as possible but want to do it as efficiently as possible that takes the least amount of time with hopefully linear to polynomial - time and space complexity avoiding quadratic and/or exponential - time and space complexities.

Looking at other algorithms that are uniquely different such as partitioning algorithms, population densities and growth, image compressions, etc. can give you insight into your specific problem as the overall foundations and caveats of these algorithms are similar in nature.

Francis Cugler
  • 7,788
  • 2
  • 28
  • 59