
I work in the logistics department of a company. Recently we have been trying to reduce the number of different packaging options that we use.

I have all the necessary product data like length, width, height, volume and also sales data.

I was wondering whether it is possible to use an algorithm to cluster the products by volume, and perhaps also take into account which sizes sell the most, to determine which box sizes would be ideal. (Taking into account how often a product sells is secondary, so it is not absolutely necessary.)

What I want is to give the algorithm the number of different box sizes I want, and have it determine where to put the limits so that there is a box for every product that we have. The goal of the optimization is to minimize the wasted volume while not using more than the set number of different boxes.

Also important to note: the orientation of the products and the quantity per box are fixed, so there is no need to determine how to pack the products or how many ideally go into one box.

What kind of algorithms could be used for a problem like this and what are my options to program them? I was thinking of using Matlab, but would also be open for other possible options. I want to program it, not simply use an existing program like SPSS.

Thanks in advance, and forgive me if my English is not the best; I'm not a native speaker.

Marc Sances
Rissow
  • Hello, there is an important point missing in your description: the optimization goal. Given the number of box sizes as a constraint, what would be a "good" box set? Minimum wasted volume? Minimum cost? (In that case you should also define the cost of a box for each size.) One other detail: are all sizes allowed, or is there some quantization? – Rocco May 05 '21 at 20:54
  • Thanks for your reply, the optimization goal is to have minimum wasted volume while also only using a set amount of different box sizes. Alternatively it would also be a possibility to say I want to fit each product into a box and there should be a maximum of x volume wasted. All sizes would be allowed. – Rissow May 06 '21 at 06:29
  • It looks like you're looking for the [Knapsack Problem](https://en.wikipedia.org/wiki/Knapsack_problem). Maybe [this](https://stackoverflow.com/questions/40721107/algorithm-for-filling-bag-maximally-this-is-not-the-knapsack-0-1) can help too. – Cheshire Cat May 06 '21 at 14:08
  • @Rissow One more detail, please: you say that the product amount per box is defined. Does this mean that each box will hold exactly that amount regardless of the size? So for each product you know the minimum dimensions of the box? As a consequence, the wasted volume would be exactly (Lbox x Wbox x Hbox) - (Lmin x Wmin x Hmin)? – Rocco May 06 '21 at 16:31
  • @Rocco yes, exactly. The minimum dimensions of the boxes are known, because I know the volume of the products and we only ship the same products together, and quantities are either 1 or 5 per box. So if I have products A, B and C they would always ship separately. – Rissow May 06 '21 at 17:28
  • It seems to me minimum volume wasted and minimum types of boxes are orthogonal constraints; taken to the extreme, either one box per content type, or one box fits all. It may be that you have to have a coefficient saying which one is more important or play around with fixed n-box-types. – Neil May 06 '21 at 21:56
  • Are you familiar with "dimensional weight", what it means, and how an overly large shipping container can increase your shipping cost even though the physical weight is the same as with a smaller container? I suggest you do more research. Reducing the shipping container options has financial ramifications that are not mentioned in your question. – JohnH May 06 '21 at 23:48
  • Please also mention in the question the number of input box sizes you expect to provide, and the typical maximum number of box sizes you want to get. – j_random_hacker May 09 '21 at 02:00
  • @JohnH thanks for the comment, though I am aware of that, my company has the problem of having a lot of different product sizes, while also not having the room or the sales volume to make it worthwhile to have all those different box sizes at all times. Since a lot of those would only be used once in a blue moon it's for the best that we try to minimize the amount of different box sizes. – Rissow May 11 '21 at 06:56
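The objective clarified in the comments above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: orientation is fixed (no rotation), each product uses the smallest-volume box it fits into, and the waste per shipment is the box volume minus the product's minimum box volume, weighted by sales. The names `Product`, `Box`, `volume`, `fitsInto` and `totalWaste` are hypothetical and do not come from the question.

```cpp
#include <array>
#include <vector>

using Box = std::array<int, 3>;  // length, width, height

// Hypothetical record: the minimum box dimensions a product needs
// (orientation fixed) and how many shipments of it are made.
struct Product {
    std::array<int, 3> dims;
    long long sales;
};

long long volume(const std::array<int, 3>& d) {
    return static_cast<long long>(d[0]) * d[1] * d[2];
}

// True if the product fits the box without being rotated.
bool fitsInto(const Product& p, const Box& b) {
    return p.dims[0] <= b[0] && p.dims[1] <= b[1] && p.dims[2] <= b[2];
}

// Sales-weighted wasted volume when every product uses the smallest-volume
// box that fits it. Returns -1 if some product fits none of the boxes.
long long totalWaste(const std::vector<Product>& products,
                     const std::vector<Box>& boxes) {
    long long waste = 0;
    for (const Product& p : products) {
        long long best = -1;  // volume of the smallest fitting box so far
        for (const Box& b : boxes) {
            if (fitsInto(p, b) && (best < 0 || volume(b) < best)) {
                best = volume(b);
            }
        }
        if (best < 0) return -1;  // this product cannot be shipped
        waste += p.sales * (best - volume(p.dims));
    }
    return waste;
}
```

Any optimizer for this problem would then search over sets of at most the allowed number of boxes for the set that minimizes this value.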

1 Answer


The following C++ program will find optimal solutions for small instances. For 10 input box sizes, each having dimensions randomly chosen in the range 1..100, and for any number 1..10 of box sizes to choose, it computes the answer in a couple of seconds on my computer. For 15 input box sizes, it takes around 10s. For 20 input box sizes, I could compute up to 4 chosen box sizes in about 3 minutes, with memory becoming an issue (it used around 3GB). I had to increase the linker's default stack size to avoid stack overflows.

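#include <iostream>
#include <algorithm>
#include <vector>
#include <array>
#include <map>
#include <set>
#include <tuple>        // make_tuple, used by State::operator<
#include <utility>      // pair
#include <iterator>     // begin, end, empty, size (C++17)
#include <functional>
#include <cstddef>      // size_t
#include <climits>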

using namespace std;

ostream& operator<<(ostream& os, array<int, 3> a) {
    return os << '(' << a[0] << ", " << a[1] << ", " << a[2] << ')';
}

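// The size parameter must be size_t to match std::array, or deduction fails on conforming compilers.
template <size_t N>
long long vol(array<int, N> b) {
    return static_cast<long long>(b[0]) * b[1] * b[2];
}

// True if a fits inside b; only the first three entries (the dimensions) are compared.
template <size_t N, size_t M>
bool fits(array<int, N> a, array<int, M> b) {
    return a[0] <= b[0] && a[1] <= b[1] && a[2] <= b[2];
}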

// Compares first by volume, then lexicographically.
struct CompareByVolumeDesc {
    bool operator()(array<int, 3> a, array<int, 3> b) const {
        return vol(a) > vol(b) || (vol(a) == vol(b) && a < b);
    }
};

vector<array<int, 3>> candSizes;

struct State {
    vector<array<int, 4>> req;
    int n;
    int k;

    // Needed for map<>
    bool operator<(State const& other) const {
        return make_tuple(n, k, req) < make_tuple(other.n, other.k, other.req);
    }
} dummy = { {}, -1, -1 };

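// The stored volume must be long long: solve() returns long long values that can exceed the range of int.
map<State, pair<long long, State>> memo;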

// Compute the minimum volume required for the given list of box sizes if we use exactly k of the first n candidate box sizes.
pair<long long, State> solve(State const& s) {
    if (empty(s.req)) return { 0, dummy };
    if (s.k == 0 || s.k > s.n) return { LLONG_MAX / 4, dummy };
    auto previousAnswer = memo.find(s);
    if (previousAnswer != end(memo)) return (*previousAnswer).second;

    // Try using the nth candidate box size.
    int nFitting = 0;
    vector<array<int, 4>> notFitting;
    for (auto r : s.req) {
        if (fits(r, candSizes[s.n - 1])) {
            nFitting += r[3];
        } else {
            notFitting.push_back(r);
        }
    }

    pair<long long, State> solution;
    solution.second = { s.req, s.n - 1, s.k };
    solution.first = solve(solution.second).first;
    if (nFitting > 0) {
        State useNth = { notFitting, s.n - 1, s.k - 1 };
        long long useNthVol = nFitting * vol(candSizes[s.n - 1]) + solve(useNth).first;
        if (useNthVol < solution.first) solution = { useNthVol, useNth };
    }
    memo[s] = solution;
    return solution;
}

void printOptimalSolution(State s) {
    while (!empty(s.req)) {
        State next = solve(s).second;
        if (next.k < s.k) cout << candSizes[s.n - 1] << endl;
        s = next;
    }
}

int main() {
    int n, k;
    cin >> n >> k;
    vector<array<int, 4>> requestedBoxSizes;
    set<int> lengths, widths, heights;
    for (int i = 0; i < n; ++i) {
        array<int, 4> d;        // d[3] is actually the number of requests for this box size
        cin >> d[0] >> d[1] >> d[2] >> d[3];
        sort(begin(d), begin(d) + 3, std::greater<int>());
        requestedBoxSizes.push_back(d);

        lengths.insert(d[0]);
        widths.insert(d[1]);
        heights.insert(d[2]);
    }

    // Generate all candidate box sizes
    for (int l : lengths) {
        for (int w : widths) {
            for (int h : heights) {
                array<int, 3> cand = { l, w, h };
                sort(begin(cand), end(cand), std::greater<int>());
                candSizes.push_back(cand);
            }
        }
    }

    sort(begin(candSizes), end(candSizes), CompareByVolumeDesc());
    candSizes.erase(unique(begin(candSizes), end(candSizes)), end(candSizes));
    cout << "Number of candidate box sizes: " << size(candSizes) << endl;
    State startState = { requestedBoxSizes, static_cast<int>(size(candSizes)), k };
    long long minVolume = solve(startState).first;
    cout << "Minimum achievable volume using " << k << " box sizes: " << minVolume << endl;
    cout << "Optimal set of " << k << " box sizes:" << endl;
    printOptimalSolution(startState);
    return 0;
}

Example input:

15 5
100 61 35 27
17 89 96 47
31 69 30 55
37 23 39 9
94 11 48 19
38 17 29 36
63 79 80 36
59 52 37 51
86 63 54 7
32 30 11 26
50 88 51 5
74 70 33 14
67 46 4 79
83 94 89 58
65 42 37 69

Example output:

Number of candidate box sizes: 2310
Minimum achievable volume using 5 box sizes: 124069460
Optimal set of 5 box sizes:
(94, 48, 11)
(69, 52, 37)
(100, 89, 35)
(88, 79, 63)
(94, 89, 83)

I'll explain the algorithm behind this if there's interest. It's better than considering all possible combinations of k candidate box sizes, but not terribly efficient.
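For comparison, the brute-force baseline mentioned above (trying every combination of k candidate box sizes) could look roughly like the sketch below. It is not part of the program above; it assumes the same objective (total box volume used, weighted by request count), assumes dimensions are already normalized the same way for requests and candidates, and uses hypothetical names (`Dims`, `boxVolume`, `fitsIn`, `usedVolume`, `bruteForceBest`). It evaluates C(n, k) subsets, so it is only practical for checking results on very small inputs.

```cpp
#include <algorithm>
#include <array>
#include <climits>
#include <cstddef>
#include <vector>

using Dims = std::array<int, 3>;

long long boxVolume(const Dims& d) {
    return static_cast<long long>(d[0]) * d[1] * d[2];
}

bool fitsIn(const Dims& item, const Dims& box) {
    return item[0] <= box[0] && item[1] <= box[1] && item[2] <= box[2];
}

// Total box volume used if each request takes the smallest chosen box it
// fits into; LLONG_MAX if some request fits none of the chosen boxes.
long long usedVolume(const std::vector<Dims>& requests,
                     const std::vector<long long>& counts,
                     const std::vector<Dims>& chosen) {
    long long total = 0;
    for (std::size_t i = 0; i < requests.size(); ++i) {
        long long best = LLONG_MAX;
        for (const Dims& box : chosen) {
            if (fitsIn(requests[i], box)) best = std::min(best, boxVolume(box));
        }
        if (best == LLONG_MAX) return LLONG_MAX;
        total += counts[i] * best;
    }
    return total;
}

// Tries every k-subset of the candidates and returns the smallest total volume.
long long bruteForceBest(const std::vector<Dims>& requests,
                         const std::vector<long long>& counts,
                         const std::vector<Dims>& candidates,
                         std::size_t k) {
    if (k == 0 || k > candidates.size()) return LLONG_MAX;
    // Selection mask with exactly k ones, starting at the lexicographically
    // largest arrangement; prev_permutation then visits every mask once.
    std::vector<char> pick(candidates.size(), 0);
    std::fill_n(pick.begin(), k, 1);
    long long best = LLONG_MAX;
    do {
        std::vector<Dims> chosen;
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            if (pick[i]) chosen.push_back(candidates[i]);
        }
        best = std::min(best, usedVolume(requests, counts, chosen));
    } while (std::prev_permutation(pick.begin(), pick.end()));
    return best;
}
```

Comparing its result with the memoized program on a handful of tiny inputs is a cheap way to gain confidence in either implementation.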

j_random_hacker
  • Thank you, that looks interesting. Could you explain further? The problem might be that I have a lot more sizes to put in: 1k+. – Rissow May 08 '21 at 05:48
  • I'll try to add an explanation when I get time, but for now: The algorithm is sensitive to the number of candidate box sizes, which for n input box sizes is O(n^3) in the worst case (when they all have distinct dimensions), and to the number k of box sizes to choose. (The sizes of each dimension are unimportant.) If you can limit the number of candidate box sizes to around 1000, and k to 10 or so, you should be good. Possibilities include changing the code to consider as candidates (1) just the input box sizes, or (2) all box sizes in which each dimension comes from a set of <= 10 sizes. – j_random_hacker May 09 '21 at 01:56
  • To be clear: If you restrict the set of candidate box sizes to consider, the answer won't necessarily be optimal. But more candidate sizes seem to yield diminishing returns, and you should be able to get a good answer with just a thousand or so. – j_random_hacker May 09 '21 at 01:58
  • Thank you @j_random_hacker, I'll try it out. – Rissow May 11 '21 at 06:57
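Option (2) from the comments above, limiting each dimension to at most 10 sizes so that there are at most 10 x 10 x 10 = 1000 candidates, could be sketched as follows. How the sizes are picked is an assumption made here, not something the answer specifies: this sketch keeps values evenly spaced by rank and always keeps the largest value, so that every product still fits at least one candidate. The names `thinDimension` and `makeCandidates` are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <set>
#include <vector>

// Keep at most maxCount of the distinct values (maxCount should be >= 1),
// evenly spaced by rank. The largest value is always kept, so any value
// that was dropped can be rounded up to a kept one.
std::vector<int> thinDimension(const std::set<int>& values, std::size_t maxCount) {
    std::vector<int> sorted(values.begin(), values.end());
    if (sorted.size() <= maxCount) return sorted;
    std::vector<int> kept;
    for (std::size_t i = 1; i <= maxCount; ++i) {
        // Index of the i-th of maxCount evenly spaced picks;
        // i == maxCount always selects the last (largest) element.
        std::size_t idx = i * sorted.size() / maxCount - 1;
        kept.push_back(sorted[idx]);
    }
    return kept;
}

// Cartesian product of the thinned dimension sets:
// at most maxPerDim^3 candidate box sizes.
std::vector<std::array<int, 3>> makeCandidates(const std::set<int>& lengths,
                                               const std::set<int>& widths,
                                               const std::set<int>& heights,
                                               std::size_t maxPerDim) {
    std::vector<std::array<int, 3>> cands;
    for (int l : thinDimension(lengths, maxPerDim)) {
        for (int w : thinDimension(widths, maxPerDim)) {
            for (int h : thinDimension(heights, maxPerDim)) {
                cands.push_back(std::array<int, 3>{l, w, h});
            }
        }
    }
    return cands;
}
```

In the program above, the thinned sets would take the place of `lengths`, `widths` and `heights` in the candidate-generation loops. As noted in the comments, the result is then no longer guaranteed to be optimal.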