
Let us say that a multiset M dominates another multiset N if each element of N occurs at least as many times in M as it does in N.
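A dominance check falls out directly from multiplicity counts; a minimal sketch using Python's `collections.Counter` (the function name is my own):

```python
from collections import Counter

def dominates(m, n):
    """Return True if multiset m dominates multiset n, i.e. every
    element of n occurs at least as many times in m as it does in n."""
    m, n = Counter(m), Counter(n)
    return all(m[x] >= count for x, count in n.items())
```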

Given a target multiset M and an integer k>0, I'd like to find a list, L, of size-k multisets whose multiset sum dominates M. I'd like this list to have small cost, where my cost function is of the form:

cost = c*m + n

where c is a constant, m is the number of multisets in L, and n is the number of distinct multisets in L.
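To make the two terms concrete, the cost function can be written down directly (a sketch; encoding each multiset as a sorted tuple is my own choice):

```python
def cost(L, c):
    """Cost of a list L of size-k multisets: c times the number of
    multisets in L, plus the number of distinct multisets in L."""
    m = len(L)                                # total multisets (prints)
    n = len({tuple(sorted(ms)) for ms in L})  # distinct multisets (templates)
    return c * m + n
```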

How can I do this? An efficient algorithm to find the optimal solution would be ideal.

The problem comes from trying to fulfill a customer's order for printing pages with a specialized block-printer that prints k pages at a time. Setting up the block-printer to print a particular template of k pages is costly, but once a template is initialized, printing with it is cheap. The target multiset M represents the customer's order, and the n distinct multisets of the list L represent n distinct k-page templates.

In my particular application, M typically has >30 elements whose multiplicities are in the range [10^4, 10^6]. The value of k is 15, and c is approximately 10^-5.
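For scale, a simple greedy baseline of the kind mentioned in the comments can be sketched as follows. This is my own illustrative heuristic, not the asker's algorithm, and it is not optimal; it only guarantees that the combined prints dominate M:

```python
from collections import Counter

def greedy_templates(M, k):
    """Greedy heuristic: repeatedly build a template from the k pages
    with the largest remaining demand (padding with repeats when fewer
    than k distinct pages remain), then print it as many times as the
    smallest demand it covers.  Returns a list of (template, copies)
    pairs whose combined prints dominate M."""
    residual = Counter(M)
    plan = []
    while residual:
        top = [page for page, _ in residual.most_common(k)]
        # Pad with the most-demanded page if fewer than k pages remain.
        template = tuple(sorted(top + [top[0]] * (k - len(top))))
        copies = min(residual[page] for page in top)
        plan.append((template, copies))
        for page in template:
            residual[page] -= copies
        residual = +residual  # drop pages whose demand is now met
    return plan
```

Each iteration retires at least one distinct page, so the loop runs at most as many times as M has distinct elements.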

dshin
  • What have you tried? SO isn't a code-writing service. Which parts of the algorithm confuse you or are you having trouble with? – dwanderson Apr 06 '16 at 19:57
  • My guess is that finding the optimal solution is NP-complete, but greedy can find a solution fairly easily. – btilly Apr 06 '16 at 20:04
  • @dwanderson One approach is to fix n and then formulate the optimization problem as an integer quadratic program, which can be solved with a scientific library. Then iterate over candidate values of n. I'm hoping for something nicer. I've tried a greedy-ish algorithm which works decently but is not optimal. – dshin Apr 06 '16 at 20:11
  • Interesting problem! Is the quadratic program actually solvable to optimality? Would it make sense to consider a variant where each type of page can belong to exactly one template? – David Eisenstat Apr 07 '16 at 01:00
  • @j_random_hacker I was actually introduced to this problem by my friend - he told me that his company's current solution is to have a human try combinations by hand for hours until he finds one that seems ok. If you are serious about being paid, I can get you guys in touch. I'd imagine he'd want to see how the algorithm compares to the human solution (and to a simple greedy solution) on real life examples to put a dollar amount on the algorithm's value. – dshin Apr 07 '16 at 01:42
  • (Fixed prev comment) This is similar to the NP-hard Cutting Stock problem. I've come up with a deterministic algorithm that will take just O(k|M|^2) total time (with |M| being the number of elements in M, not their sum) to find |M| solutions: one for each possible number i of distinct templates. The solution for i distinct templates exactly minimises the *maximum* copy count of any template under the constraint that it is possible to order templates, and pages, so that each page spans a contiguous block of templates. Taking the best of these |M| solutions should give a good quality answer. – j_random_hacker Apr 07 '16 at 02:24
  • @DavidEisenstat I don't think integer quadratic programs are optimally solvable in polynomial time in general, but that often doesn't stop commercial scientific solvers from doing a good job. Adding a restriction of not sharing pages between templates hurts your ability to minimize cost. – dshin Apr 07 '16 at 02:33
  • @dshin Did you try to compare combinations computed by humans with combinations computed by a greedy algorithm? – piotrekg2 Apr 07 '16 at 11:56
  • @dshin I know about the magic of commercial solvers, but I'd be a little surprised if you had a formulation that they would work well on. – David Eisenstat Apr 07 '16 at 12:11
  • @piotrekg2 My friend has so far only given me one problem instance, without a corresponding human solution. I am pushing him to make more available. – dshin Apr 07 '16 at 13:24
  • @DavidEisenstat btw, I just deleted an earlier comment saying I could express the optimization problem with linear constraints, but I realized I was wrong. The only formulation I know of currently has quadratic constraints. – dshin Apr 07 '16 at 14:28
  • This seems like a really interesting question! It does seem like it might be NP-hard to find the exact optimum, but I would be very interested in hearing some sample problem instances to see what I can get. – arghbleargh Jun 15 '16 at 05:12
  • Since multiplicities are so high, why not just make sets of `k` identical elements? Say M has `p` distinct elements; you'll have at first `n = p` (without counting the remainder multiplicities). Since `k` is 15, I guess all combined remaining items will be `r < 15*p`, which, if you combine them at random, will result in `< p` k-sets. So at the end you'll have `n < 2p`, and `m = |M| / 15`. If `p` is in [10, 10^2] then `|M|` is in [10^5, 10^8] and `m` is in [10^4, 10^7], so `c*m` would be in [10^-1, 10^2], which is fairly comparable to `p`, so roughly `cost < 3p`. – Carlo Moretti Mar 15 '22 at 10:00
  • @CarloMoretti Consider the case when M consists of a*k distinct elements, each with identical multiplicity b. Clearly the optimal cost here is a(cb+1). Your solution, if I understand it correctly, does no better than a(cb+k). This performs nearly k times worse than optimal if k>>cb. – dshin Mar 15 '22 at 13:26
  • Well, given that `k` is `15` and your example is the worst case for my approach, it can be bearable IMO; at least it gives you a fixed cap. With the "one of a kind" approach that would be optimal in your example, if distinct elements aren't multiples of `k` or multiplicities vary a lot (they have a 10^2 range, right?), you're back to having to find a way of grouping them. You can always take some initial time calculating the number of distinct elements and their multiplicities, then based on that remove as many as you can with your "horizontal" approach and clean up the rest with the "vertical" approach. – Carlo Moretti Mar 15 '22 at 14:03
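The per-page scheme discussed in the last two comments (templates of k identical copies) can be written down directly. This variant is my own simplification: it overprints each page's remainder instead of combining remainders into mixed templates, so it is slightly weaker than the idea as described:

```python
import math
from collections import Counter

def identical_page_plan(M, k):
    """One template per distinct page, holding k identical copies of it,
    printed ceil(multiplicity / k) times.  Overprints each page by at
    most k - 1 copies, so the combined prints always dominate M."""
    counts = Counter(M)
    return [((page,) * k, math.ceil(m / k)) for page, m in counts.items()]
```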

0 Answers