This is my first post on Stack Overflow.

I need to devise an algorithm for a financial application. Assume we have two lists of figures like this (yes, they are bank transactions):

List 1          | List 2
-------------------------------------
1000            | 200
7000            | 300
3000            | 10000
16000           | 500
                | 16000
                | 4100

I need to match figures in one list against figures in the other, subject to these conditions:

  1. Matches can be one-to-one, one-to-many, or even many-to-many. For example, the two 16000s match (one-to-one), 1000 from list 1 matches 200+300+500 from list 2 (one-to-three), 10000 from list 2 matches 7000+3000 from list 1 (one-to-two), and so on.

  2. A figure can be used in more than one match.

  3. The number of figures in the two lists may or may not be equal.

  4. The maximum number of figures in a one-to-many match should be configurable.

  5. Many-to-many matches are not required, but it would be nice to have them too!

  6. Some figures might be left unmatched; that is OK.

Currently I do this with two complicated nested loops. It works, but as the number of figures, or the maximum number of figures allowed in each match, increases, it takes ages to complete!

Is there a better algorithm for this?

m.zein
  • You could save some computation if you sort the two lists first. Also, I guess a figure can only be used once within a match? (For example, L1: 1000 = 500 + 500 from L2 is not a valid match.) – Rerito Feb 14 '13 at 13:30
  • What do you do if there is more than 1 way to make a number? For example, if you had in List 2 `300, 350, 450, 500` and needed to match `800` in List 1. There would be 2 ways of doing it - how would you determine which was correct? – RB. Feb 14 '13 at 13:31
  • What's your optimization criteria? Clearly, I can make 16000 out of 10000+4100+500+500+300+300+300. Do you want to minimize the number of total matches? Maximize the number of numbers used? – nneonneo Feb 14 '13 at 13:41
  • @RB - Actually, my program will suggest all possible matches for a selected figure on screen. – m.zein Feb 14 '13 at 14:37
  • @nneonneo - We have only one 300 and one 500 in the lists, so 16000 cannot be 10000+4100+500+500+300+300+300. I need to reduce the running time while keeping the features. – m.zein Feb 15 '13 at 06:11
  • "A figure can be used in more than one match."? – nneonneo Feb 15 '13 at 20:11

2 Answers

I think I'm right to assert, and SO will give me a kicking if I am wrong, that the kernel of your computation is NP-hard, which means that you are (very) unlikely to find a polynomial-time solution to it. The kernel is: given a single number (such as 10000) and a list of other numbers, find all the subsets of the list that sum to that number.

This kernel is a variation of the subset sum problem.
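
To make the kernel concrete, here is a minimal Python sketch of it as a plain brute-force search. `subsets_summing_to` and its `max_size` parameter (standing in for the configurable limit in requirement 4) are hypothetical names, and the sketch assumes each figure is used at most once within a single match, as Rerito's comment suggests.

```python
from itertools import combinations


def subsets_summing_to(target, figures, max_size):
    # Hypothetical helper: brute-force version of the kernel described above.
    # Tries every combination of 1..max_size figures, so the work grows
    # combinatorially with len(figures) and max_size.
    matches = []
    for size in range(1, max_size + 1):
        for combo in combinations(figures, size):
            if sum(combo) == target:
                matches.append(combo)
    return matches


list2 = [200, 300, 10000, 500, 16000, 4100]
print(subsets_summing_to(1000, list2, max_size=3))  # [(200, 300, 500)]
```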

Given that, there are limits to how much better an algorithm you can find, and if you are hoping for a 'fast' algorithm you are likely to be disappointed.

To make your algorithm faster, I'd suggest you start by sorting both lists. Take the first number from list 1, take all the numbers from list 2 that are less than or equal to it, figure out the matches, and repeat ... Then work down list 2 number by number ...
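
One way to exploit the sorting, sketched in Python under the assumption that all figures are positive: as NealB points out below, once a running sum exceeds the target there is no point extending it, and with the candidates sorted ascending every later figure would overshoot too. `sorted_subset_matches` is a hypothetical name; this is an ordinary backtracking search with pruning, so the worst case is still exponential.

```python
def sorted_subset_matches(target, figures, max_size):
    # Hypothetical helper. Assumes all figures are positive.
    # Figures larger than the target can never take part in a match.
    candidates = sorted(f for f in figures if f <= target)
    matches = []

    def extend(start, chosen, total):
        if total == target:
            # Positive figures only: adding more can never hit the target again.
            matches.append(tuple(chosen))
            return
        if len(chosen) == max_size:
            return
        for i in range(start, len(candidates)):
            value = candidates[i]
            if total + value > target:
                break  # sorted ascending, so every later figure overshoots too
            chosen.append(value)
            extend(i + 1, chosen, total + value)
            chosen.pop()

    extend(0, [], 0)
    return matches


print(sorted_subset_matches(1000, [200, 300, 10000, 500, 16000, 4100], max_size=3))
```

For example, `sorted_subset_matches(10000, [1000, 7000, 3000, 16000], max_size=3)` returns `[(3000, 7000)]`: the 16000 is discarded up front, and the branch 1000 + 3000 + 7000 is cut off as soon as it overshoots.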

High Performance Mark
  • Well, he *does* allow for repeats on either side. So it is *always* possible to make a single many-to-many match that consumes every value from both sides, using the LCM of their sums. The problem itself might not be NP-hard, but optimizing it according to some criteria may well be. – nneonneo Feb 14 '13 at 13:44
  • This problem is slightly different from the "subset sum problem" in that it only deals with positive numbers, so the sum will always increase. I'm still willing to bet there is no practical solution to this problem given a few hundred numbers to work with. It might not be NP-hard, but it is still very "hard". – NealB Feb 14 '13 at 17:18
  • Thanks for your comments. I've grown interested in the "subset sum problem" and am trying to understand it better. – m.zein Feb 15 '13 at 06:05
  • @NealB: The subset sum problem has two equivalent forms, one for summing up to 0 and a more general one for summing up to any integer target. They are essentially the same problem, and are both NP-hard. – nneonneo Feb 15 '13 at 20:10
  • @nneonneo Whether the sum is zero or non-zero is not the point I was trying to make. The general subset sum problem allows both positive and negative numbers, while this problem only seems to deal with positive numbers. Once a sum exceeds the target value you can stop whereas in the general problem you cannot stop because there may be some negative number that could be used so the search path is extended. – NealB Feb 16 '13 at 19:12
  • @NealB: Doesn't matter! The problems are equivalently difficult. At best, you will save a constant factor of work. – nneonneo Feb 16 '13 at 19:34

To do this, you first generate all the combinations of each list. For example, for list 1 the combinations are:

1000
3000
7000
16000
1000 3000
1000 7000
1000 16000
3000 7000
3000 16000
7000 16000
1000 3000 7000
1000 3000 16000
1000 7000 16000
3000 7000 16000
1000 3000 7000 16000
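
The listing above can be produced with Python's `itertools.combinations`, generating the combinations slot by slot (all 1-figure combinations, then all 2-figure ones, and so on), as the author describes in a later comment; capping the number of figures per match then just means stopping early. `combinations_by_size` and `max_size` are hypothetical names for this sketch. With n figures and no cap there are 2^n - 1 combinations, which is 15 for the four figures here.

```python
from itertools import combinations


def combinations_by_size(figures, max_size=None):
    # Hypothetical helper: emit all 1-figure combinations, then all 2-figure
    # ones, and so on, stopping after max_size figures (None means no limit).
    figures = sorted(figures)
    limit = len(figures) if max_size is None else min(max_size, len(figures))
    result = []
    for size in range(1, limit + 1):
        result.extend(combinations(figures, size))
    return result


for combo in combinations_by_size([1000, 7000, 3000, 16000]):
    print(*combo)  # prints the same 15 lines as the listing above
```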

For each combination, compute the sum of its items. Now you have two lists of sums, and solving the problem amounts to intersecting them. There are various algorithms for computing an intersection. One simple approach is to build a binary tree from the smaller of the two lists; then, for each item in the larger list, look it up in the tree. This intersection step has n*log(n) time complexity, where n is the number of sums.
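
A minimal Python sketch of the sum-and-intersect step. It swaps the binary tree for a Python dict (a hash map), which gives expected constant-time lookups instead of log(n) ones; `sums_by_value`, `match_by_sums` and `max_size` are hypothetical names. Each sum keeps the list of combinations that produce it, since several combinations can share one sum.

```python
from collections import defaultdict
from itertools import combinations


def sums_by_value(figures, max_size):
    # Hypothetical helper: map each achievable sum to every combination of
    # 1..max_size figures that produces it (several combinations may share a sum).
    table = defaultdict(list)
    for size in range(1, max_size + 1):
        for combo in combinations(figures, size):
            table[sum(combo)].append(combo)
    return table


def match_by_sums(list1, list2, max_size):
    # Hypothetical helper: intersect the two tables of sums. Dict key lookup
    # (a hash map) stands in for the binary-tree search described above.
    sums1 = sums_by_value(list1, max_size)
    sums2 = sums_by_value(list2, max_size)
    common = sorted(sums1.keys() & sums2.keys())
    return [(total, sums1[total], sums2[total]) for total in common]


list1 = [1000, 7000, 3000, 16000]
list2 = [200, 300, 10000, 500, 16000, 4100]
for total, left, right in match_by_sums(list1, list2, max_size=3):
    print(total, left, right)
```

On the example lists this finds the three matches from the question (1000, 10000 and 16000) plus a three-to-two match at 26000 (7000 + 3000 + 16000 against 10000 + 16000), so many-to-many matches fall out of the same intersection.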

Tyler Durden
  • Hmmmmmm. Interesting! Does this approach cover the requirements I mentioned? – m.zein Feb 15 '13 at 06:08
  • I see no element of the "requirements" that would be outside the scope of the approach I describe. For example, if you want to restrict the number of figures in a combination, that is easy: combinations are normally generated by number of "slots", i.e., first you do all the 1-slot combinations, then all the 2-slot combinations, etc. So if you want a maximum of, say, 4 figures, you stop after the 4-slot combinations. – Tyler Durden Feb 15 '13 at 16:19
  • The High Performance guy, btw, has no idea what he is talking about. You will get pretty reasonable performance by my method up to 10,000 item lists, and possibly up to 100,000. – Tyler Durden Feb 15 '13 at 16:23