1

I have two table: 1) One of them is for invoices and there are thousands of data. In my INVOICES table, there are invoices and their prices for customers. 2) The other one is for debts. In my DEBTS table, there are total debts of the invoices for every customer.
My target is to find the closest sums and the invoices of the debts. For instance, i have the tables:

DEBTS table:

    CUSTOMER_ID         TOTAL_DEBTS
      3326660                444$      
      2789514                165$     
      4931541                121$ 

INVOICES table:

CUSTOMER_ID       INVOICE_ID        AMOUNT_OF_INVOICE
  3326660              1a                   157$ 
  3326660              1b                   112$ 
  3326660              1c                   10$ 
  3326660              1d                   94$ 
  3326660              1e                   47$ 
  3326660              1f                   35$ 
  3326660              1g                   14$ 
  3326660              1h                   132$ 
  3326660              1i                   8$ 
  3326660              1j                   60$ 
  3326660              1k                   42$ 
  2789514              2a                   86$ 
  2789514              2b                   81$
  2789514              2c                   99$
  2789514              2d                   61$
  2789514              2e                   16$
  2789514              2f                   83$
  4931541              3a                   11$
  4931541              3b                   14$
  4931541              3c                   17$
  4931541              3d                   121$
  4931541              3e                   35$
  4931541              3f                   29$

My target table is:

CUSTOMER_ID        TOTAL_DEBTS     CALCULATED_AMOUNT        INVOICES_ID   
  3326660              444$              444$              1a,1b,1f,1h,1i    
  2789514              165$              164$                   2b,2f
  4931541              121$              121$                    3d

because there are thousands of data in my tables performance is very important for me. I find the code from stackoverflow: closest subset sum

However, the performance is low. I must stop the addition loop when i find the same values between calculeted_amount and total_debts.

Thank you for your help.

Community
  • 1
  • 1
Abdullah
  • 15
  • 3
  • 1
    This is not an appropriate problem for SQL. You would need to general all possible combinations and see which is closest. – Gordon Linoff Apr 11 '17 at 11:06
  • Agree with @GordonLinoff, read this, it helps you to understand what is your question about [wiki](https://en.wikipedia.org/wiki/Subset_sum_problem) – Seyran Apr 11 '17 at 11:37

1 Answers1

2

Use recursive query:

demo

with 
    t1 as ( 
        select customer_id cid, total_debts dbt, invoice_id iid, amount_of_invoice amt, 
               row_number() over (partition by customer_id order by invoice_id) rn
          from debts d join invoices i using (customer_id) ),
    t2 (cid, iid, ams, dbt, amt, sma, rn) as ( 
        select cid, cast(iid as varchar2(4000)), cast(amt as varchar2(4000)), 
               dbt, amt, amt, rn
          from t1 
        union all 
        select t2.cid, 
               t2.iid || ', ' || t1.iid,
               t2.ams || ', ' || t1.amt,
               t2.dbt, t2.amt, t1.amt + t2.sma, t1.rn
          from t2 
          join t1 on t1.cid = t2.cid and t1.rn > t2.rn and t2.sma + t1.amt <= t1.dbt),
    t3 as (
        select t2.*, rank() over (partition by cid order by dbt - sma ) rnk
          from t2)
select cid, iid, ams, dbt, sma from t3 where rnk = 1

Output:

    CID  IID                           AMS                             DBT      SMA        
-------  ----------------------------  ------------------------------  -------- -------- 
2789514  2b, 2f                        81, 83                               165      164 
3326660  1a, 1d, 1e, 1g, 1h            157, 94, 47, 14, 132                 444      444 
3326660  1b, 1c, 1d, 1e, 1f, 1g, 1h    112, 10, 94, 47, 35, 14, 132         444      444 
3326660  1a, 1c, 1f, 1h, 1i, 1j, 1k    157, 10, 35, 132, 8, 60, 42          444      444 
3326660  1a, 1b, 1f, 1h, 1i            157, 112, 35, 132, 8                 444      444 
4931541  3d                            121                                  121      121 

6 rows selected

Subquery T1 joins two tables and adds column rn used next to combine data. T2 is hierarchical, it does main part of work - combines all data until sum reaches debt. T3 filters best solutions with function rank. As You can see for CID 3326660 there are four best possible combinations.

For large amounts of data recursive subqueries are slow and this solution won't work, be warned.

Ponder Stibbons
  • 14,723
  • 2
  • 21
  • 24