Recursive CTE in PostgreSQL for knapsack problem

Question

I have a dataset with 3 columns:

Item_id	Sourced_from	Cost
1	Local	15
2	Local	10
3	Local	20
4	International	60

I am trying to write a query in PostgreSQL to fetch total of local and international items, customer can buy within the cash limit. For a cash limit 50, this is the output I am expecting:

Local	International
3	0

I have a pretty basic knowledge of PostgreSQL, and after googling it seems like this could be solved with recursive CTE, I am unable to figure out how should I select my source seed/anchor point in this scenario.

Any ideas, how should I approach this?

score 2 · Answer 1 · answered Jul 29 '21 at 11:19

Not with a recursive CTE, but still works:

DDL/DML:

create table T
(
    id   integer primary key generated by default AS IDENTITY,
    kind text    not null,
    cost integer not null
);

insert into T(kind, cost)
values ('local', 15),
       ('local', 10),
       ('local', 20),
       ('international', 60);

-- 4. This outer CTE and the following self-join is only necessary in order to display the rows that have a count() of 0
with sub as
         (
             -- 3. find the total cost of buying this row + all previous rows, grouped by its kind
             select X.kind, sum(X.cost) as cost, X.rn
             from (
                      with cte as (
                          -- 1. assign an increasing row number on each row from the table ordered by its cost
                          select *, row_number() over (order by T.cost asc, T.kind) as rn
                          from T
                      )
                      -- 2. self-join the CTE on each row with the same kind, but join it only with the rows that have a row number less than or equal to the current row number 
                      select A.id, A.kind, A.cost, B.rn
                      from cte as A
                               join cte as B on A.kind = B.kind and A.rn <= B.rn
                  ) as X
             group by X.kind, X.rn
         )

select M.kind, count(N.*)
from sub as M -- 5. count only the amount of goods that fit in out budget (i.e. 50)
         left outer join sub as N on M.rn = N.rn and N.cost <= 50
group by M.kind
;

Output (db-fiddle):

+-------------+-----+
|kind         |count|
+-------------+-----+
|local        |3    |
|international|0    |
+-------------+-----+

Ftisiot · Answer 2 · 2021-07-29T12:53:18.220

I made a CTE example to solve the problem:

Recreated your case with

create table kp (item_id int, sourced_from varchar, cost int);
insert into kp values (1,'local',15);
insert into kp values (2,'local',10);
insert into kp values (3,'local',20);
insert into kp values (4,'international',60);

The following query does:

Selects from kp only items with cost less than 50
adds the item_id in the list_of_items The recursive bit does:
joins with kp checking the source_from is the same and the kp.item_id is not already contained in the list_of_items (avoiding to put the same item multiple times)
computes the total cost (total_cost)
adds the new item item_id to the list_of_items

WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT 
        item_id, 
        item_id as next_item_id, 
        sourced_from, 
        cost as total_cost,
        1 as nr_items, 
        ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id, 
        items.item_id  as next_item_id, 
        items.sourced_from, 
        items.total_cost + kp.cost total_cost,
        items.nr_items + 1 as nr_items,
        items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
    WHERE kp.cost + items.total_cost < 50
)
SELECT * FROM items;

If you run against the above dataset you'll end up with the detailed result

item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       1 |            2 | local        |         25 |        2 | {2,1}
       1 |            3 | local        |         35 |        2 | {3,1}
       2 |            1 | local        |         25 |        2 | {1,2}
       2 |            3 | local        |         30 |        2 | {3,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       1 |            2 | local        |         45 |        3 | {3,2,1}
       1 |            3 | local        |         45 |        3 | {2,3,1}
       2 |            1 | local        |         45 |        3 | {3,1,2}
       2 |            3 | local        |         45 |        3 | {1,3,2}
       3 |            1 | local        |         45 |        3 | {2,1,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(15 rows)

which shows all the permutations of the 3 local items. Now if you substitute the last SELECT section with

SELECT * FROM items order by nr_items desc, total_cost desc, list_of_items asc limit 1;

You'll be able also to pick the combination having the max number of items, with the cost closest to the budget (I added also an ascending ordering based on list_of_items to receive always the same result in case of multiple combinations), which in the case above would result in

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       3 |            2 | local        |         45 |        3 | {1,2,3}
(1 row)

If you are just interested in the maximum by sourced_from then the last SELECT becomes

select sourced_from, max(nr_items) nr_items from items group by sourced_from;

with the expected result being

 sourced_from | nr_items 
--------------+----------
 local        |        3
(1 row)

Edit: to speed up the query and avoiding having multiple permutations of the same objects (e.g. {1,2,3} and {1,2,3}) we can force the next item_id to be greater of the current one. Full query

WITH RECURSIVE items (item_id, next_item_id, sourced_from, total_cost, nr_items, list_of_items) AS (
    SELECT 
        item_id, 
        item_id as next_item_id, 
        sourced_from, 
        cost as total_cost,
        1 as nr_items, 
        ARRAY[item_id] list_of_items
  from kp where cost < 50
  UNION ALL
    SELECT 
        kp.item_id, 
        items.item_id  as next_item_id, 
        items.sourced_from, 
        items.total_cost + kp.cost total_cost,
        items.nr_items + 1 as nr_items,
        items.list_of_items || kp.item_id as  list_of_items
    FROM kp join items 
        on items.sourced_from=kp.sourced_from
        and items.list_of_items::int[] @> ARRAY[kp.item_id] = false
        and items.item_id < kp.item_id
    WHERE kp.cost + items.total_cost < 50
)
select * from items;

result

 item_id | next_item_id | sourced_from | total_cost | nr_items | list_of_items 
---------+--------------+--------------+------------+----------+---------------
       1 |            1 | local        |         15 |        1 | {1}
       2 |            2 | local        |         10 |        1 | {2}
       3 |            3 | local        |         20 |        1 | {3}
       2 |            1 | local        |         25 |        2 | {1,2}
       3 |            1 | local        |         35 |        2 | {1,3}
       3 |            2 | local        |         30 |        2 | {2,3}
       3 |            2 | local        |         45 |        3 | {1,2,3}
(7 rows)

Recursive CTE in PostgreSQL for knapsack problem

2 Answers2