Converge on Best Combination of Elements

Question

You have $10,000 to invest in stocks. You are given a list of 200 stocks and told to select 8 of them to buy, and also to indicate how many shares of each you want to buy. You cannot spend more than $2,500 on any single stock, and each stock has its own price, ranging from $100 to $1,000. You cannot buy a fraction of a share, only whole numbers of shares. Each stock also has a value attached to it indicating how profitable it is: an arbitrary number from 0 to 100 that serves as a simple rating system.

The end goal is to list the optimal selection of 8 stocks, and indicate the best quantity of each of those stocks to buy without going over the $2,500 limit for each stock.

• I'm not asking for investment advice; I chose stocks because they act as a good metaphor for the actual problem I'm trying to solve.

• Seems like what I'm looking at is a more complex version of the 0/1 Knapsack problem: https://en.wikipedia.org/wiki/Knapsack_problem.

• No, this isn't homework.

Are you looking to better understand how to actually invest, or are you looking at how to maximize your arbitrary formula? If you want to actually invest, you want to maximize alpha while keeping beta at acceptable levels. See https://www.investopedia.com/ask/answers/102714/whats-difference-between-alpha-and-beta.asp for basic definitions of those terms. — btilly, Feb 21 '18 at 17:41
@btilly Not interested in investing, but used stocks as a self-contained metaphor for the actual problem I'm trying to solve. — Kael Eppcohen, Feb 21 '18 at 17:49
In that case, can you please offer the whole problem? What information do you start with, what formula are you trying to maximize? (I know that price is part of it, but that is explicit in your current description.) — btilly, Feb 21 '18 at 17:51
@btilly completely rewrote the problem to be more specific while containing less exposition. — Kael Eppcohen, Feb 21 '18 at 18:09
What are you trying to maximize? The sum of the profit of your portfolio? — btilly, Feb 21 '18 at 18:16
@btilly yeah the sum of the profit scores of each stock * the quantity of that stock you chose to buy. So if S is the stock score, and Q is the quantity, the sum of S*Q for each of the 8 stocks you selected is what you are optimizing. — Kael Eppcohen, Feb 21 '18 at 18:19
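Combining the question with this clarification, the task can be written as a small integer program. The symbols are labels introduced here for illustration, not notation from the thread: n = 200 stocks, price p_i, score S_i, and whole-number quantity Q_i. "Select 8" is read as "at most 8", which is how the `max_stocks` parameter in the accepted answer treats it.

```latex
\begin{aligned}
\text{maximize}\quad & \sum_{i=1}^{n} S_i Q_i \\
\text{subject to}\quad & \sum_{i=1}^{n} p_i Q_i \le 10000, \\
& p_i Q_i \le 2500 \quad \text{for every } i, \\
& \bigl|\{\, i : Q_i > 0 \,\}\bigr| \le 8, \\
& Q_i \in \{0, 1, 2, \dots\}.
\end{aligned}
```

Without the cardinality line, each stock can be bought at most floor(2500 / p_i) times, which is a bounded knapsack problem (the multi-copy generalization of the 0/1 knapsack problem linked in the question); the limit of 8 distinct stocks adds a second dimension to the state.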

btilly · Accepted Answer · 2018-02-22T18:33:56.373

Here is lightly tested code for solving your problem exactly in time that is polynomial in the amount of money available, the number of stocks that you have, and the maximum amount of stock that you can buy.

#! /usr/bin/env python
from collections import namedtuple

Stock = namedtuple('Stock', ['id', 'price', 'profit'])

def optimize (stocks, money=10000, max_stocks=8, max_per_stock=2500):
    Investment = namedtuple('investment', ['profit', 'stock', 'quantity', 'previous_investment'])
    investment_transitions = []
    last_investments = {money: Investment(0, None, None, None)}
    for _ in range(max_stocks):
        next_investments = {}
        investment_transitions.append([last_investments, next_investments])
        last_investments = next_investments


    def prioritize(stock):
        # This puts the best profit/price, as a ratio, first.
        val = [-(stock.profit + 0.0)/stock.price, stock.price, stock.id]
        return val

    for stock in sorted(stocks, key=prioritize):
        # We reverse transitions so we have not yet added the stock to the
        # old investments when we add it to the new investments.
        for transition in reversed(investment_transitions):
            old_t = transition[0]
            new_t = transition[1]
            for avail, invest in old_t.items():
                for i in range(int(min(avail, max_per_stock)/stock.price)):
                    quantity = i+1
                    new_avail = avail - quantity*stock.price
                    new_profit = invest.profit + quantity*stock.profit
                    if new_avail not in new_t or new_t[new_avail].profit < new_profit:
                        new_t[new_avail] = Investment(new_profit, stock, quantity, invest)
    best_investment = investment_transitions[0][0][money]
    for transition in investment_transitions:
        for invest in transition[1].values():
            if best_investment.profit < invest.profit:
                best_investment = invest

    purchase = {}
    while best_investment.stock is not None:
        purchase[best_investment.stock] = best_investment.quantity
        best_investment = best_investment.previous_investment

    return purchase


print(optimize([Stock('A', 100, 10), Stock('B', 1040, 160)]))
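A quick way to gain confidence in lightly tested DP code is to compare it against exhaustive search on tiny inputs. The sketch below is an addition, not part of the answer; `brute_force` is a hypothetical helper that takes the same parameters as `optimize` and is only practical for a handful of stocks, since it enumerates every subset and every quantity vector.

```python
from collections import namedtuple
from itertools import combinations, product

Stock = namedtuple('Stock', ['id', 'price', 'profit'])

def brute_force(stocks, money=10000, max_stocks=8, max_per_stock=2500):
    """Try every subset of up to max_stocks stocks and every quantity vector.

    Returns (best_profit, {stock: quantity}). Exponential, so tiny inputs only.
    """
    best_profit, best_purchase = 0, {}
    for k in range(1, max_stocks + 1):
        for chosen in combinations(stocks, k):
            # Each chosen stock gets between 1 share and its per-stock cap.
            ranges = [range(1, max_per_stock // s.price + 1) for s in chosen]
            for quantities in product(*ranges):
                cost = sum(q * s.price for q, s in zip(quantities, chosen))
                if cost > money:
                    continue  # over the total budget
                profit = sum(q * s.profit for q, s in zip(quantities, chosen))
                if profit > best_profit:
                    best_profit = profit
                    best_purchase = dict(zip(chosen, quantities))
    return best_profit, best_purchase

if __name__ == '__main__':
    print(brute_force([Stock('A', 100, 10), Stock('B', 1040, 160)]))
```

For the two-stock call above, working it out by hand gives 25 shares of A and 2 shares of B, a total profit of 570 at a cost of $4,580; both functions should agree on that.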

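And here it is with the tiny optimization of deleting investments once we see that continuing to add stocks to them cannot improve on the best investment found so far. Since stocks are processed in order of decreasing profit/price ratio, an investment with `avail` dollars left can gain at most `avail * profit_ratio` more profit from the current stock or any later one. This will probably run orders of magnitude faster than the old code with your data.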

#! /usr/bin/env python
from collections import namedtuple

Stock = namedtuple('Stock', ['id', 'price', 'profit'])

def optimize (stocks, money=10000, max_stocks=8, max_per_stock=2500):
    Investment = namedtuple('investment', ['profit', 'stock', 'quantity', 'previous_investment'])
    investment_transitions = []
    last_investments = {money: Investment(0, None, None, None)}
    for _ in range(max_stocks):
        next_investments = {}
        investment_transitions.append([last_investments, next_investments])
        last_investments = next_investments


    def prioritize(stock):
        # This puts the best profit/price, as a ratio, first.
        val = [-(stock.profit + 0.0)/stock.price, stock.price, stock.id]
        return val

    best_investment = investment_transitions[0][0][money]
    for stock in sorted(stocks, key=prioritize):
        profit_ratio = (stock.profit + 0.0) / stock.price
        # We reverse transitions so we have not yet added the stock to the
        # old investments when we add it to the new investments.
        for transition in reversed(investment_transitions):
            old_t = transition[0]
            new_t = transition[1]
            # Iterate over a snapshot: entries are deleted from old_t inside the loop.
            for avail, invest in list(old_t.items()):
                if avail * profit_ratio + invest.profit <= best_investment.profit:
                    # We cannot possibly improve with this or any other stock.
                    del old_t[avail]
                    continue
                for i in range(int(min(avail, max_per_stock)/stock.price)):
                    quantity = i+1
                    new_avail = avail - quantity*stock.price
                    new_profit = invest.profit + quantity*stock.profit
                    if new_avail not in new_t or new_t[new_avail].profit < new_profit:
                        new_invest = Investment(new_profit, stock, quantity, invest)
                        new_t[new_avail] = new_invest
                        if best_investment.profit < new_invest.profit:
                            best_investment = new_invest

    purchase = {}
    while best_investment.stock is not None:
        purchase[best_investment.stock] = best_investment.quantity
        best_investment = best_investment.previous_investment

    return purchase
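The comment about reversing the transitions is the same trick as iterating capacities downward in the classic 0/1 knapsack DP. In the answer's code, `investment_transitions[k][1]` and `investment_transitions[k + 1][0]` are the same dict object, so walking the layers bottom-up would let a stock that was just added to layer k+1 be extended again into layer k+2 in the same pass. The sketch below is a hypothetical, stripped-down version (a plain list of layers, one share per stock, no profit comparison) that makes the difference visible.

```python
def add_stock(layers, price, profit, order):
    """Add one share of a single stock to every layer.

    layers[k] maps remaining money -> (profit, times_this_stock_was_used)
    after buying k distinct stocks. Hypothetical simplification of the
    answer's investment_transitions loop.
    """
    indices = range(len(layers) - 1)
    if order == 'reversed':
        indices = reversed(indices)
    for k in indices:
        old_t, new_t = layers[k], layers[k + 1]
        for avail, (prof, uses) in list(old_t.items()):
            if avail >= price:
                new_t[avail - price] = (prof + profit, uses + 1)

def max_uses(order):
    """Largest number of times the single stock appears in any one entry."""
    layers = [{100: (0, 0)}, {}, {}, {}]
    add_stock(layers, price=10, profit=1, order=order)
    return max(uses for layer in layers for (_, uses) in layer.values())

print(max_uses('forward'), max_uses('reversed'))
```

Going forward, the entry created in layer 1 is immediately read again as an "old" investment and the same stock is stacked three times; going in reverse, each layer is read before anything in this pass writes to it, so the stock is used at most once per chain.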

This looks super promising! I'm gonna have to familiarize myself with Python a little bit so I can recreate it in Java. One thing I noticed: if you're running Python 3.0 or higher, you'll get an error, because Python 3.0 removed iteritems(); items() is its replacement. — Kael Eppcohen, Feb 22 '18 at 17:45
@KaelEppcohen There is a very important optimization missing. This will, with your data, potentially have to do hundreds of millions of operations (10,000 dollar values, times 8 possible numbers of purchases, times 25 possible quantities of each stock, times 200 stocks). I'll add that and suddenly this will run a *lot* faster. — btilly, Feb 22 '18 at 18:23
The new optimization code works, but only if I make this adjustment: `for avail, invest in list(old_t.items()):` instead of the original `for avail, invest in old_t.items():` — Kael Eppcohen, Feb 22 '18 at 18:46
@KaelEppcohen Ah, that makes sense. Shouldn't delete from a hash while iterating over it. :-) — btilly, Feb 22 '18 at 19:04
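The failure behind this fix can be reproduced in isolation. In Python 2, `dict.items()` returned a new list, so the answer's `del old_t[avail]` was safe there; in Python 3 it returns a live view, and changing the dict's size while iterating over that view raises `RuntimeError`. The helper names below are hypothetical.

```python
def prune_in_place(d, threshold):
    """Delete every entry whose value is below threshold.

    list(d.items()) takes a snapshot first, so deleting from d inside the
    loop is safe in Python 3.
    """
    for key, value in list(d.items()):
        if value < threshold:
            del d[key]
    return d

def prune_unsafely(d, threshold):
    """Same loop over the live view; Python 3 raises RuntimeError here."""
    for key, value in d.items():
        if value < threshold:
            del d[key]
    return d

print(prune_in_place({1: 5, 2: 1, 3: 7}, 3))
try:
    prune_unsafely({1: 5, 2: 1, 3: 7}, 3)
except RuntimeError as err:
    print('RuntimeError:', err)
```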
I'm completely new to Python and I'm having trouble understanding what `for avail, invest in old_t.items():` actually does. old_t contains the previous transition that happened, right? And that transition contains two investments, the original and the one that we switched over to? So when you call `for avail, invest in old_t.items():` you're really just iterating twice: the first iteration is for the original investment, and the second is for the one we switched to, correct? — Kael Eppcohen, Feb 22 '18 at 19:06
@KaelEppcohen No. A Python dictionary is like a Java HashMap. When you iterate over a HashMap with a for-each loop you get `<key, value>` pairs. Same thing in Python. To be more specific, `avail` is the dollar amount left to invest and is the key. `invest` is effectively a LinkedList containing the details of the investment that got there. — btilly, Feb 22 '18 at 19:18
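A small standalone illustration of both points, reusing the shape of the answer's `Investment` tuple with values taken from the two-stock example (plain string ids stand in for `Stock` tuples). The `for avail, invest in layer.items():` loop corresponds roughly to `for (Map.Entry<Integer, Investment> e : layer.entrySet())` in Java, and `purchases` is a hypothetical helper mirroring the answer's final `while` loop.

```python
from collections import namedtuple

# Same fields as the answer's Investment: each entry points back to the
# investment it extended, forming a singly linked list ending at a root.
Investment = namedtuple('Investment', ['profit', 'stock', 'quantity', 'previous_investment'])

root = Investment(0, None, None, None)        # $10,000 left, nothing bought
step1 = Investment(320, 'B', 2, root)         # 2 x $1,040 -> $7,920 left
step2 = Investment(570, 'A', 25, step1)       # 25 x $100 -> $5,420 left

# A layer maps remaining money (the key) -> best Investment reaching it.
layer = {7920: step1, 5420: step2}

for avail, invest in layer.items():
    print(avail, invest.profit)

def purchases(invest):
    """Follow previous_investment links back to the root, collecting buys."""
    bought = {}
    while invest.stock is not None:
        bought[invest.stock] = invest.quantity
        invest = invest.previous_investment
    return bought

print(purchases(step2))
```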
So after a lot of digging into the syntax of Python (and learning a lot along the way) I was able to recreate this function in Java. Thanks for giving such a great solution and for helping me understand it! — Kael Eppcohen, Feb 22 '18 at 20:58
@KaelEppcohen Out of curiosity, how quickly did it run for your data? — btilly, Feb 22 '18 at 21:01
With the Java version I recreated, as well as the Python version, I was able to run 300 stocks with randomly generated prices and profits of 1-1000 in 5-15 seconds. This is with $10k of money, a $2.5k per-stock cap, and 8 investments max. When I scale up the variables involved, the time it takes increases exponentially. I'm gonna look into a way to multithread it, but further optimizations, limitations, or shortcuts may be necessary to make this scale to the upper bounds of what I'm planning to use it for. — Kael Eppcohen, Feb 22 '18 at 21:25
EDIT: it seems like the time it takes is HIGHLY contextual, meaning the range in which your data operates strongly influences the time it takes to process. Additionally, it seems that the algorithm may be biased towards selecting more expensive stocks, but further testing is required to confirm. — Kael Eppcohen, Feb 22 '18 at 21:28
@KaelEppcohen Yes, it is dependent on the data. That's why it is called "pseudopolynomial". If you round off your prices to the nearest, say, $50, you may see much better performance. — btilly, Feb 22 '18 at 21:38
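Rounding helps because the keys of each layer dict are the possible amounts of money left. If every price and the budget are multiples of $50, every key is a multiple of $50, so a layer can hold at most 201 keys instead of up to 10,001. The sketch below uses hypothetical helpers: `round_prices` rounds to a chosen step, and `reachable_budgets` counts the distinct remaining-money values, ignoring the 8-stock limit. The result is then exact only for the rounded prices: rounding a price down can make the real cost of the chosen portfolio slightly exceed $10,000 or the $2,500 cap, so the real cost should be rechecked (or prices rounded up instead).

```python
from collections import namedtuple

Stock = namedtuple('Stock', ['id', 'price', 'profit'])

def round_prices(stocks, step=50):
    """Round each price to the nearest multiple of step, never below step.

    Python 3's round() sends exact halves to the even multiple.
    """
    return [s._replace(price=max(step, step * round(s.price / step))) for s in stocks]

def reachable_budgets(prices, money=10000, max_per_stock=2500):
    """Count distinct remaining-money values reachable with whole shares.

    This approximates how many keys the layer dicts can hold (the 8-stock
    limit is ignored to keep the sketch short).
    """
    reachable = {money}
    for price in prices:
        new = set(reachable)
        for avail in reachable:
            for q in range(1, min(avail, max_per_stock) // price + 1):
                new.add(avail - q * price)
        reachable = new
    return len(reachable)

print(round_prices([Stock('A', 101, 5), Stock('B', 1040, 160)]))
print(reachable_budgets([101, 103, 107, 109]), reachable_budgets([100, 100, 100, 100]))
```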
@KaelEppcohen Here is a significant detail. This processes the stocks with the best profit/price ratio first. Therefore if you run it for, say, 1 second, then whatever it has gotten to is probably a pretty good solution. It is *possible* that something at the end could be an improvement. But that is fairly unlikely. — btilly, Feb 22 '18 at 22:54
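One way to act on this is to give the search a deadline and return whatever it has found when time runs out. The sketch below is a Python 3 restatement of the answer's pruned version with a hypothetical `time_limit` parameter added; it is not from the thread. It only checks the clock between stocks, so one expensive pass can overshoot the limit, and a result returned after the cutoff is not guaranteed to be optimal.

```python
import time
from collections import namedtuple

Stock = namedtuple('Stock', ['id', 'price', 'profit'])
Investment = namedtuple('Investment', ['profit', 'stock', 'quantity', 'previous_investment'])

def optimize_anytime(stocks, money=10000, max_stocks=8, max_per_stock=2500,
                     time_limit=1.0):
    """Layered DP with pruning; stops taking new stocks after time_limit seconds."""
    deadline = time.monotonic() + time_limit
    # layers[k]: remaining money -> best Investment using exactly k stocks.
    layers = [{money: Investment(0, None, None, None)}] + [{} for _ in range(max_stocks)]
    best = layers[0][money]
    for stock in sorted(stocks, key=lambda s: (-s.profit / s.price, s.price, s.id)):
        if time.monotonic() > deadline:
            break  # later stocks have lower ratios, so they matter less
        ratio = stock.profit / stock.price
        # Top layer first, so this stock is never added twice to one chain.
        for k in reversed(range(max_stocks)):
            old_t, new_t = layers[k], layers[k + 1]
            for avail, invest in list(old_t.items()):
                if avail * ratio + invest.profit <= best.profit:
                    del old_t[avail]  # cannot beat best even spending all at this ratio
                    continue
                for quantity in range(1, min(avail, max_per_stock) // stock.price + 1):
                    new_avail = avail - quantity * stock.price
                    new_profit = invest.profit + quantity * stock.profit
                    if new_avail not in new_t or new_t[new_avail].profit < new_profit:
                        new_t[new_avail] = Investment(new_profit, stock, quantity, invest)
                        if best.profit < new_profit:
                            best = new_t[new_avail]
    # Walk the linked investments back to the root.
    purchase = {}
    while best.stock is not None:
        purchase[best.stock] = best.quantity
        best = best.previous_investment
    return purchase

print(optimize_anytime([Stock('A', 100, 10), Stock('B', 1040, 160)], time_limit=1.0))
```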
Oh wow, you're not kidding about the rounding. I got a memory error in one case when I didn't round, and after applying some relatively insignificant rounding it processed in less than 1 second. That's fantastic. — Kael Eppcohen, Feb 23 '18 at 00:57
