I am trying to optimize money generation ("farming") in the tower defense game Bloons TD 6. For simplicity, consider only a small set of farm types, each of which costs a certain amount of money to buy and produces a fixed amount of money per round (the farms actually produce money during the round, but ignore this). Farms can also be sold for a fixed percentage of their purchase cost (say 75%). The ultimate goal is to have as much money as possible after a fixed number of rounds (say 40).
If it helps, here are some example numbers:
- Merchantman: costs $3000, produces $200 each round
- Favored Trades: costs $5500 to upgrade from Merchantman, produces $500 per round
- Trade Empire: limit 1, costs $23000 to upgrade from Favored Trades, produces $800 each round, has a complex interaction which increases income of existing Merchantman and Favored Trades depending on how many there are
My first thought was to implement this as dynamic programming, but I see no easy way to break the problem down into subproblems. The state can be represented as (round, money, farms), but these can take exponentially many (integer) values. It is also not clear what to maximize at each step: greedily maximizing money per round would never result in any investment.
A natural way to view this problem is as a tree search, where each node represents the game state at a given round and each edge represents the actions taken between rounds (several buys and sells may happen on one edge). Another formulation is to have each edge be a single action, such as "buy farm" or "advance round".
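To make the second formulation concrete, here is a minimal sketch of a state and its outgoing edges. It covers only Merchantman and Favored Trades and leaves out Trade Empire. The names `Farm`, `State`, `total_cost`, `replace_farm` and `successors` are my own, and I am assuming that the sell value is 75% of the total money spent on a farm, including earlier upgrades.

```python
from typing import NamedTuple, Optional

SELL_FRACTION = 0.75  # assumed sell-back rate, as in the example


class Farm(NamedTuple):
    upgrade_cost: int           # price to buy, or to upgrade from `base`
    income: int                 # money produced per round
    base: Optional[str] = None  # farm this one is upgraded from, if any


FARMS = {
    "Merchantman": Farm(3000, 200),
    "Favored Trades": Farm(5500, 500, base="Merchantman"),
}


class State(NamedTuple):
    round: int
    money: int
    farms: tuple  # sorted tuple of farm names, so equal states compare equal


def total_cost(name):
    """Total money spent to own `name`, including earlier upgrades (assumption)."""
    farm = FARMS[name]
    return farm.upgrade_cost + (total_cost(farm.base) if farm.base else 0)


def replace_farm(farms, remove=None, add=None):
    """Return a new sorted farm tuple with one farm removed and/or one added."""
    items = list(farms)
    if remove is not None:
        items.remove(remove)
    if add is not None:
        items.append(add)
    return tuple(sorted(items))


def successors(state):
    """Yield (action, next_state) pairs, one action per edge."""
    income = sum(FARMS[f].income for f in state.farms)
    yield "advance round", State(state.round + 1, state.money + income, state.farms)
    for name, farm in FARMS.items():
        owns_base = farm.base is None or farm.base in state.farms
        if state.money >= farm.upgrade_cost and owns_base:
            farms = replace_farm(state.farms, remove=farm.base, add=name)
            yield f"buy {name}", State(state.round, state.money - farm.upgrade_cost, farms)
    for name in sorted(set(state.farms)):
        refund = int(SELL_FRACTION * total_cost(name))
        farms = replace_farm(state.farms, remove=name)
        yield f"sell {name}", State(state.round, state.money + refund, farms)
```

Only "advance round" changes the round counter, so a buy followed by a sell stays in the same round and any search over these edges needs some way to avoid cycling.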
A tree search that does better than exhaustive search needs some kind of heuristic to guide it, so I came up with this one: money + total sell value + total farm income from now to the end. This represents the ending money if you did nothing further until the end and then sold all farms.
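Written out as code, the heuristic looks roughly like this. The `FarmType` class and the parameter names are only for illustration, and `cost` is assumed to be the total money spent on the farm (so a Favored Trades would carry 3000 + 5500).

```python
from dataclasses import dataclass

SELL_FRACTION = 0.75  # assumed sell-back rate from the example
LAST_ROUND = 40       # assumed final round


@dataclass(frozen=True)
class FarmType:
    name: str
    cost: int    # total money spent on this farm (assumption)
    income: int  # money produced per round


def heuristic(money, farms, round_now, last_round=LAST_ROUND):
    """Ending money if nothing more is bought and all farms are sold at the end."""
    rounds_left = last_round - round_now
    sell_value = sum(SELL_FRACTION * f.cost for f in farms)
    future_income = rounds_left * sum(f.income for f in farms)
    return money + sell_value + future_income
```

For example, with $1000 in hand, one Merchantman and 10 rounds left, this gives 1000 + 2250 + 2000 = 5250.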
I also came up with two potential search strategies:
Greedy search with limited depth: Search exhaustively as deep as feasible, say 5 rounds, move to the state with the best heuristic value, then repeat from there. I'm not sure if this algorithm has a name, but it resembles a chess engine that can only see 10 moves deep and then has to make a move (although chess engines also use alpha-beta pruning to cut many branches).
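Here is a rough sketch of what I mean by this, simplified to a single farm type (Merchantman) where the only decision each round is how many to buy. The function names, the starting money and the convention of counting completed rounds from 0 are assumptions for illustration.

```python
SELL_FRACTION = 0.75       # assumed sell-back rate
COST, INCOME = 3000, 200   # Merchantman numbers from the example
TOTAL_ROUNDS = 40


def heuristic(rounds_done, money, n_farms):
    """Ending money if nothing more is bought and all farms are sold at the end."""
    rounds_left = TOTAL_ROUNDS - rounds_done
    return money + SELL_FRACTION * COST * n_farms + INCOME * n_farms * rounds_left


def best_leaf(rounds_done, money, n_farms, depth):
    """Exhaustively search `depth` rounds ahead; return (score, leaf_state, buys)."""
    if depth == 0 or rounds_done == TOTAL_ROUNDS:
        score = heuristic(rounds_done, money, n_farms)
        return score, (rounds_done, money, n_farms), []
    best = None
    for buys in range(money // COST + 1):  # try every affordable purchase count
        farms = n_farms + buys
        next_money = money - COST * buys + INCOME * farms
        score, leaf, plan = best_leaf(rounds_done + 1, next_money, farms, depth - 1)
        if best is None or score > best[0]:
            best = (score, leaf, [buys] + plan)
    return best


def greedy_limited_depth(start_money, depth=5):
    """Repeatedly jump to the best state found `depth` rounds ahead."""
    state = (0, start_money, 0)  # (rounds_done, money, n_farms)
    plan = []
    while state[0] < TOTAL_ROUNDS:
        _, state, buys = best_leaf(*state, depth)
        plan += buys
    _, money, n_farms = state
    return plan, money + SELL_FRACTION * COST * n_farms


plan, final_money = greedy_limited_depth(3000)
print(plan, final_money)
```

`plan[i]` is the number of Merchantman bought before round `i + 1`. A variant closer to the chess analogy would commit only to the first round of the best plan and then search again.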
Branch-and-bound: Maintain a queue of candidate solutions, and branch on a candidate (advance one round and try all actions) only if its heuristic value is at least some fraction of the best known heuristic value, i.e. it is considered good enough. The exact implementation details are unclear to me.
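One possible reading of this idea, again simplified to Merchantman only, is sketched below. The names `pruned_search` and `keep_fraction` and the value 0.9 are hypothetical. Note that my heuristic is the value of a do-nothing plan, so it is a lower bound on what a state can achieve rather than the upper bound that textbook branch-and-bound prunes with. This threshold pruning can therefore discard the optimal plan.

```python
SELL_FRACTION = 0.75       # assumed sell-back rate
COST, INCOME = 3000, 200   # Merchantman numbers from the example
TOTAL_ROUNDS = 40


def heuristic(rounds_done, money, n_farms):
    """Ending money if nothing more is bought and all farms are sold at the end."""
    rounds_left = TOTAL_ROUNDS - rounds_done
    return money + SELL_FRACTION * COST * n_farms + INCOME * n_farms * rounds_left


def pruned_search(start_money, keep_fraction=0.9):
    """Advance round by round, keeping only candidates near the best heuristic."""
    frontier = {(start_money, 0)}  # candidate (money, n_farms) pairs
    for rounds_done in range(1, TOTAL_ROUNDS + 1):
        children = set()  # a set merges duplicate states reached by different paths
        for money, n_farms in frontier:
            for buys in range(money // COST + 1):
                farms = n_farms + buys
                children.add((money - COST * buys + INCOME * farms, farms))
        best = max(heuristic(rounds_done, m, n) for m, n in children)
        # Keep a child only if it is "good enough" relative to the best one.
        frontier = {
            (m, n) for m, n in children
            if heuristic(rounds_done, m, n) >= keep_fraction * best
        }
    # After the last round the heuristic is just money plus sell value.
    return max(heuristic(TOTAL_ROUNDS, m, n) for m, n in frontier)


print(pruned_search(3000, keep_fraction=0.9))
print(pruned_search(3000, keep_fraction=0.0))  # no pruning: exhaustive
```

With `keep_fraction=0.0` nothing is pruned, so the second call serves as an exhaustive baseline for this toy version.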
Are these realistic approaches? What algorithms are used to solve this kind of optimization problem over discrete steps?