Consider a tree of depth B (i.e., all root-to-leaf paths have length B) whose nodes represent system states and whose edges represent actions.
Each action a in ActionSet has a gain and moves the system from one state to another.
Performing the sequence of actions A-B-C or C-B-A (or any other permutation of these actions) yields the same gain. Moreover:
- the more actions are performed before a, the smaller the increase in total gain when a is performed
- the gain achieved by each path cannot exceed a quantity H, i.e., some paths may achieve a gain lower than H, but once an action brings the total gain to H, every action performed from that point on gains 0
- the gain of the sequence of actions #b,h,j, ..., a# is g(a) (0 <= g(a) <= H), i.e., g(a) is the total gain of the path ending in a
- once an action has been performed on a path from the root to a leaf, it cannot be performed a second time on the same path
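To make the properties above concrete, here is a minimal sketch of one gain model that satisfies them. It is only an assumption for illustration, not the model from my system: each action "covers" a set of weighted items, and the gain of a path is the total weight of the covered items, capped at H. The names COVERS, WEIGHT, total_gain and marginal_gain, and all the numbers, are hypothetical.

```python
from itertools import permutations

H = 10.0  # hypothetical cap on the total gain of any path

# Hypothetical data: each action covers some weighted items.
COVERS = {
    "A": {"x", "y"},
    "B": {"y", "z"},
    "C": {"z", "w"},
}
WEIGHT = {"x": 4.0, "y": 3.0, "z": 2.0, "w": 5.0}


def total_gain(actions):
    """Total gain of a sequence of distinct actions, capped at H."""
    if len(set(actions)) != len(actions):
        raise ValueError("an action cannot be repeated on the same path")
    # The union of covered items does not depend on the order of the actions.
    covered = set().union(*(COVERS[a] for a in actions))
    return min(H, sum(WEIGHT[item] for item in covered))


def marginal_gain(prefix, action):
    """Increase in total gain when `action` is performed after `prefix`."""
    return total_gain(list(prefix) + [action]) - total_gain(list(prefix))


# Every permutation of A, B, C yields the same gain.
print({total_gain(p) for p in permutations("ABC")})  # {10.0}

# The later B is performed, the less it adds; once H is reached it adds 0.
print(marginal_gain([], "B"))          # 5.0
print(marginal_gain(["A"], "B"))       # 2.0
print(marginal_gain(["A", "C"], "B"))  # 0.0
```

In this toy model the gain depends only on the set of actions performed, which is why the order does not matter, and the overlap between covered items is what makes later actions earn less.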
Application of Algorithm 1. I apply the following A*-like algorithm:
- Start from the root.
- Expand the first level of the tree, which contains all the actions in ActionSet. Each expanded action a has gain f(a) = g(a) + h(a), where g(a) is defined as above and h(a) is an estimate of what will be earned by performing the remaining B-1 actions.
- Select the action a* that maximizes f(a).
- Expand the children of a*.
- Repeat the selection and expansion steps until an entire path of B actions from the root to a leaf that guarantees the highest f(n) has been visited. Note that the newly selected action can also come from nodes that were abandoned at previous levels. E.g., if after expanding a* the node maximizing f(a) is a child of the root, it is selected as the new best node.
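The steps above can be sketched as a best-first search over a single frontier, so that nodes abandoned at earlier levels remain selectable. This is a minimal sketch under stated assumptions, not my actual implementation: the gain model (COVERS, WEIGHT, capped at H) is a hypothetical toy, and the estimate h is one hypothetical optimistic choice (the sum of the largest stand-alone gains of the unused actions, capped by what is still missing to reach H).

```python
import heapq

H = 10.0  # hypothetical cap on the gain of a path
B = 2     # depth of the tree (number of actions per path)

# Hypothetical toy gain model: actions cover weighted items.
COVERS = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"z", "w"}}
WEIGHT = {"x": 4.0, "y": 3.0, "z": 2.0, "w": 5.0}
ACTIONS = sorted(COVERS)


def g(path):
    """Gain already earned by the actions on `path` (a tuple), capped at H."""
    covered = set().union(*(COVERS[a] for a in path))
    return min(H, sum(WEIGHT[item] for item in covered))


def h(path):
    """Optimistic estimate of what the remaining actions can still earn."""
    remaining = B - len(path)
    singles = sorted((g((a,)) for a in ACTIONS if a not in path), reverse=True)
    return min(H - g(path), sum(singles[:remaining]))


def best_first():
    """Return (path, gain) for the first complete path popped from the frontier."""
    # heapq is a min-heap, so f is stored negated to pop the largest f first.
    frontier = [(-(g(()) + h(())), ())]
    while frontier:
        _, path = heapq.heappop(frontier)  # best node among ALL open nodes
        if len(path) == B:
            return path, g(path)
        for a in ACTIONS:
            if a not in path:  # an action cannot be repeated on a path
                child = path + (a,)
                heapq.heappush(frontier, (-(g(child) + h(child)), child))
    return None


print(best_first())  # (('A', 'C'), 10.0)
```

Ties on f are broken lexicographically by the path tuple, which is an artifact of this sketch. Permutations of the same actions are kept as separate nodes here, even though they have the same gain.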
Application of Algorithm 2. Now, suppose I have a greedy algorithm that looks only at the g(n) component of the knowledge-plus-heuristic function f(n), i.e., this algorithm chooses actions according to the gain that has already been earned:
- at the first step I choose the action a maximizing the gain g(a)
- at the second step I choose the action b maximizing the gain g(b)
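The greedy steps above can be sketched as follows. Again, this is only an illustration under assumptions: the gain model (COVERS, WEIGHT, capped at H) is a hypothetical toy, and the function name greedy and its budget parameter are mine.

```python
H = 10.0  # hypothetical cap on the gain of a path

# Hypothetical toy gain model: actions cover weighted items.
COVERS = {"A": {"x", "y"}, "B": {"y", "z"}, "C": {"z", "w"}}
WEIGHT = {"x": 4.0, "y": 3.0, "z": 2.0, "w": 5.0}
ACTIONS = sorted(COVERS)


def g(path):
    """Gain already earned by the actions on `path` (a tuple), capped at H."""
    covered = set().union(*(COVERS[a] for a in path))
    return min(H, sum(WEIGHT[item] for item in covered))


def greedy(budget):
    """At each step, append the unused action that maximizes g of the path."""
    path = ()
    for _ in range(budget):
        candidates = [a for a in ACTIONS if a not in path]
        if not candidates:
            break
        # max() keeps the first candidate on ties (alphabetical order here).
        best = max(candidates, key=lambda a: g(path + (a,)))
        path += (best,)
    return path, g(path)


print(greedy(2))  # (('A', 'C'), 10.0)
```

Unlike the A*-like search, this never goes back to a node abandoned at a previous level: it commits to one action per step and looks only at g.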
Claim. My experiments show that the two algorithms lead to the same result, although the order of the actions may differ (e.g., the first one suggests the sequence A-B-C and the second one suggests B-C-A).
However, I have not been able to understand why.
My question is: is there a formal way to prove that the two algorithms return the same result, even though the order of the actions differs in some cases?
Thank you.