
Consider a tree of depth B (i.e., every root-to-leaf path has length B) whose nodes represent system states and whose edges represent actions.

Each action a in ActionSet has a gain and moves the system from one state to another. Performing the actions in the order A-B-C, C-B-A, or any other permutation yields the same total gain. Moreover:

  • the more actions have been performed before a, the smaller the increase in total gain when a is performed
  • the gain achieved by a path cannot exceed a quantity H; some paths may achieve a total gain lower than H, but once an action brings the total gain to H, every action performed from that point on gains 0
  • the gain of the sequence of actions b, h, j, ..., a is g(a), with 0 <= g(a) <= H
  • once an action has been performed on a path from the root to a leaf, it cannot be performed a second time on the same path
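To make these properties concrete, here is a minimal sketch of one gain model that satisfies them. It is only an assumption for illustration: the question does not say how gains are computed. The `COVERS` table, the value of `H`, and the helper names `total_gain` and `marginal_gain` are all hypothetical. Each action "covers" a set of items and the total gain is the number of distinct items covered, capped at H.

```python
from itertools import permutations

# Hypothetical data (not from the question): items covered by each action.
COVERS = {
    "A": {1, 2, 3},
    "B": {3, 4},
    "C": {4, 5, 6},
    "D": {1, 6},
}
H = 5  # assumed cap on the total gain of a path


def total_gain(actions):
    """Total gain of a sequence of actions: distinct items covered, capped at H."""
    covered = set()
    for a in actions:
        covered |= COVERS[a]
    return min(H, len(covered))


def marginal_gain(action, performed):
    """Increase in total gain when `action` follows the actions in `performed`."""
    performed = list(performed)
    return total_gain(performed + [action]) - total_gain(performed)


# Every ordering of the same actions gives the same total gain.
gains = {total_gain(p) for p in permutations(["A", "B", "C"])}
print(gains)

# The later B is performed, the less it adds.
print(marginal_gain("B", []), marginal_gain("B", ["A"]), marginal_gain("B", ["A", "C"]))
```

Because the gain depends only on the union of covered items, it is independent of the order, and the cap H makes every action after the cap is reached gain 0.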

Application of Algorithm1. I apply the following algorithm (A*-like):

  1. Start from the root.
  2. Expand the first level of the tree, which will contain all the actions in ActionSet. Each expanded action a has value f(a) = g(a) + h(a), where g(a) is defined as above and h(a) is an estimate of what will be earned by performing the remaining B-1 actions
  3. Select the action a* that maximizes f(a)
  4. Expand the children of a*
  5. Iterate 2-3 until an entire path of B actions from the root to a leaf that guarantees the highest f(n) has been visited. Note that the newly selected action may also come from nodes that were abandoned at previous levels. E.g., if after expanding a* the node maximizing f(a) is a child of the root, it is selected as the new best node
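The steps above can be sketched as a best-first search that always pops the frontier node with the highest f = g + h. This is a rough illustration under stated assumptions, not the asker's actual code: the coverage-style gain model (`COVERS`, `H`), the depth `B = 2`, and the `heuristic` function are all hypothetical. The heuristic is an optimistic upper bound on the remaining gain, which is what a maximizing A*-like search needs.

```python
import heapq

# Hypothetical data (not from the question): items covered by each action.
COVERS = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}
H = 5  # assumed cap on the total gain of a path
B = 2  # assumed depth of the tree (number of actions on a path)


def total_gain(path):
    """g: distinct items covered by the actions on the path, capped at H."""
    covered = set()
    for a in path:
        covered |= COVERS[a]
    return min(H, len(covered))


def heuristic(path):
    """h: optimistic bound on what the remaining actions can still add."""
    steps_left = B - len(path)
    sizes = sorted((len(COVERS[a]) for a in COVERS if a not in path), reverse=True)
    return min(H - total_gain(path), sum(sizes[:steps_left]))


def best_first():
    # heapq is a min-heap, so f is stored negated to pop the largest f first.
    frontier = [(-heuristic(()), ())]
    while frontier:
        _, path = heapq.heappop(frontier)
        if len(path) == B:
            return path, total_gain(path)  # a complete path with the best f
        for a in sorted(COVERS):
            if a not in path:  # an action cannot repeat on the same path
                child = path + (a,)
                f = total_gain(child) + heuristic(child)
                heapq.heappush(frontier, (-f, child))
    return (), 0


path, gain = best_first()
print(path, gain)
```

Since the frontier keeps every generated node, the search can return to a node abandoned at an earlier level whenever that node has the highest f, as described in step 5.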

Application of Algorithm2. Now, suppose I have a greedy algorithm that looks only at the g(n) component of the knowledge-plus-heuristic function f(n), i.e., it chooses actions according to the gain that has already been earned:

  1. at the first step I choose the action a maximizing the gain g(a)
  2. at the second step I choose the action b maximizing the gain g(b)
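A minimal sketch of this greedy procedure follows, generalized to B steps. It again assumes a hypothetical coverage-style gain model (`COVERS`, `H`) and a depth `B = 2`, none of which come from the question. At each step it commits to the action that maximizes the gain g of the path so far, and it never backtracks.

```python
# Hypothetical data (not from the question): items covered by each action.
COVERS = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}
H = 5  # assumed cap on the total gain of a path
B = 2  # assumed number of actions to choose


def total_gain(path):
    """g: distinct items covered by the actions on the path, capped at H."""
    covered = set()
    for a in path:
        covered |= COVERS[a]
    return min(H, len(covered))


def greedy():
    path = ()
    for _ in range(B):
        # Only actions not yet used on this path are candidates.
        candidates = [a for a in sorted(COVERS) if a not in path]
        # Commit to the action giving the highest gain so far; ties go to
        # the alphabetically first action because max keeps the first maximum.
        best = max(candidates, key=lambda a: total_gain(path + (a,)))
        path += (best,)
    return path, total_gain(path)


result = greedy()
print(result)
```

On this toy data the greedy choice happens to reach the cap H. That is an observation about this one instance only; it does not show that greedy and best-first agree in general.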

Claim. In my experiments the two algorithms yield the same result, although possibly in a different order (e.g., the first one suggests the sequence A-B-C and the second one suggests B-C-A). However, I have not managed to understand why.

My question is: is there a formal way of proving that the two algorithms return the same result, although possibly in a different order?

Thank you.

Eleanore
  • How do you decide which node to expand next? – BlueRaja - Danny Pflughoeft Jun 27 '13 at 18:26
  • For A* I built the classical knowledge-plus-heuristic function (f(n) = g(n) + h(n)), so the next state to be visited is the one that has the lowest cost in best-first order. In the greedy algorithm, as I mentioned in the text, if N actions are available and I have already chosen M actions, I choose as the next action the action a that minimizes the cost C(a|A_1, ..., A_M). – Eleanore Jun 28 '13 at 07:09
  • What you are describing is basically [Dijkstra's Algorithm](http://en.wikipedia.org/wiki/Dijkstra%27s_algorithm), though your description of it is very awkward (eg. you would not "choose the action `B` minimizing the cost `C(B|A)`" for a specific `B`, you'd find the lowest among all actions in the priority queue). – BlueRaja - Danny Pflughoeft Jun 28 '13 at 07:16
  • Dijkstra's algorithm allows starting the exploration of a path and then abandoning it if it is not the cheapest one. My exploration is limited: once I choose a node, it goes into the solution. E.g.: I explore the nodes at depth 1, then select one node at depth 1 and explore its children at depth 2... Thus, if I am exploring at depth N, I will not go back to depth N-1 (which, if I remember correctly, is possible in the vanilla Dijkstra algorithm). Is it still Dijkstra's algorithm? – Eleanore Jun 28 '13 at 07:32

1 Answer


A* search returns the optimal path, provided the heuristic is admissible (for this maximization problem, h must never underestimate the remaining gain). From what I understand of the problem, your greedy search is simply performing Bayes calculations and will continue to do so until it finds an optimal set of nodes to take. Since the order of the nodes does not matter, the two should return the same set of nodes, albeit in different orders.

I think this is correct assuming you have the same set of actions you can perform from every node.

Eric Kim