4

I have a directed weighted graph G=(V,E).

In this graph the weight of edge(v[i],v[j]) is the count of transition between v[i] and v[j].

I am trying to determine the best way to accomplish task: how to find the probability P of path from node A to node B

All weights are positive integers.

For example,

weight(A,B)=count of transition from A to B
weight(B,C)=count of transition from B to C
weight(B,D)=count of transition from B to D
weight(D,C)=count of transition from D to C

And we have several paths:

A->B->C 
A->B->D->C

So, how to calculate probability P of these paths and choose the best one?

Matsmath
  • 1,164
  • 2
  • 21
  • 40
Andrei
  • 1,313
  • 4
  • 18
  • 35
  • 1
    It's not clear where probabilities are supposed to come from here. What do you mean by "count of transition"? And what makes one path the best? You haven't given us a clear enough problem statement for us to answer the question. – user2357112 May 04 '16 at 17:32
  • @user2357112 For example, the count of transition - the number of transitions from A to the place B . The best path is a path with the most probability. My attempt to solve this: P=the sum of weights of the edges belonging to this path. But I have a problem that the path with the greatest length will have the greatest probability, but It is wrong – Andrei May 04 '16 at 17:36

1 Answers1

4

It can be solved by reducing the problem to shortest path problem, assuming we are indeed discussing probabilities (that means, each weight is in range [0,1]).

Let the graph be G=(V,E), and the probability between two vertices denoted as w(u,v).

Define: w'(u,v) = -log(w(u,v))

The shortest path from some node s to some node t is then the shortest path in the graph when using w' as weight function. You can find the shortest path using Dijkstra's algorithm or Bellman-Ford algorithm

Proof:

For a path v1->v2->...->vn, its probability is w(v1,v2) * w(v2,v3) * ... * w(vn-1,vn).

When using w' as the weight, the cost of this path when using any shortest path algorithm is:

d(v1,vn) = w'(v1,v2) + w'(v2,v3) + ... + w'(vn-1,vn) = 
d(v1,vn) = -log(w(v1,v2)) + -log(w(v2,v3) + ... + -log(w(vn-1,vn)) = 
d(v1,vn) = -1* [ log(w(v1,v2)) + log(w(v2,v3)) + ... + log(w(vn-1,vn)) =
d(v1,vn) = -1* log(w(v1,v2) * w(v2,v3) * ... * w(vn-1,vn)) 

This obviously applies also for the minimal path found from s to t.
This means that this path have minimal value of:

s(s,t) = -1* log(w(s,v2) * w(v2,v3) * ... * w(vn-1,t)) 

And since log is monotonic function, it is also minimal value of -1 * w(s,v2) * w(v2,v3) * ... * w(vn-1,t), which is the maximal value of w(s,v2) * w(v2,v3) * ... * w(vn-1,t), and that's exactly the probability.

amit
  • 175,853
  • 27
  • 231
  • 333
  • 1
    This looks quite sophisticated. I suggest you to use array notation v[i] instead of vi (and actually instead of v_i what is in the OP). I suggest to add a reference to the "logarithmic"-trick. I suggest to remove the overexplanatory lines because it makes the answer look *more difficult than it actually is*. Also, I guess you assume some kind of independence, which should be made explicit. – Matsmath May 04 '16 at 18:25
  • @amit w'(u,v) value in range([0;1]) ? If I have w(u,v) = count of transitions from node u to node v, how to get w'(u,v)? – Andrei May 23 '16 at 18:05
  • @Andrei `-log(count of transitions from node u to node v)`, as it states for general `w(.,.)`. – amit May 23 '16 at 21:03
  • @amit I have a misunderstanding, how now the weight will be in [0;1] ? – Andrei May 24 '16 at 06:53
  • @Andrei Apologies, 3 weeks passed since the answer was given an your question. You are right, in this case you need to normalize it somehow to be in `[0,1]`. One way to do so will be to define `w(u,v) = #transitions from u to v / #transition from u to any node`, or any other normalization you can think of. Without having [0,1] range, this is a significantly harder problem and if I remember correctly is equivalent to Longest Path Problem (which is NP-Complete). – amit May 24 '16 at 16:51