
I'm endeavouring to understand the following Prolog code:

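```prolog
:- use_module(library(aggregate)).   % aggregate/3, used by keymax/2

% most_probable_hmm_path(+Words, -Path): the most probable tag sequence for Words.
most_probable_hmm_path(Words,Path) :-
  probable_paths(Words,[1-[start]],PPaths),
  keymax(PPaths,_P-Path1),
  reverse(Path1,[start|Path]).

% probable_paths(+Words, +PPaths0, -PPaths)
% Every element is a Prob-ReversedPath pair (newest tag first, start last).
% After each word there is at most one pair per tag of that word: the best path ending in it.
probable_paths([],PPaths,PPaths).
probable_paths([Word|Words],PPaths0,PPaths) :-
  findall(PPath,
          ( outprob(Word,Tag2,PL),                      % each tag Word can carry
            findall(P2-[Tag2,Tag1|Tags],
                    ( member(P1-[Tag1|Tags],PPaths0),   % extend every previous path
                      transprob(Tag1,Tag2,PT),
                      P2 is PL*PT*P1 ),
                    AllPaths),
            keymax(AllPaths,PPath) ),                   % best path ending in Tag2
          PPaths1),
  probable_paths(Words,PPaths1,PPaths).

% keymax(+Pairs, -Max): the Key-Value pair with the largest key.
keymax(AllPaths,U-V) :-
    aggregate(max(N,P), member(N-P,AllPaths), max(U,V)).
```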

It is an implementation of the Viterbi algorithm.
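To see what the path lists look like: every element of `PPaths0`, `PPaths1` and `AllPaths` is a `Probability-ReversedPath` pair, with the newest tag first and `start` last, and `keymax/2` simply picks the pair with the largest key. A minimal sketch, assuming SWI-Prolog's `library(aggregate)`; the pairs and the `demo/0` predicate are hypothetical, made up only for illustration:

```prolog
:- use_module(library(aggregate)).

% keymax(+Pairs, -Max): Max is the Key-Value pair with the largest Key.
keymax(AllPaths, U-V) :-
    aggregate(max(N,P), member(N-P,AllPaths), max(U,V)).

% Hypothetical pairs in the same Probability-ReversedPath shape as PPaths.
demo :-
    Pairs = [0.2-[n,start], 0.5-[aux,start], 0.1-[v,start]],
    keymax(Pairs, Best),
    format("best pair: ~w~n", [Best]).
```

Calling `?- demo.` should print `best pair: 0.5-[aux,start]`.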

I want to understand how the data structures at various points in the code are populated and what they look like. Is it possible to intersperse the equivalent of 'print' statements within the algorithm so I can see what is going on at each step? I've often done this when coding in Java or Python, and I find it a fairly useful way to 'grok' the guts of a program.
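Prints are possible here. The usual worry about backtracking is small in this program, because the work happens inside `findall/3`, which collects all solutions in one pass, so each message should correspond to one step of the algorithm. A minimal sketch, assuming SWI-Prolog: it uses `debug/3` from `library(debug)`, which prints a message only while its topic is switched on. The topic name `viterbi` is an arbitrary choice, and only a handful of the facts listed in the question are included so the block loads on its own:

```prolog
:- use_module(library(aggregate)).
:- use_module(library(debug)).

% Small subset of the question's facts (three hidden states reachable here).
outprob(he,  pron, 0.070).
outprob(can, aux,  0.010).
outprob(can, v,    0.005).
transprob(start, pron, 0.30).
transprob(pron,  aux,  0.45).
transprob(pron,  v,    0.52).

most_probable_hmm_path(Words, Path) :-
    probable_paths(Words, [1-[start]], PPaths),
    debug(viterbi, "final paths: ~w", [PPaths]),
    keymax(PPaths, _P-Path1),
    reverse(Path1, [start|Path]).

probable_paths([], PPaths, PPaths).
probable_paths([Word|Words], PPaths0, PPaths) :-
    debug(viterbi, "word ~w, incoming paths: ~w", [Word, PPaths0]),
    findall(PPath,
            ( outprob(Word, Tag2, PL),
              findall(P2-[Tag2,Tag1|Tags],
                      ( member(P1-[Tag1|Tags], PPaths0),
                        transprob(Tag1, Tag2, PT),
                        P2 is PL*PT*P1 ),
                      AllPaths),
              % every candidate path that ends in Tag2
              debug(viterbi, "  candidates for ~w: ~w", [Tag2, AllPaths]),
              keymax(AllPaths, PPath),
              debug(viterbi, "  best for ~w: ~w", [Tag2, PPath]) ),
            PPaths1),
    probable_paths(Words, PPaths1, PPaths).

keymax(AllPaths, U-V) :-
    aggregate(max(N,P), member(N-P,AllPaths), max(U,V)).

% Switch the topic on after the debug/3 calls above have been loaded.
:- debug(viterbi).
```

Running `?- most_probable_hmm_path([he,can], Path).` then shows the incoming paths, the candidates for each tag and the winner per tag; `?- nodebug(viterbi).` switches the output off again. Because `probable_paths/3` is an ordinary predicate, it can also be queried on its own, e.g. `?- probable_paths([he,can], [1-[start]], PPaths).`, which shows the intermediate list without any printing at all.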

In case you're interested, I've been using it with the following probabilities:

```prolog
outprob(a,det,0.300).
outprob(can,aux,0.010).
outprob(can,v,0.005).
outprob(can,n,0.001).
outprob(he,pron,0.070).

transprob(start,det,0.30).          transprob(v,det,0.36).
transprob(start,aux,0.20).          transprob(v,aux,0.01).
transprob(start,v,0.10).            transprob(v,v,0.01).
transprob(start,n,0.10).            transprob(v,n,0.26).
transprob(start,pron,0.30).         transprob(v,pron,0.36).
transprob(det,det,0.20).            transprob(n,det,0.01).
transprob(det,aux,0.01).            transprob(n,aux,0.25).
transprob(det,v,0.01).              transprob(n,v,0.39).
transprob(det,n,0.77).              transprob(n,n,0.34).
transprob(det,pron,0.01).           transprob(n,pron,0.01).
transprob(aux,det,0.18).            transprob(pron,det,0.01).
transprob(aux,aux,0.10).            transprob(pron,aux,0.45).
transprob(aux,v,0.50).              transprob(pron,v,0.52).
transprob(aux,n,0.01).              transprob(pron,n,0.01).
transprob(aux,pron,0.21).           transprob(pron,pron,0.01).
```

And checking the results like so:

```prolog
?- most_probable_hmm_path([he,can,can,a,can],Sequence).
Sequence = [pron, aux, v, det, n].
```
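The probability of the winning path is already computed: it is the key of the pair that `keymax/2` returns, which the original code discards as `_P`. A sketch of a three-argument variant that keeps it (the name `most_probable_hmm_path/3` is hypothetical), again with a small subset of the question's facts so the block loads on its own in SWI-Prolog:

```prolog
:- use_module(library(aggregate)).

% Small subset of the question's facts.
outprob(he,  pron, 0.070).
outprob(can, aux,  0.010).
outprob(can, v,    0.005).
transprob(start, pron, 0.30).
transprob(pron,  aux,  0.45).
transprob(pron,  v,    0.52).

% most_probable_hmm_path(+Words, -Path, -Prob): like the two-argument version,
% but also returns the probability stored as the key of the winning pair.
most_probable_hmm_path(Words, Path, Prob) :-
    probable_paths(Words, [1-[start]], PPaths),
    keymax(PPaths, Prob-Path1),
    reverse(Path1, [start|Path]).

% Unchanged from the question.
probable_paths([], PPaths, PPaths).
probable_paths([Word|Words], PPaths0, PPaths) :-
    findall(PPath,
            ( outprob(Word, Tag2, PL),
              findall(P2-[Tag2,Tag1|Tags],
                      ( member(P1-[Tag1|Tags], PPaths0),
                        transprob(Tag1, Tag2, PT),
                        P2 is PL*PT*P1 ),
                      AllPaths),
              keymax(AllPaths, PPath) ),
            PPaths1),
    probable_paths(Words, PPaths1, PPaths).

keymax(AllPaths, U-V) :-
    aggregate(max(N,P), member(N-P,AllPaths), max(U,V)).
```

With the full fact table from the question loaded instead of the subset, `?- most_probable_hmm_path([he,can,can,a,can], Seq, P).` should give the same sequence as above with `P` at roughly `1.96e-11`, which shows how quickly the products shrink.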
  • Using prints to see what's going on in your program is a bad idea in logic programming, mainly because Prolog backtracks to previously created choice points and may pass through the same print several times. – Tudor Berariu Jan 14 '15 at 07:25
  • When implementing Viterbi, always use log probabilities (which you sum) instead of probabilities, which can get very, very small when multiplied. – Tudor Berariu Jan 14 '15 at 07:27
  • @TudorBerariu thank you for that insight, I'd not even considered it. I'm almost a complete novice when it comes to Prolog. Could you suggest some other way I might deconstruct this program to understand its components? –  Jan 14 '15 at 08:53
  • I recommend simplifying the facts a bit: keep only two or three hidden states and their corresponding transition and emission probability distributions. Then you can use `trace/0` to see exactly how Prolog satisfies your goal. – Tudor Berariu Jan 14 '15 at 09:13
  • @TudorBerariu that's definitely a good idea. I was thinking I could also pose other queries to the Prolog engine to get more information. For instance, the query in my question gives me the sequence, but I guess I can also ask Prolog for the sequence together with its exact probability, which I think is stored as the key of that list element, isn't it? Do you know how best to formulate such a query? Or maybe you have an idea of what else might be a useful query to pose to Prolog. –  Jan 14 '15 at 10:43
  • Maybe not ideal, but the SWI-Prolog debugger can show data structures while stepping through the code. – CapelliC Jan 14 '15 at 13:16
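Following the log-probability suggestion in the comments, a sketch of the same search with summed natural logarithms as keys. Because `log/1` is monotonic, the largest sum picks the same path as the largest product. The names `most_probable_hmm_path_log/3` and `log_paths/3` are hypothetical, the block assumes every probability is strictly positive (as in the question's facts), and it uses a small subset of those facts so it loads on its own in SWI-Prolog:

```prolog
:- use_module(library(aggregate)).

% Small subset of the question's facts; all probabilities must be > 0.
outprob(he,  pron, 0.070).
outprob(can, aux,  0.010).
outprob(can, v,    0.005).
transprob(start, pron, 0.30).
transprob(pron,  aux,  0.45).
transprob(pron,  v,    0.52).

% Keys are log-probabilities; log(1) = 0.0 for the initial start path.
most_probable_hmm_path_log(Words, Path, LogProb) :-
    log_paths(Words, [0.0-[start]], LPaths),
    keymax(LPaths, LogProb-Path1),
    reverse(Path1, [start|Path]).

% Same structure as probable_paths/3, but adding logs instead of multiplying.
log_paths([], LPaths, LPaths).
log_paths([Word|Words], LPaths0, LPaths) :-
    findall(LPath,
            ( outprob(Word, Tag2, PL),
              findall(L2-[Tag2,Tag1|Tags],
                      ( member(L1-[Tag1|Tags], LPaths0),
                        transprob(Tag1, Tag2, PT),
                        L2 is log(PL) + log(PT) + L1 ),
                      AllPaths),
              keymax(AllPaths, LPath) ),
            LPaths1),
    log_paths(Words, LPaths1, LPaths).

keymax(AllPaths, U-V) :-
    aggregate(max(N,P), member(N-P,AllPaths), max(U,V)).
```

The returned key is a log-probability; `exp/1` converts it back to an ordinary probability when it is large enough to represent.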
