0

I have a set of DCG rules (in this case German personal pronouns):

% personal pronoun (person, case, number, genus)
ppers(1,0,sg,_) --> [ich].
ppers(1,1,sg,_) --> [meiner].
ppers(1,2,sg,_) --> [mir].
ppers(1,3,sg,_) --> [mich].
ppers(2,0,sg,_) --> [du].
ppers(2,1,sg,_) --> [deiner].
ppers(2,2,sg,_) --> [dir].
ppers(2,3,sg,_) --> [dich].
...

Because they are semantically connected, it would make sense to me to keep this information by moving them into a list (grouped by person for example) instead of unrelated rules. This also makes things a bit neater:

ppers(1,sg,_,[ich, meiner, mir, mich]).
ppers(2,sg,_,[du,deiner,dir,dich]).
...

I would then select the item I want with nth0/3, using the case I need as the index into the list.

However, when tracing the program while it checks a German sentence for correct grammar and tests whether a part is a personal pronoun, I noticed that Prolog does not step through every instance when I use the first version (plain rules), but crawls through every list when I use the list version.

Does this mean that performance will be worse if I use lists and nth0 versus plain rules? Or does the Prolog tracer just not show the crawling for plain rules as it does for lists?

(I hope my question is clear enough; if not, I will expand.)

false
magnattic

4 Answers

2

Most probably the speed and tracing difference is not caused by indexing (*), but by the difference between clause head unification and a body call to nth0/3. If you really want to take advantage of indexing and stay portable (**) across most Prolog systems, you would need to reformulate your problem for first-argument indexing.

One way to do this is via an additional predicate. Suppose you originally have these DCG rules:

cat(Attr1, .., Attrn) --> [Terminal1, .., Terminaln].
..

Transform this into:

cat(X1, .., Xn) --> [Y], cat2(Y, X1, .., Xn).

cat2(Terminal1, Attr1, .., Attrn) --> [Terminal2, .., Terminaln].
..

When we apply this to your example we would get:

% personal pronoun (person, case, number, genus)
ppers(X1,X2,X3,X4) --> [Y], ppers2(Y,X1,X2,X3,X4).

% personal pronoun 2 (first word, person, case, number, genus)
ppers2(ich,1,0,sg,_) --> [].
ppers2(meiner,1,1,sg,_) --> [].
ppers2(mir,1,2,sg,_) --> [].
ppers2(mich,1,3,sg,_) --> [].
ppers2(du,2,0,sg,_) --> [].
ppers2(deiner,2,1,sg,_) --> [].
ppers2(dir,2,2,sg,_) --> [].
ppers2(dich,2,3,sg,_) --> [].

You can do this for each category in your code; the result is a kind of lexicon table. The above works independently of how DCGs are translated, and if first-argument indexing is present, the lookup will be lightning fast.
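As a usage sketch of the lexicon-table version (assuming a system that provides phrase/2, such as SWI-Prolog; the variable names and the queries are illustrative only), the grammar can be queried like this:

```prolog
% personal pronoun (person, case, number, genus)
ppers(P, C, N, G) --> [W], ppers2(W, P, C, N, G).

% lexicon table, keyed on the word itself (first argument)
ppers2(ich,    1, 0, sg, _) --> [].
ppers2(meiner, 1, 1, sg, _) --> [].
ppers2(mir,    1, 2, sg, _) --> [].
ppers2(mich,   1, 3, sg, _) --> [].
ppers2(du,     2, 0, sg, _) --> [].
ppers2(deiner, 2, 1, sg, _) --> [].
ppers2(dir,    2, 2, sg, _) --> [].
ppers2(dich,   2, 3, sg, _) --> [].

% Recognition: the word is bound, so ppers2 is called with a bound first argument.
% ?- phrase(ppers(P, C, N, _), [mir]).
% P = 1, C = 2, N = sg.

% Generation: the word is unbound, so every ppers2 clause is a candidate.
% ?- phrase(ppers(2, 3, sg, _), Ws).
% Ws = [dich].
```

When parsing, the word is always known at the point where ppers2 is called, which is the call pattern first-argument indexing is designed for.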

Bye

(*) Even if your Prolog system can do multi-argument indexing, it might still not do complex term indexing. To index on [ich|X] the Prolog system would need to descend into the list, but most probably it does not descend and only indexes on the functor '.'/2, so that all clauses look the same and indexing has no positive effect.

(**) I guess the only common denominator among Prolog systems as far as indexing is concerned is first-argument indexing. Besides that, not all Prolog systems put a terminal into the head. Some might use =/2 in the body and some might use 'C'/3 in the body. DCGs are currently not standardized with respect to the modelling of terminals.

1

In general the tracer will show you what actually happens, so yes, if it iterates where the alternative formulation directly accesses the target term via matching, then that will also happen when you're not looking. But to find out whether that actually means worse performance, you have to measure and compare both alternatives in a real scenario. The unification might be slow even though it's not shown as a separate step by the tracer, or your run-time system might make optimizations or even compile stuff that doesn't happen under trace. Or it might be slower but not enough to worry about. Here, as always, the golden rule is: measure, then optimize.
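A minimal timing sketch for such a comparison, assuming SWI-Prolog (statistics/2 with the cputime key, ignore/1, and nth0/3 from library(lists)). The names ppers_forms/4, ppers_list//4 and run/3 are hypothetical helpers introduced here, and only the eight forms shown in the question are included:

```prolog
:- use_module(library(lists)).

% Formulation 1: one rule per pronoun form (from the question).
ppers(1,0,sg,_) --> [ich].
ppers(1,1,sg,_) --> [meiner].
ppers(1,2,sg,_) --> [mir].
ppers(1,3,sg,_) --> [mich].
ppers(2,0,sg,_) --> [du].
ppers(2,1,sg,_) --> [deiner].
ppers(2,2,sg,_) --> [dir].
ppers(2,3,sg,_) --> [dich].

% Formulation 2: one fact per person, the case is the list index.
ppers_forms(1, sg, _, [ich, meiner, mir, mich]).
ppers_forms(2, sg, _, [du, deiner, dir, dich]).

ppers_list(P, C, N, G) -->
    [W],
    { ppers_forms(P, N, G, Forms), nth0(C, Forms, W) }.

% run(+Goal, +Times, -Seconds): CPU seconds spent calling Goal Times times.
run(Goal, Times, Seconds) :-
    statistics(cputime, T0),
    forall(between(1, Times, _), ignore(Goal)),
    statistics(cputime, T1),
    Seconds is T1 - T0.

% ?- run(phrase(ppers(_, _, _, _), [dich]), 1000000, Rules),
%    run(phrase(ppers_list(_, _, _, _), [dich]), 1000000, Lists).
```

The numbers only mean something for your own system and your own call patterns, so the queries should mirror how the real parser calls the grammar.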

Kilian Foth
1

Why are you using nth0? It could be the performance killer; use memberchk instead.

Apart from this, I think your intuition about performance is well founded in 'argument indexing'. DCG rules are usually translated into plain Prolog clauses (I'm using SWI-Prolog here):

ppers(1,0,sg,_) --> [ich].

becomes

ppers(1, 0, sg, _, [ich|A], A).
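To check what your own system produces, a small sketch along these lines may help. It assumes expand_term/2 and portray_clause/1 are available (they are in SWI-Prolog and several other systems); show_translation/1 is a hypothetical helper name:

```prolog
% show_translation(+Rule): print the plain clause a DCG rule expands to.
show_translation(Rule) :-
    expand_term(Rule, Clause),
    portray_clause(Clause).

% ?- show_translation((ppers(1,0,sg,_) --> [ich])).
```

The printed clause can differ between systems and versions, for example in whether the terminal ends up in the clause head or in a body goal.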

A recent optimization in the SWI-Prolog virtual machine, inspired (I think) by YAP, automatically builds all the indexes for predicates having sufficiently bound arguments.

Thus you can expect that parsing (using SWI-Prolog) with your first scheme will be more efficient.

Previously, only 'first argument indexing' was implemented. In that case (or if you are using a Prolog without indexing capabilities) you should find very similar timings for the two schemes.

HTH

CapelliC
  • I use nth0 so I can select a certain case (e.g. first case, first person, singular = "Ich" = "I" in English). I need this for various grammar reasons. If I only use memberchk, I won't be able to distinguish the cases of the pronouns. – magnattic May 21 '12 at 11:37
  • Thanks for the explanation, I'm sorry. I should have read your code more carefully... – CapelliC May 21 '12 at 12:34
0

Grammar rules are compiled into predicate clauses, which are usually indexed. Most, if not all, Prolog compilers use first-argument indexing (by default) to avoid trying clauses that will never be part of the proof tree when proving a goal. Thus, depending on your call patterns, and as you observed using your Prolog compiler's tracing support, the system will not step into every predicate clause. Moreover, calling the nth0/3 predicate with an instantiated index still requires a linear traversal of the list until the specified index is reached. The same holds if, as others suggested, you use the memberchk/2 predicate. A list is a list.
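To make the linear traversal visible, here is a sketch of how a lookup by index can be written by hand. my_nth0/3 is a hypothetical stand-in, not the library definition of nth0/3, and it expects a bound index:

```prolog
% my_nth0(+Index, +List, -Elem): Elem is the element at position Index,
% counting from 0. One list cell is skipped per recursive call.
my_nth0(0, [Elem|_], Elem).
my_nth0(I, [_|Rest], Elem) :-
    I > 0,
    I1 is I - 1,
    my_nth0(I1, Rest, Elem).

% ?- my_nth0(3, [ich, meiner, mir, mich], W).
% W = mich
```

Reaching index 3 takes three recursive steps, which is the crawling a tracer shows for the list version.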

Paulo Moura