I have a simple grammar that takes three list items and runs a different DCG rule on each.

[debug]  ?- phrase(sentence(X), [sky, a, 1], []).
X = [bright, amber, on] .

Code:

sentence([A,C,R]) --> 
    analyse(A),
    colour(C),
    rating(R).

analyse(bright) --> [sky].
analyse(dark) --> [cave].

colour(red) --> [r]. 
colour(amber) --> [a]. 
colour(green) --> [g]. 

rating(on) --> [1].
rating(off) --> [0].

This works fine.

My problem is that my input list needs to have 2 items, not 3, and the second atom is a concatenation of the colour and the rating:

[sky, a1]

So somehow I have to (?) split this atom into [a, 1] and then the colour and rating rules will work with a simple dcg rule.

I can't work out how to do this. Obviously, in plain Prolog I'd just use atom_chars, but I can't work out how to interleave this with the grammar.

In a perfect world, it feels like I should not have to resort to atom_chars, and I should be able to come up with a simple DCG rule to split the atom, but I'm not sure whether this is possible, since we are parsing lists, not atoms.

magus

2 Answers

4

As you have said yourself, you just need to use a predicate like atom_chars/2. You can interleave normal code into a DCG rule by enclosing it in { and }.

But there is something fishy about your problem definition. As you have also said yourself, you are parsing a list, not an atom. The list you are parsing should be already properly tokenized, otherwise you cannot expect to define a DCG that can parse it. Or am I seeing this wrong?

So in other words: you take your input, split into single chars, tokenize that using a DCG. Depending on your input, you can do the parsing in the same step.
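As a rough illustration of the `{}` approach, here is a minimal sketch that splits the fused token with atom_chars/2 inside the grammar. The helper `colour_rating//2` is a name invented for this example, and the rating rules are changed to match the one-character atoms `'1'` and `'0'`, because atom_chars/2 does not produce integers.

```prolog
% Sketch only: split a fused token such as a1 inside the grammar via {}/1.
sentence([A, C, R]) -->
    analyse(A),
    [Token],                          % consume the fused token, e.g. a1
    { atom_chars(Token, Chars),       % a1 -> [a, '1']
      phrase(colour_rating(C, R), Chars) }.

analyse(bright) --> [sky].
analyse(dark)   --> [cave].

% Hypothetical sub-grammar that runs over the characters of one token.
colour_rating(C, R) --> colour(C), rating(R).

colour(red)   --> [r].
colour(amber) --> [a].
colour(green) --> [g].

% atom_chars/2 yields one-character atoms, so the digits are '1' and '0',
% not the integers 1 and 0.
rating(on)  --> ['1'].
rating(off) --> ['0'].
```

With these definitions, `?- phrase(sentence(X), [sky, a1]).` should bind `X = [bright, amber, on]`. The trade-off is that the grammar now has to know which position holds a fused token, which is why a separate tokenizing step is usually cleaner.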

  • I agree - it should be properly tokenised, but to do that, I need another set of rules sitting on top of the ones above to identify the atom that needs splitting.. and the real life example is more complex than the example above / using natural language processing - the 'splitable atom' could appear anywhere in the sentence. I'm new to DCGs :-). I hope I understood your comment.. – magus Dec 27 '13 at 17:20
  • @magus so do you have a tokenizing step before you parse, or not? Ideally, after a proper tokenization there should be no _tokens_ left that cannot be parsed "atomically", without looking into them –  Dec 27 '13 at 17:34
  • Ok - I understand - no I don't. Looks like this is a parsing / tokenising design issue, and nothing to do with prolog. Not sure how the site admins wish to close this off. ie. not on topic ? Thanks for your help Boris. – magus Dec 27 '13 at 17:50
4

It was clear that a refined DCG rule could work, but, alas, it took me far too long to craft a solution to your problem.

Here it is:

sentence([A,C,R]) --> 
    analyse(A),
    colour(C),
    rating(R).

analyse(bright) --> [sky].
analyse(dark) --> [cave].

colour(red) --> [r]. 
colour(amber) --> [a]. 
colour(green) --> [g]. 

colour(X), As --> [A], {
    atom_codes(A, Cs),
    maplist(char2atomic, Cs, L),
    phrase(colour(X), L, As)}.

rating(on) --> [1].
rating(off) --> [0].

char2atomic(C, A) :-
    (   code_type(C, digit)
    ->  number_codes(A, [C])
    ;   atom_codes(A, [C])
    ).

yields

?- phrase(sentence(X), [sky, a1], []).
X = [bright, amber, on] 

The key is the use of 'pushback' (i.e. colour(X), As --> ...). Here we split the unparsable token, consume its first part, and push the rest back onto the input.
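For readers who have not met pushback before, here is about the smallest example I can think of. The nonterminal `peek//1` is a name made up for this sketch: it reads one token and immediately pushes it back, so the input is left unchanged.

```prolog
% Sketch only: peek//1 is a hypothetical look-ahead nonterminal.
% The body consumes one token T; the pushback list [T] after the comma
% in the head puts it back in front of the remaining input.
peek(T), [T] --> [T].
```

For example, `?- phrase(peek(X), [a, b], Rest).` gives `X = a, Rest = [a, b]`. The `colour//1` rule above works the same way, except that what it pushes back is the unconsumed remainder of the split atom rather than the token itself.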

As usual, most of the time went into understanding why my first attempt failed: I had coded char2atomic(C, A) :- atom_codes(A, [C])., but then rating//1 failed, because that yields the atom '1' rather than the integer 1 that rating//1 expects...

CapelliC
  • I never knew that pushback stuff was possible.. many thanks chac. – magus Dec 27 '13 at 18:38
  • You can add `Cs = [_,_|_]` to the final rule of `colour//1` so that it only applies to atoms of length greater than 1. This makes the predicate terminate for the given example and similar ones. – mat Dec 28 '13 at 15:33
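
For reference, this is roughly what the answer's program looks like with the guard from the last comment added. It is a sketch assuming SWI-Prolog (code_type/2 and the autoloaded maplist/3); the only change to the answer's logic is the extra `Cs = [_,_|_]` goal.

```prolog
sentence([A, C, R]) -->
    analyse(A),
    colour(C),
    rating(R).

analyse(bright) --> [sky].
analyse(dark)   --> [cave].

colour(red)   --> [r].
colour(amber) --> [a].
colour(green) --> [g].

% Split a multi-character token, parse a colour from its first part,
% and push the unconsumed remainder As back onto the input.
colour(X), As --> [A], {
    atom_codes(A, Cs),
    Cs = [_,_|_],                     % guard: only atoms longer than one character
    maplist(char2atomic, Cs, L),
    phrase(colour(X), L, As)}.

rating(on)  --> [1].
rating(off) --> [0].

% Digit codes become integers, all other codes become one-character atoms.
char2atomic(C, A) :-
    (   code_type(C, digit)
    ->  number_codes(A, [C])
    ;   atom_codes(A, [C])
    ).
```

Without the guard, the last `colour//1` rule also applies to the single-character token `a` that it has just produced, so asking for further solutions recurses forever. With the guard, `?- findall(X, phrase(sentence(X), [sky, a1]), Xs).` should terminate with `Xs = [[bright, amber, on]]`.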