I am trying to parse the following format: (identifier/)?identifier(/keyword)?, where the first identifier and the keyword are both optional. A keyword may not be used as an identifier. For example, if "up" is a keyword, then:
"simple" matches as the second identifier,
"first/second" matches "first" as the first identifier and "second" as the second one,
"second/up" matches "second" as the second identifier and "up" as the keyword.
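To make the expected results concrete, here is a minimal plain-Ruby sketch of the behaviour I am after. It does not use Ragel and is not a solution to the question; it only restates the three examples above as code. The names KEYWORDS and split_path are hypothetical, introduced just for this illustration.

```ruby
# Reference behaviour only: plain Ruby, no Ragel involved.
# KEYWORDS and split_path are hypothetical names used for this sketch.
KEYWORDS = ["up"].freeze

# Returns { first:, second:, keyword: } (nil for absent parts), or nil when
# the input does not match (identifier/)?identifier(/keyword)?
def split_path(input)
  parts = input.split("/", -1) # -1 keeps trailing empty fields
  return nil if parts.empty? || parts.any?(&:empty?) || parts.length > 3

  # A trailing keyword is only possible when an identifier precedes it.
  keyword = nil
  keyword = parts.pop if parts.length > 1 && KEYWORDS.include?(parts.last)

  return nil if parts.length > 2                        # three parts need a trailing keyword
  return nil if parts.any? { |s| KEYWORDS.include?(s) } # a keyword is never an identifier

  second = parts.pop
  first = parts.pop # nil when there is no first identifier
  { first: first, second: second, keyword: keyword }
end

p split_path("simple")
p split_path("first/second")
p split_path("second/up")
```

With this sketch, "up" on its own and "up/second" are both rejected (nil), since a keyword may not appear in an identifier position.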
Using Ragel with Ruby, I have defined the following FSM:
%%{
machine simple;
keyword = "up";
separator = '/';
ident_char = any - separator;
identifier = ident_char+ - keyword;
action start_string { $start_string = p }
action first_string { puts "First: #{get_string(data, p)}" }
action second_string { puts "Second: #{get_string(data, p)}" }
action keyword_string { puts "Keyword: #{get_string(data, p)}" }
main := ( identifier >start_string %first_string separator )?
:> identifier >start_string %second_string
( separator keyword >start_string %keyword_string )?
;
}%%
%% write data;
def get_string(data, p)
data[$start_string...p].pack("c*")
end
def parse(data)
data = data.unpack("c*")
eof = pe = data.length
%% write init;
%% write exec;
end
parse("first/second")
puts("---")
parse("second/up")
This gives the following output:
$ ragel -R simple.rl ; ruby simple.rb
Second: first
---
Second: second
Keyword: up
This is incorrect: the first part should be
First: first
Second: second
However, the actual output is what I would expect given the :> priority operator I used.
I have tried different combinations of priority operators, but I haven't been able to get the expected result. Is there a way to solve this problem with Ragel (i.e. can it be solved without lookahead)?