I am trying to parse the following format: (identifier/)?identifier(/keyword)?, where both the first identifier and the keyword are optional. A keyword may not be used as an identifier. For example, if up is a keyword, then:

  • simple matches the second identifier,
  • first/second matches first as the first identifier, and second as the second one,
  • second/up matches second as the second identifier and up as the keyword.
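
To make the expected results concrete, here is a minimal plain-Ruby sketch of the classification described above. It is not a Ragel machine and not a solution to the question, only a reference for what a correct parser should report. The names KEYWORDS and expected_parse are hypothetical, and it assumes up is the only keyword.

```ruby
# Assumption: "up" is the only keyword, as in the example above.
KEYWORDS = ["up"].freeze

# Hypothetical reference helper: returns a hash with :first, :second and
# :keyword, or nil when the input does not match
# (identifier/)?identifier(/keyword)?.
def expected_parse(input)
  # A limit of -1 keeps trailing empty fields, so "a/" is rejected below.
  parts = input.split("/", -1)
  return nil if parts.empty? || parts.any?(&:empty?)

  # A trailing keyword only counts if at least one identifier precedes it.
  keyword = nil
  keyword = parts.pop if parts.length > 1 && KEYWORDS.include?(parts.last)

  # At most two identifiers remain, and none of them may be a keyword.
  return nil if parts.length > 2 || parts.any? { |s| KEYWORDS.include?(s) }

  second = parts.pop
  { first: parts.pop, second: second, keyword: keyword }
end

p expected_parse("simple")
p expected_parse("first/second")
p expected_parse("second/up")
```

For first/second this reports first as the first identifier and second as the second one, which is the result the Ragel machine below should also produce.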

Using Ragel with Ruby, I have defined the following FSM:

%%{
  machine simple;

  keyword = "up";
  separator = '/';
  ident_char = any - separator;
  identifier = ident_char+ - keyword;

  action start_string { $start_string = p }

  action first_string { puts "First: #{get_string(data, p)}" }
  action second_string { puts "Second: #{get_string(data, p)}" }

  action keyword_string { puts "Keyword: #{get_string(data, p)}" }

  main := ( identifier >start_string %first_string separator )? 
         :> identifier >start_string %second_string 
          ( separator keyword >start_string %keyword_string )?
  ;

}%%

%% write data;

def get_string(data, p)
  data[$start_string...p].pack("c*")
end

def parse(data)
  data = data.unpack("c*")
  eof = pe = data.length

  %% write init;
  %% write exec;
end


parse("first/second")
puts("---")
parse("second/up")

This gives the following output:

$ ragel -R simple.rl ; ruby simple.rb
Second: first
---
Second: second
Keyword: up

The first part is incorrect: it should print First: first followed by Second: second. The result is nevertheless what I would expect from the :> priority I have used.

I have tried different combinations of priorities, but have not been able to get the expected result. Is there a way of solving this problem with Ragel (i.e. can this be solved without lookahead)?

Sébastien Le Callonnec

1 Answer

Try this as your main machine:

two_idents = identifier >start_first %first_string . separator . (identifier >start_second %second_string);

main := (two_idents | identifier >start_first %first_string) . ( separator . keyword )?;

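The trouble is that the first identifier shares a prefix with the second identifier, so the entry-guarded concatenation (:>) cuts the first machine short as soon as the second one can begin. The union describes the match you are actually trying to make. Note that start_first and start_second are not defined in your grammar; they would need to be added alongside start_string.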

Judson