
I do not understand what Ragel considers a "final" state. IIRC the User's Guide says that states that are final before machine simplification remain final thereafter. When exactly is a state final, and how does one recognize this?


Application:

I'm using the state machine syntax to implement a string finder: find ASCII strings longer than n characters and print them. This means implementing a maximum-length matcher, as below.

Despite the fact that the dot output shows no final states, the EOF transitions behave differently depending on which flavor of {$%@}eof is used. I do not understand why this should be. For example, in the has_string state below, using %eof instead of @eof causes both the commit_nonstring_eof and commit_string_eof actions to be called from one of the generated/synthetic states terminating the matching state.
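For reference, Ragel's EOF action embeddings differ only in which states of the machine they are attached to receive the action. The fragment below applies each flavour to the same hypothetical machine `word` with a hypothetical action `on_eof` (neither is from the original post). The state numbers in the comments are assigned by hand for this sketch, not taken from Ragel's output, and the state sets assume the Ragel 6 manual's definitions.

```ragel
# Hypothetical action and machine, for illustration only.
action on_eof { }

# Hand-numbered states: 0 -a-> 1 -b-> 2 -c-> 3
# 0 is the start state; 2 and 3 are final because 'c' is optional.
word = 'ab' 'c'?;

w1 = word >eof(on_eof);   # start state only: 0
w2 = word <eof(on_eof);   # every state except the start state: 1, 2, 3
w3 = word $eof(on_eof);   # every state: 0, 1, 2, 3
w4 = word %eof(on_eof);   # final states only: 2, 3
w5 = word @eof(on_eof);   # every state except final states: 0, 1
w6 = word <>eof(on_eof);  # middle states, neither start nor final: 1
```

Under this reading, `%eof` and `@eof` on the same expression select complementary sets of states, so swapping one for the other changes which states end up carrying `commit_string_eof`.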

Here is a working state machine. Note that the EOF exit of the rightmost node has exactly one action: commit_nonstring_eof.
[Image: working state machine]

Here is a broken state machine. Note that the EOF exit of the rightmost node calls both commit_string_eof and commit_nonstring_eof.
[Image: bad state machine]

action commit_string { }
action commit_string_eof { }
action commit_nonstring_eof { }
action set_mark { }

action reset {
    /* Force the machine back into state 1. This happens after
     * an incomplete match when some graphical characters are
     * consumed, but not enough for us to keep the string. */
    fgoto start;
}

# Matching classes union to 0x00 .. 0xFF
graphic = (0x09 | 0x20 .. 0x7E);
non_graphic = (0x00 .. 0x08 | 0x0A .. 0x1F | 0x7F .. 0xFF);

collector = (

    start: (
        # Set the mark if we have a graphic character,
        # otherwise go to the no_glyph state and consume input
        graphic @set_mark -> has_glyph |
        non_graphic -> no_glyph
    ) $eof(commit_nonstring_eof),

    no_glyph: (
        # Consume input until a graphic character is encountered
        non_graphic -> no_glyph |
        graphic @set_mark -> has_glyph
    ) $eof(commit_nonstring_eof),

    has_glyph: (
        # We already matched one graphic character to get here
        # from start or no_glyph. Try to match N-1 before allowing
        # the string to be committed. If we don't get to N-1,
        # drop back to the start state
        graphic{3} $lerr(reset) -> has_string
    ) @eof(commit_nonstring_eof),

    has_string: (
        # Already consumed our quota of N graphic characters;
        # consume input until we run out of graphic characters,
        # then reset the machine. All exiting edges should commit
        # the string. We differentiate between exiting on a non-graphic
        # input that shouldn't be added to the string and exiting
        # on a (graphic) EOF that should be added.
        graphic* non_graphic -> start
    ) %from(commit_string) @eof(commit_string_eof)   # okay
  #) %from(commit_string) %eof(commit_string_eof)    # bad

); #$debug;

main := (collector)+;
gibbss

1 Answer


I think that for the concatenated machines a.b, the final state of a comes into play when there is a transition from a on the first character of b (cf. "Regular Language Operators / Concatenation" in the manual).
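A minimal sketch of the concatenation case, using two hypothetical machines `a` and `b` and a hypothetical leaving action `leave_a` (none of them from the question). The `%` leaving action is the part that runs "on the way out" of a final state, which is where the transition into b's first character matters.

```ragel
# Hypothetical action and machines, for illustration only.
action leave_a { }

a = 'x' 'y';     # on its own, the state after 'y' is final
b = 'z';         # on its own, the state after 'z' is final

# Concatenation draws an epsilon transition from a's final state to
# b's start state. Because b's start state is not final, the state
# after 'y' loses its final status, and leave_a runs on the 'z'
# transition that leaves it. Only the state after 'z' is final in main.
main := (a %leave_a) . b;
```

If b's start state were final (for example `b = 'z'?;`), a's final state would keep its final status after the concatenation.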

> Despite the fact that the dot output shows no final states

The dot output shows a lot of final states. The transition from 7 to 6 makes 7 final, for example; the transition from 6 to 1 makes 6 final, and so on.

ArtemGr
  • Usually, dot output shows a double circle for final states. I guess the lack of that is what was confusing me. – gibbss Nov 04 '13 at 04:54
  • Regarding the dot output, the manual for Ragel isn't very helpful. In all the graphs like this I had seen before, a double circle indicated a final state. What exactly do the small circles & dots at the ends of arrows mean in the pics above? For that matter, in my graphs, many arrows are labeled like "0 / do_date, 69:6". I understand that the 0 indicates that the transition didn't consume a character of input, and the do_date is the action I assigned. But what does the '69:6' mean? – Jerry B Jan 31 '14 at 03:49
  • 69:6 are character codes. You have to pass the `-p` option to ragel to see the characters instead. Small circles and dots are presumably exits, but I can't say for sure without an example more familiar to me. – ArtemGr Feb 02 '14 at 13:04
  • Actually, from reading the Ragel source (the manual *really* needs work), the `69:6` indicates the line and column in the `.rl` file that determined the transition action, for actions that don't have a name (like `%{fhold}`). The little dots are called pseudo-states in the source, including entries to the graph, final states with EOF actions, and states whose default actions go to error. The double circle *does* mark the final states of the overall machine. The final state of `a` in the example `a.b` generally ceases to be final during the concatenation. – Jerry B Feb 20 '14 at 13:01
  • Correction: The little dots are called pseudo-states in the source, including entries to the graph and final states with EOF actions. Small circles are pseudo-states for states whose default actions go to error. – Jerry B Feb 20 '14 at 13:08
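To make the label formats discussed in these comments concrete, the hypothetical fragment below mixes a named action with an inline (unnamed) one; `do_date` and the host variable `line_count` are assumptions for illustration. Assuming the behaviour described above, rendering it with `ragel -V` (plus `-p` for printable character labels) should put both actions on the newline transition, one shown by name and the other by its line:column position in the `.rl` file.

```ragel
# Hypothetical machine, action, and host variable, for illustration only.
action do_date { }

# Named leaving action: it is moved onto the transition that leaves
# date's final state, i.e. the '\n' transition below.
date = digit{4} '-' digit{2} '-' digit{2} %do_date;

# Inline action with no name: the graph is expected to label it with
# its line:column in the .rl file rather than a name.
main := ( date '\n' @{ line_count++; } )*;
```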