
I need to parse some strings that contain paths to directories. The problem is that the strings contain escaped whitespace and other escaped symbols. For example:

"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"

Note the backslash-escaped space before P\&G/.
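One thing worth checking, assuming the test string is typed into Ruby source: in a double-quoted literal, a backslash before a space or an & is not a recognized escape, so Ruby keeps only the following character and the backslashes never reach the parser. A single-quoted literal keeps them. The variable names below are only for illustration.

```ruby
# Double quotes: "\ " and "\&" are not recognized escapes, so Ruby keeps
# just the following character and the backslashes disappear.
double = "/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"

# Single quotes: only \\ and \' are escapes, so the backslashes survive.
single = '/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/'

puts double # /dir_1/dir_2/dir_3/dir/another/dest_dir P&G/
puts single # /dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/
```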

Here is my Treetop grammar (note that the alpha_digit_special character class begins with a space):

rule alpha_digit_special
  [ a-zA-Z0-9.+&\\]
end

rule path_without_quotes
  ([/] alpha_digit_special*)+ 
end

rule quot_mark
  ["]
end

rule path_with_quotes
  quot_mark path_without_quotes quot_mark
end

rule path
  path_with_quotes / path_without_quotes
end

Parsing this string returns nil. How can I write the rule so that the string may contain escaped whitespace?

roman
Kind of late, but... What are you trying to parse the paths into? Are you trying to split them on `'/'`? What final result would you like to have? – Josh Voigts Aug 24 '12 at 19:26

2 Answers


You cannot use alpha_digit_special* to handle backslash-escaped spaces. Instead, you must use a repetition of character units, where a character unit is either a backslash-escaped character pair or a single non-backslash character. Something like this should work:

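# Plain path characters; "_" is needed for names such as dir_1.
rule alpha_digit_special
  [a-zA-Z0-9_.+&\\]
end

# A path character is either a backslash-escaped pair (e.g. "\ " or "\&")
# or a single plain character. The escaped pair is tried first, because
# alpha_digit_special also matches a lone backslash.
rule path_character
  '\\' (alpha_digit_special / ' ')
  /
  alpha_digit_special
end

rule path_without_quotes
  ([/] path_character* )+
end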

Note that the above won't accept a backslashed character that is neither a space nor in the alpha_digit_special set. I think you can see how to change that, though.
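For comparison, here is a rough sketch of the same character-unit idea in plain Ruby, without Treetop. It is an illustration under stated assumptions, not the grammar above: it generalizes the escape to a backslash followed by any character (one way to lift the restriction just mentioned), treats every other character except a backslash, a slash, or a space as a plain path character, and drops empty segments. The names PATH_CHAR, PATH_RE and path_segments are hypothetical.

```ruby
# Sketch only (plain Ruby, not Treetop). Assumption: a path character is
# either a backslash followed by ANY character, or a single character that
# is not a backslash, a slash, or an unescaped space.
PATH_CHAR = /\\.|[^\\\/ ]/

# Whole string: one or more "/" each followed by zero or more path
# characters. A segment starts at a slash and cannot contain an unescaped
# one, so the repetition is unambiguous.
PATH_RE = %r{\A(?:/(?:#{PATH_CHAR})*)+\z}

# Hypothetical helper: returns the unescaped segments of a valid path,
# or nil if the string does not match PATH_RE.
def path_segments(str)
  return nil unless PATH_RE.match?(str)

  # Capture each segment (scan returns one array per match, hence flatten),
  # drop each escaping backslash, and skip empty segments such as the one
  # after a trailing slash.
  str.scan(%r{/((?:#{PATH_CHAR})*)}).flatten
     .map { |seg| seg.gsub(/\\(.)/, '\1') }
     .reject(&:empty?)
end

p path_segments('/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/')
p path_segments('/dir/a b/') # nil: the space is not escaped
```

Because an escape is consumed as one unit, an escaped slash such as '/a\/b' stays inside its segment and yields ["a/b"] instead of being split.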

cliffordheath

Did you try \s?

test = "dest_dir P&G" 
test.match(/[a-zA-Z0-9_\s\&]+/)
 => #<MatchData "dest_dir P&G">
deadrunk
Yes, I've already tried this. The problem is that the syntax nodes are then not found correctly: the rule captures extra whitespace. – roman Apr 09 '12 at 14:43
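A small sketch of why \s alone over-matches, using plain Ruby regexes. LOOSE and STRICT are hypothetical names, LOOSE is adapted from the \s suggestion and anchored to the whole string, and STRICT is one possible escape-aware pattern, not the only one: \s accepts any whitespace, escaped or not, while requiring each unit to be either a backslash pair or a non-whitespace, non-backslash character rejects unescaped spaces.

```ruby
# Assumption: LOOSE is adapted from the \s suggestion above, anchored to
# the whole string; STRICT is one possible escape-aware alternative.
LOOSE  = /\A[a-zA-Z0-9_\s&]+\z/  # any whitespace, escaped or not
STRICT = /\A(?:\\.|[^\\\s])+\z/  # "\" + any char, or one non-space, non-backslash char

unescaped = "dest_dir P&G"    # plain space
escaped   = 'dest_dir\ P\&G'  # backslash-escaped space and ampersand

p LOOSE.match?(unescaped)   # true
p LOOSE.match?(escaped)     # false: the backslash is not in the class
p STRICT.match?(unescaped)  # false: the space is not escaped
p STRICT.match?(escaped)    # true
```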