How to parse a directory path containing whitespace and escaped symbols using Treetop?

Question

I need to parse some strings which contain paths to directories. The problem is that the paths contain escaped whitespace and other escaped symbols. For example:

"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"

Note that there is a whitespace before P\&G/.

Here is my Treetop grammar (the alpha_digit_special character class begins with a space):

rule alpha_digit_special
  [ a-zA-Z0-9.+&\\]
end

rule path_without_quotes
  ([/] alpha_digit_special*)+ 
end

rule quot_mark
  ["]
end

rule path_with_quotes
  quot_mark path_without_quotes quot_mark
end

rule path
  path_with_quotes / path_without_quotes
end

I get nil after parsing this string. How can I specify the rule so that the string may contain escaped whitespace?

Kind of late but... What are you trying to parse the paths into? Are you trying to split them based on `'/'`? What is the final result you'd like to have? — Josh Voigts, Aug 24 '12 at 19:26

score 1 · Answer 1 · answered Nov 18 '14 at 23:49

You cannot use alpha_digit_special* to handle back-slash escaped spaces. Instead, you must use a repetition of character units, where a character unit is either a backslashed character pair, or a single non-backslash character. Something like this should work:

rule alpha_digit_special
  [a-zA-Z0-9.+&\\]
end

rule path_character
  '\\' (alpha_digit_special / ' ')
  /
  alpha_digit_special
end

rule path_without_quotes
  ([/] path_character* )+ 
end

Note that the above won't accept a backslash followed by a character that is neither a space nor in the alpha_digit_special set. I think you can see how to change that, though.
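If you want to try the character-unit idea without Treetop, here is a rough sketch of it as a plain Ruby regular expression. It is not a drop-in replacement for the grammar. The names PATH_CHAR, UNQUOTED_PATH, PATH, parse_path and path_segments are made up for illustration. It also makes two assumptions that go beyond the rules above: the underscore is added to the character set so that names like dir_1 match, and a backslash may escape any single character.

```ruby
# A path character is either a backslash followed by any single character
# (an escaped pair), or one plain character from the allowed set.
# Assumption: underscore is allowed so that names such as dir_1 match.
PATH_CHAR = /\\.|[a-zA-Z0-9_.+&]/

# One or more segments, each a slash followed by zero or more path characters.
UNQUOTED_PATH = /(?:\/(?:#{PATH_CHAR})*)+/

# The whole input is either a double-quoted path or a bare path.
PATH = /\A(?:"(#{UNQUOTED_PATH})"|(#{UNQUOTED_PATH}))\z/

# Returns the path without surrounding quotes, or nil if the input does not match.
def parse_path(input)
  m = PATH.match(input)
  m && (m[1] || m[2])
end

# Splits a matched path on unescaped slashes, drops empty segments,
# and removes the escaping backslashes from each segment.
def path_segments(path)
  path.scan(/\/((?:#{PATH_CHAR})*)/)
      .flatten
      .reject(&:empty?)
      .map { |segment| segment.gsub(/\\(.)/, '\1') }
end

input = '"/dir_1/dir_2/dir_3/dir/another/dest_dir\ P\&G/"'
path = parse_path(input)
p path_segments(path) if path
```

Because the escaped pair is tried before the plain character, a backslash always consumes the character after it, so an unescaped space still causes the match to fail (parse_path('/a b') returns nil).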

score 0 · Answer 2 · deadrunk · 2012-04-09T11:41:57.827

Did you try \s?

test = "dest_dir P&G" 
test.match(/[a-zA-Z0-9_\s\&]+/)
 => #<MatchData "dest_dir P&G">

edited Apr 09 '12 at 11:41

answered Apr 09 '12 at 11:31


Yes. I've already tried this. The problem is that syntax nodes are not found correctly then. The rule captures extra whitespaces – roman Apr 09 '12 at 14:43
