I'm writing a parser for strings with interpolated name-value arguments, e.g.: 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
The argument values are code, which has its own set of parse rules.
Here's a version of my parser, simplified to only allow basic arithmetic as code:
require 'parslet'
require 'ap'
class TestParser < Parslet::Parser
rule :integer do match('[0-9]').repeat(1).as :integer end
rule :space do match('[\s\\n]').repeat(1) end
rule :parens do str('(') >> code >> str(')') end
rule :operand do integer | parens end
rule :addition do (operand.as(:left) >> space >> str('+') >> space >> operand.as(:right)).as :addition end
rule :code do addition | operand end
rule :name do match('[a-z]').repeat 1 end
rule :argument do name.as(:name) >> str(':') >> space >> code.as(:value) end
rule :arguments do argument >> (str(',') >> space >> argument).repeat end
rule :interpolation do str('#{') >> arguments.as(:arguments) >> str('}') end
rule :text do (interpolation.absent? >> any).repeat(1).as(:text) end
rule :segments do (interpolation | text).repeat end
root :segments
end
string = 'This sentence #{x: 2, y: (2 + 5) + 3} has stuff in it.'
ap TestParser.new.parse(string), index: false
Since the code has its own parse rules (to ensure valid syntax), the argument values are parsed into a subtree (with parentheses etc. replaced by nesting within the subtree):
[
{
:text => "This sentence "@0
},
{
:arguments => [
{
:name => "x"@16,
:value => {
:integer => "2"@19
}
},
{
:name => "y"@22,
:value => {
:addition => {
:left => {
:addition => {
:left => {
:integer => "2"@26
},
:right => {
:integer => "5"@30
}
}
},
:right => {
:integer => "3"@35
}
}
}
}
]
},
{
:text => " has stuff in it."@37
}
]
However, I want to store the argument values as strings, so this would be the ideal result:
[
{
:text => "This sentence "@0
},
{
:arguments => [
{
:name => "x"@16,
:value => "2"
},
{
:name => "y"@22,
:value => "(2 + 5) + 3"
}
]
},
{
:text => " has stuff in it."@37
}
]
How can I use the Parslet subtrees to reconstruct the argument-value substrings? I could write a code generator, but that seems overkill -- Parslet clearly has access to the substring position information at some point (although it might discard it).
Is it possible to leverage or hack Parslet to return the substring?