I'm trying to parse a quoted string. Something like this:
say '"in quotes"' ~~ / '"' <-[ " ]> * '"'/;
(From https://docs.raku.org/language/regexes, "Enumerated character classes and ranges".) But I want to support more than one type of quote. Something like this made-up syntax, which doesn't work:
token attribute_value { <quote> ($<-quote>) $<quote> };
token quote { <["']> };
I found this discussion, which takes another approach, but it didn't seem to go anywhere: https://github.com/Raku/problem-solving/issues/97. Is there any way of doing this kind of thing? Thanks!
Update 1
I was not able to get @user0721090601's "multi token" solution to work. My first attempt yielded:
$ ./multi-token.raku
No such method 'quoted_string' for invocant of type 'QuotedString'
in block <unit> at ./multi-token.raku line 16
After doing some research I added proto token quoted_string {*}:
#!/usr/bin/env raku
use Grammar::Tracer;
grammar QuotedString {
proto token quoted_string {*}
multi token quoted_string:sym<'> { <sym> ~ <sym> <-[']> }
multi token quoted_string:sym<"> { <sym> ~ <sym> <-["]> }
token quote { <["']> }
}
my $string = '"foo"';
my $quoted-string = QuotedString.parse($string, :rule<quoted_string>);
say $quoted-string;
$ ./multi-token.raku
quoted_string
* FAIL
(Any)
I'm still learning Raku, so I could be doing something wrong.
Update 2
D'oh! Thanks to @raiph for pointing this out. I forgot to put a quantifier on <-[']> and <-["]>. That's what I get for copy/pasting without thinking! Works fine when you do it right:
#!/usr/bin/env raku
use Grammar::Tracer;
grammar QuotedString {
proto token quoted_string (|) {*}
multi token quoted_string:sym<'> { <sym> ~ <sym> <-[']>+ }
multi token quoted_string:sym<"> { <sym> ~ <sym> <-["]>+ }
token quote { <["']> }
}
my $string = '"foo"';
my $quoted-string = QuotedString.parse($string, :rule<quoted_string>);
say $quoted-string;
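For anyone who hits the same failure: a negated character class without a quantifier matches exactly one character, so the Update 1 version could only ever accept a one-character body. The following is a minimal sketch with plain regexes rather than the grammar. The variable names $one and $many are just illustrative, not part of any answer.

```raku
# <-["]> with no quantifier consumes exactly one non-quote character.
my $one  = / ^ '"' <-["]>  '"' $ /;
# Adding + lets the body be one or more non-quote characters.
my $many = / ^ '"' <-["]>+ '"' $ /;

say so '"f"'   ~~ $one;    # True:  the body is a single character
say so '"foo"' ~~ $one;    # False: the body is three characters
say so '"foo"' ~~ $many;   # True:  + accepts the longer body
```

Note that + still rejects an empty quoted string; using * instead would allow it, if that matters for the input.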
Update 3
Just to put a bow on this...
#!/usr/bin/env raku
grammar NegativeLookahead {
token quoted_string { <quote> $<string>=([<!quote> .]+) $<quote> }
token quote { <["']> }
}
grammar MultiToken {
proto token quoted_string (|) {*}
multi token quoted_string:sym<'> { <sym> ~ <sym> $<string>=(<-[']>+) }
multi token quoted_string:sym<"> { <sym> ~ <sym> $<string>=(<-["]>+) }
}
use Bench;
my $string = "'foo'";
my $bench = Bench.new;
$bench.cmpthese(10000, {
negative-lookahead =>
sub { NegativeLookahead.parse($string, :rule<quoted_string>); },
multi-token =>
sub { MultiToken.parse($string, :rule<quoted_string>); },
});
$ ./bench.raku
Benchmark:
Timing 10000 iterations of multi-token, negative-lookahead...
multi-token: 0.779 wallclock secs (0.759 usr 0.033 sys 0.792 cpu) @ 12838.058/s (n=10000)
negative-lookahead: 0.912 wallclock secs (0.861 usr 0.048 sys 0.909 cpu) @ 10967.522/s (n=10000)
O--------------------O---------O-------------O--------------------O
|                    | Rate    | multi-token | negative-lookahead |
O====================O=========O=============O====================O
| multi-token        | 12838/s | --          | -20%               |
| negative-lookahead | 10968/s | 25%         | --                 |
O--------------------O---------O-------------O--------------------O
I'll be going with the "multi token" solution. Thanks everyone!
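Speed aside, the two grammars are not equivalent as written. In NegativeLookahead, <!quote> keeps both quote characters out of the body, while each MultiToken candidate only excludes its own delimiter. Here is a minimal sketch of how to compare them, assuming the same grammars as above (with the plain {*} proto form from Update 1). The input string and the %body hash are made up for illustration.

```raku
grammar NegativeLookahead {
    token quoted_string { <quote> $<string>=([<!quote> .]+) $<quote> }
    token quote         { <["']> }
}

grammar MultiToken {
    proto token quoted_string {*}
    multi token quoted_string:sym<'> { <sym> ~ <sym> $<string>=(<-[']>+) }
    multi token quoted_string:sym<"> { <sym> ~ <sym> $<string>=(<-["]>+) }
}

# Hypothetical input: a double-quoted string containing a single quote.
my $input = q{"it's"};

my %body;
for NegativeLookahead, MultiToken -> $grammar {
    my $match = $grammar.parse($input, :rule<quoted_string>);
    # Record the captured body, or a marker when the parse fails.
    %body{$grammar.^name} = $match ?? ~$match<string> !! '(no match)';
    say $grammar.^name, ': ', %body{$grammar.^name};
}
```

If the other kind of quote can appear inside the string, that difference is another reason to prefer the "multi token" version.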