I am trying to build a basic LaTeX parser using the pest library. For the moment, I only care about lines, bold formatting and plain text, and I am struggling with the latter. To simplify the problem, I assume that plain text cannot contain these two characters: `\` and `}`. Here is my grammar:
```
lines = { line ~ (NEWLINE ~ line)* }
line = { token* }
token = { text_bold | text_plain }
text_bold = { "\\textbf{" ~ text_plain ~ "}" }
text_plain = ${ inner ~ ("\\" | "}" | NEWLINE) }
inner = @{ char* }
char = {
    !("\\" | "}" | NEWLINE) ~ ANY
}
main = {
    SOI ~
    lines ~
    EOI
}
```
Using this webapp, I can see that my grammar consumes the character right after each piece of plain text.
Input:

```
Before \textbf{middle} after.
New line
```

Output:

```
- lines > line
  - token > text_plain > inner: "Before "
  - token > text_plain > inner: "textbf{middle"
  - token > text_plain > inner: " after."
  - token > text_plain > inner: "New line"
```
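To see where the missing characters go, here is a small std-only Rust model of the grammar above. It is not pest itself: each hypothetical function (`text_plain`, `text_bold`, `line`, `parse`) mirrors one rule with PEG semantics, and `NEWLINE` is simplified to `\n`. Because the suffix of `text_plain` is part of the match, the `\` is eaten and the next token restarts at `textbf{middle`; the `}` and the newline are eaten the same way, so everything ends up in a single `line`. `\textbf{middle}` can also never match `text_bold`, since the inner `text_plain` already swallowed the `}` that the rule expects. In this model the input appears to need a trailing newline, otherwise the last piece of text has no terminator and the whole parse fails.

```rust
// Std-only model of the question's pest grammar (not pest itself).
// Token, text_plain, text_bold, line and parse are hypothetical names.
#[derive(Debug, PartialEq)]
enum Token {
    Bold(String),
    Plain(String),
}

// char = { !("\\" | "}" | NEWLINE) ~ ANY }, with NEWLINE simplified to '\n'.
fn is_plain_char(c: char) -> bool {
    c != '\\' && c != '}' && c != '\n'
}

// text_plain = ${ inner ~ ("\\" | "}" | NEWLINE) } where inner = @{ char* }.
// The terminator after `inner` is part of the match, so it is consumed.
fn text_plain(s: &str, pos: usize) -> Option<(String, usize)> {
    let rest = &s[pos..];
    let len = rest.find(|c: char| !is_plain_char(c)).unwrap_or(rest.len());
    match rest[len..].chars().next() {
        Some(term) => Some((rest[..len].to_string(), pos + len + term.len_utf8())),
        None => None, // end of input: no terminator, so the rule fails
    }
}

// text_bold = { "\\textbf{" ~ text_plain ~ "}" }
fn text_bold(s: &str, pos: usize) -> Option<(String, usize)> {
    const TAG: &str = "\\textbf{";
    if !s[pos..].starts_with(TAG) {
        return None;
    }
    let (inner, p) = text_plain(s, pos + TAG.len())?;
    // For "\textbf{middle}", text_plain already ate the '}', so this fails.
    if s[p..].starts_with('}') {
        Some((inner, p + 1))
    } else {
        None
    }
}

// line = { token* } with token = { text_bold | text_plain }
fn line(s: &str, mut pos: usize) -> (Vec<Token>, usize) {
    let mut tokens = Vec::new();
    loop {
        if let Some((t, p)) = text_bold(s, pos) {
            tokens.push(Token::Bold(t));
            pos = p;
        } else if let Some((t, p)) = text_plain(s, pos) {
            tokens.push(Token::Plain(t));
            pos = p;
        } else {
            return (tokens, pos);
        }
    }
}

// main = { SOI ~ lines ~ EOI } with lines = { line ~ (NEWLINE ~ line)* }
fn parse(s: &str) -> Option<Vec<Vec<Token>>> {
    let (first, mut pos) = line(s, 0);
    let mut lines = vec![first];
    while s[pos..].starts_with('\n') {
        let (next, p) = line(s, pos + 1);
        lines.push(next);
        pos = p;
    }
    if pos == s.len() { Some(lines) } else { None }
}
```

Running `parse` on the input with a trailing newline yields one line with the same four plain tokens as the webapp output.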
If I replace `${ inner ~ ("\\" | "}" | NEWLINE) }` with `${ inner }`, it fails. If I add the positive lookahead `&` in front of the suffix, it does not work either.
How can I change my grammar so that lines and bold tags are detected?
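One possible direction, sketched in the same std-only Rust model (not pest, and the pest rules in the comment are an untested assumption): let `text_plain` stop *before* the terminator instead of consuming it, and require at least one character (`char+` instead of `char*`). The second part likely matters because with `${ inner }` and `inner = @{ char* }`, `text_plain` can match the empty string, so `token*` can repeat without making progress. The `&` variant can also match the empty string (for example right before a newline), so it probably runs into the same problem.

```rust
// Std-only model of a possible fix (not pest itself). The pest rules it mirrors
// are an assumption and have not been run through pest:
//   text_plain = ${ inner }
//   inner      = @{ char+ }
//   char       = { !("\\" | "}" | NEWLINE) ~ ANY }
// lines, line, token, text_bold and main stay as in the question.
#[derive(Debug, PartialEq)]
enum Token {
    Bold(String),
    Plain(String),
}

// NEWLINE is simplified to '\n' here.
fn is_plain_char(c: char) -> bool {
    c != '\\' && c != '}' && c != '\n'
}

// Match one or more plain chars and leave the terminator in the input.
fn text_plain(s: &str, pos: usize) -> Option<(String, usize)> {
    let rest = &s[pos..];
    let len = rest.find(|c: char| !is_plain_char(c)).unwrap_or(rest.len());
    if len == 0 {
        None // char+ needs at least one character, so token* always progresses
    } else {
        Some((rest[..len].to_string(), pos + len))
    }
}

// text_bold = { "\\textbf{" ~ text_plain ~ "}" }; the '}' is still available now.
fn text_bold(s: &str, pos: usize) -> Option<(String, usize)> {
    const TAG: &str = "\\textbf{";
    if !s[pos..].starts_with(TAG) {
        return None;
    }
    let (inner, p) = text_plain(s, pos + TAG.len())?;
    if s[p..].starts_with('}') {
        Some((inner, p + 1))
    } else {
        None
    }
}

// line = { token* } with token = { text_bold | text_plain }
fn line(s: &str, mut pos: usize) -> (Vec<Token>, usize) {
    let mut tokens = Vec::new();
    loop {
        if let Some((t, p)) = text_bold(s, pos) {
            tokens.push(Token::Bold(t));
            pos = p;
        } else if let Some((t, p)) = text_plain(s, pos) {
            tokens.push(Token::Plain(t));
            pos = p;
        } else {
            return (tokens, pos);
        }
    }
}

// main = { SOI ~ lines ~ EOI } with lines = { line ~ (NEWLINE ~ line)* }
fn parse(s: &str) -> Option<Vec<Vec<Token>>> {
    let (first, mut pos) = line(s, 0);
    let mut lines = vec![first];
    while s[pos..].starts_with('\n') {
        let (next, p) = line(s, pos + 1);
        lines.push(next);
        pos = p;
    }
    if pos == s.len() { Some(lines) } else { None }
}
```

In this model, `\textbf{middle}` becomes a bold token and the newline now separates two lines. A stray `}`, or a backslash that does not start `\textbf{`, makes the parse fail, which is consistent with the simplifying assumption in the question.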