I am using Boost Spirit X3 to parse some grammars, but I ran into behavior that I cannot explain. The code below is a simplified version of what I am trying to do.

#ifndef BOOST_SPIRIT_X3_NO_RTTI
#define BOOST_SPIRIT_X3_NO_RTTI
#endif

#include "boost/spirit/home/x3.hpp"
#include <iostream>
#include <string>

namespace spirit3 = boost::spirit::x3;

// Grammar for the language: a^(2n+1)
// S -> aPa
// P -> aPa | a

struct P_id;
constexpr auto P = spirit3::rule<P_id, spirit3::unused_type>{};
constexpr auto P_def = ('a' >> P >> 'a') | 'a';
constexpr auto S = 'a' >> P_def >> 'a';
BOOST_SPIRIT_DEFINE(P);

int
main()
{
    int n_chars_list[] = {5, 7, 9};
    for (auto n_chars : n_chars_list)
    {
        std::string content;
        for (int i = 0; i < n_chars; ++i)
            content += 'a';

        auto iter = content.begin();
        bool is_matched = spirit3::parse(iter, content.end(), S);
        bool is_exhausted = (iter == content.end());

        std::cout << "n_chars: " << n_chars << '\t';
        std::cout << std::boolalpha << "is_matched: " << is_matched << '\t';
        std::cout << std::boolalpha << "is_exhausted: " << is_exhausted << '\n';
    }
}

// Output:
// n_chars: 5   is_matched: true    is_exhausted: false
// n_chars: 7   is_matched: true    is_exhausted: true
// n_chars: 9   is_matched: true    is_exhausted: false

Can anyone explain why the parser fails to consume the whole string when n_chars is 5 or 9 but succeeds when it is 7? Thank you.

sehe
mibu

1 Answer

It's the grammar. To help you diagnose the issue I'd suggest debugging it:

#define BOOST_SPIRIT_X3_DEBUG // print a trace for every named rule to stderr
#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
namespace x3 = boost::spirit::x3;

// Grammar for the language: a^(2n+1)
// S -> aPa
// P -> aPa | a

constexpr auto P     = x3::rule<struct P_id>{"P"};
constexpr auto P_def = ('a' >> P >> 'a') | 'a';
constexpr auto S     = x3::rule<struct S_id>{"S"} 
                     = 'a' >> P >> 'a';
BOOST_SPIRIT_DEFINE(P)

int main() {
    for (int n_chars : {5, 7, 9}) {
        std::string const content(n_chars, 'a');

        bool matched = x3::parse(content.begin(), content.end(), S >> x3::eoi);

        std::cout << "n_chars: " << n_chars << '\t';
        std::cout << std::boolalpha << "matched: " << matched << "\n";
    }
}

Standard Output:

n_chars: 5      matched: false
n_chars: 7      matched: true
n_chars: 9      matched: false

Standard error debug info:

<S>
  <try>aaaaa</try>
  <P>
    <try>aaaa</try>
    <P>
      <try>aaa</try>
      <P>
        <try>aa</try>
        <P>
          <try>a</try>
          <P>
            <try></try>
            <fail/>
          </P>
          <success></success>
        </P>
        <success>a</success>
      </P>
      <success></success>
    </P>
    <success>aaa</success>
  </P>
  <success>aa</success>
</S>
n_chars: 5  matched: false
<S>
  <try>aaaaaaa</try>
  <P>
    <try>aaaaaa</try>
    <P>
      <try>aaaaa</try>
      <P>
        <try>aaaa</try>
        <P>
          <try>aaa</try>
          <P>
            <try>aa</try>
            <P>
              <try>a</try>
              <P>
                <try></try>
                <fail/>
              </P>
              <success></success>
            </P>
            <success>a</success>
          </P>
          <success></success>
        </P>
        <success>aaa</success>
      </P>
      <success>aa</success>
    </P>
    <success>a</success>
  </P>
  <success></success>
</S>
n_chars: 7  matched: true
<S>
  <try>aaaaaaaaa</try>
  <P>
    <try>aaaaaaaa</try>
    <P>
      <try>aaaaaaa</try>
      <P>
        <try>aaaaaa</try>
        <P>
          <try>aaaaa</try>
          <P>
            <try>aaaa</try>
            <P>
              <try>aaa</try>
              <P>
                <try>aa</try>
                <P>
                  <try>a</try>
                  <P>
                    <try></try>
                    <fail/>
                  </P>
                  <success></success>
                </P>
                <success>a</success>
              </P>
              <success></success>
            </P>
            <success>aaa</success>
          </P>
          <success>aa</success>
        </P>
        <success>a</success>
      </P>
      <success></success>
    </P>
    <success>aaaaaaa</success>
  </P>
  <success>aaaaaa</success>
</S>
n_chars: 9  matched: false

Side notes: I simplified some of the code (the unnecessary loops) and dropped the separate "exhausted" test, instead asserting x3::eoi at the end of the parser expression.

sehe
  • Just in case: to parse ANY sequence of `'a'` with a length that is a positive odd integer, I'd write **[`'a' >> *lit("aa") >> eoi` (live demo)](http://coliru.stacked-crooked.com/a/dd23a6b4726ba1cb)** – sehe Aug 28 '21 at 14:16
  • As you suggested, I named all the rules in my grammar and tried debugging it. It took me quite some time to see where I went wrong. Consider S -> a P1 a; P1 -> a P2 a. At this point, P2 successfully produces aaa using the rule P2 -> a P3 a, but this derivation fails in the next step when P1 resumes. The other valid parse, P2 -> a, is never considered because Boost Spirit uses PEG rather than a traditional Chomsky-style grammar. If the rule P2 -> a were also considered, the program would have behaved as I expected. – mibu Aug 28 '21 at 15:33
  • Thank you for your answer, super helpful. If I insist on using a recursive rule to recognize the language a^(2n+1), what can I do instead? It is not just about this particular problem; I am trying to do something else. Thank you. – mibu Aug 28 '21 at 15:35
  • The problem is as you state: PEG is fundamentally left-oriented and consumes input eagerly. You could, of course, use all manner of stateful lookahead, but I wouldn't bother coming up with such a contraption for this simple example. So, if you get stuck with any "real" grammar, I'm happy to take a look again. – sehe Aug 28 '21 at 18:29