
I have the following code:

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct printer {
    template <typename int_type>
    void operator()(std::vector<int_type> &vec) {
        std::cout << "vec(" << sizeof(int_type) << "): { ";
        for( auto const &elem : vec ){
            std::cout << elem << ", ";
        }
        std::cout << "}\n";
    }
};

template <typename Iterator>
void parse_int_list(Iterator first, Iterator last) {
    namespace x3 = boost::spirit::x3;
    x3::variant<std::vector<std::uint32_t>, std::vector<std::uint64_t>> vecs;
    x3::parse( first, last,
            (x3::uint32 % '|') | (x3::uint64 % '|'), vecs );
    boost::apply_visitor(printer{}, vecs);
}

I expected this to first try parsing the input into a vector of 32-bit unsigned integers and, if that failed, into a vector of 64-bit unsigned integers. This works well when the type that matches the first integer is large enough for every other integer in the list, e.g.:

std::string ints32 = "1|2|3";
parse_int_list(begin(ints32), end(ints32));
// prints vec(4): { 1, 2, 3, }

std::string ints64 = "10000000000|20000000000|30000000000";
parse_int_list(begin(ints64), end(ints64));
// prints vec(8): { 10000000000, 20000000000, 30000000000, }

However, it does not work when the first number fits in 32 bits and a later number needs 64 bits.

std::string ints_mixed = "1|20000000000|30000000000";
parse_int_list(begin(ints_mixed), end(ints_mixed));
// prints vec(4): { 1, }

The return value of x3::parse indicates a parse failure. But according to my reading of the documentation, it should try the second alternative if it can't parse the input with the first.

Any pointers on how I'm reading this incorrectly, and how the alternative parser actually works?

Edit: After seeing the responses, I realized that x3::parse was actually returning a parse success. I had been determining success by checking that the entire stream was consumed (first == last), as demonstrated in the documentation. That check hid the fact that, because the list parser is greedy in the way a Kleene star is and the grammar was not anchored to the end of the stream, it had successfully parsed only a portion of the input. Thanks all.

kalaxy
  • I'm not sure about X3, but regular spirit list (%) parsers are greedy as in they match as soon as the first element passes. –  May 21 '18 at 22:07

2 Answers


The issue here is that "1" on its own is a valid input for the (x3::uint32 % '|') parser, so the first branch of the alternative succeeds after consuming only the 1.

The cleanest way for you to fix this would be to have a list of alternatives instead of an alternative of lists.

i.e.:

(x3::uint32 | x3::uint64) % '|'

However, that would mean you would have to parse in a different structure.

vector<x3::variant<uint32_t,uint64_t>> vecs;
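
To make the change in structure concrete, here is a rough standard-library model of a list of alternatives. It is not X3 code: parse_uint, parse_element, parse_element_list and Element are hypothetical names invented for this sketch, std::variant stands in for x3::variant, and the overflow handling is an assumption about how the numeric parsers behave (fail without consuming anything).

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <variant>
#include <vector>

// Stand-in for vector<x3::variant<uint32_t, uint64_t>> (hypothetical alias).
using Element = std::variant<std::uint32_t, std::uint64_t>;

// Hand-rolled stand-in for x3::uint32 / x3::uint64: reads decimal digits at
// `pos`; fails without consuming anything if there are no digits or the
// value would overflow UInt.
template <typename UInt>
bool parse_uint(std::string const& in, std::size_t& pos, UInt& out) {
    UInt const max = std::numeric_limits<UInt>::max();
    std::size_t p = pos;
    UInt value = 0;
    while (p < in.size() && in[p] >= '0' && in[p] <= '9') {
        UInt const digit = static_cast<UInt>(in[p] - '0');
        if (value > (max - digit) / 10) return false;  // would overflow
        value = static_cast<UInt>(value * 10 + digit);
        ++p;
    }
    if (p == pos) return false;  // no digits at all
    pos = p;
    out = value;
    return true;
}

// Models (uint32 | uint64): the choice is made per element, narrow type first.
bool parse_element(std::string const& in, std::size_t& pos, Element& out) {
    std::uint32_t narrow = 0;
    if (parse_uint(in, pos, narrow)) { out = narrow; return true; }
    std::uint64_t wide = 0;
    if (parse_uint(in, pos, wide)) { out = wide; return true; }
    return false;
}

// Models (uint32 | uint64) % '|'. Returns true only if the whole input was
// consumed, so a trailing unparsable element is reported as a failure.
bool parse_element_list(std::string const& in, std::vector<Element>& out) {
    std::size_t pos = 0;
    Element e;
    if (!parse_element(in, pos, e)) return false;
    out.push_back(e);
    while (pos < in.size() && in[pos] == '|') {
        std::size_t next = pos + 1;
        if (!parse_element(in, next, e)) break;  // leave the '|' unconsumed
        out.push_back(e);
        pos = next;
    }
    return pos == in.size();
}
```

Calling parse_element_list on "1|20000000000|3" in this model yields three elements whose index() values are 0, 1 and 0: only the middle element ends up as a 64-bit value, which is why the result type has to become a vector of variants rather than a variant of vectors.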

Edit:

Alternatively, if you do not intend to use this parser as a sub-parser, you can force an end-of-input in each branch.

(x3::uint32 % '|' >> x3::eoi) | (x3::uint64 % '|' >> x3::eoi)

This would force the first branch to fail if it does not reach the end of the stream, dropping into the alternative.
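
The effect of anchoring each branch can be illustrated with a small standard-library model. This is only a sketch, not how X3 is implemented: parse_uint, parse_list, parse_alternative and Result are hypothetical names made up for this illustration, and the anchor_at_end flag plays the role of appending >> x3::eoi to each branch.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <string>
#include <vector>

// Hand-rolled stand-in for x3::uint32 / x3::uint64: reads decimal digits at
// `pos`; fails without consuming anything if there are no digits or the
// value would overflow UInt.
template <typename UInt>
bool parse_uint(std::string const& in, std::size_t& pos, UInt& out) {
    UInt const max = std::numeric_limits<UInt>::max();
    std::size_t p = pos;
    UInt value = 0;
    while (p < in.size() && in[p] >= '0' && in[p] <= '9') {
        UInt const digit = static_cast<UInt>(in[p] - '0');
        if (value > (max - digit) / 10) return false;  // would overflow
        value = static_cast<UInt>(value * 10 + digit);
        ++p;
    }
    if (p == pos) return false;  // no digits at all
    pos = p;
    out = value;
    return true;
}

// Models `element % '|'`: requires one element, then greedily takes
// "'|' element" pairs and stops, still reporting success, at the first pair
// that does not match.
template <typename UInt>
bool parse_list(std::string const& in, std::size_t& pos, std::vector<UInt>& out) {
    UInt value = 0;
    if (!parse_uint(in, pos, value)) return false;
    out.push_back(value);
    while (pos < in.size() && in[pos] == '|') {
        std::size_t next = pos + 1;
        if (!parse_uint(in, next, value)) break;  // leave the '|' unconsumed
        out.push_back(value);
        pos = next;
    }
    return true;
}

struct Result {
    bool ok;               // did either branch succeed?
    int width;             // sizeof the element type that matched (4 or 8)
    std::size_t consumed;  // characters consumed by the winning branch
    std::size_t count;     // elements parsed by the winning branch
};

// Models (uint32 % '|') | (uint64 % '|'). With anchor_at_end set, a branch
// only counts as a match if it also reached the end of the input.
Result parse_alternative(std::string const& in, bool anchor_at_end) {
    {
        std::size_t pos = 0;
        std::vector<std::uint32_t> v32;
        if (parse_list(in, pos, v32) && (!anchor_at_end || pos == in.size()))
            return Result{true, 4, pos, v32.size()};
    }
    {
        std::size_t pos = 0;  // the second branch restarts from the beginning
        std::vector<std::uint64_t> v64;
        if (parse_list(in, pos, v64) && (!anchor_at_end || pos == in.size()))
            return Result{true, 8, pos, v64.size()};
    }
    return Result{false, 0, 0, 0};
}
```

In this model, parse_alternative("1|20000000000|30000000000", false) reports success from the 32-bit branch with one element and one character consumed, which matches the behaviour described in the question. With anchor_at_end set to true, the same input falls through to the 64-bit branch and all three elements are parsed.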

  • So. Basically, you do not know whether this is a way to fix it at all :) I agree, the AST choice seems like it's a bad design in the first place. Regardless, it's more interesting to note the issue with the parser, than to guess about the solutions, I think. – sehe May 21 '18 at 22:14
  • Hah. You came up with the same semantic fix as I did in the end. +1 – sehe May 21 '18 at 22:22
  • Can you explain more about the difference between a parser and a sub-parser? Is that documented somewhere? (I get the feeling that the X3 documentation leaves out stuff that people might already know if coming from Qi.) – kalaxy May 22 '18 at 16:07

As Frank commented, the issue is that the list operator (%) is greedy: it accepts as many elements as will match and considers that a "match", even if input remains.

If you want it to reject input if "some elements have not been parsed", make it so:

parse(first, last, x3::uint32 % '|' >> x3::eoi | x3::uint64 % '|' >> x3::eoi, vecs);

Demo

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>

struct printer {
    template <typename int_type> void operator()(std::vector<int_type> &vec) const {
        std::cout << "vec(" << sizeof(int_type) << "): { ";
        for (auto const &elem : vec) {
            std::cout << elem << ", ";
        }
        std::cout << "}\n";
    }
};

template <typename Iterator> void parse_int_list(Iterator first, Iterator last) {
    namespace x3 = boost::spirit::x3;
    boost::variant<std::vector<uint32_t>, std::vector<uint64_t> > vecs;

    parse(first, last, x3::uint32 % '|' >> x3::eoi | x3::uint64 % '|' >> x3::eoi, vecs);
    apply_visitor(printer{}, vecs);
}

int main() {
    for (std::string const input : {
             "1|2|3",
             "4294967295",
             "4294967296",
             "4294967295|4294967296",
         }) {
        parse_int_list(input.begin(), input.end());
    }
}

Prints

vec(4): { 1, 2, 3, }
vec(4): { 4294967295, }
vec(8): { 4294967296, }
vec(8): { 4294967295, 4294967296, }
sehe