Empty strings in vector returned from boost spirit x3 parser

Question

I want to scan a file for all enums (this is just an MCVE, so nothing complicated) and store the names of the enums in a std::vector. I build my parsers like this:

auto const any = x3::rule<class any_id, const x3::unused_type>{"any"}
               = ~x3::space;

auto const identifier = x3::rule<class identifier_id, std::string>{"identifier"}
                      = x3::lexeme[x3::char_("A-Za-z_") >> *x3::char_("A-Za-z_0-9")];

auto const enum_finder = x3::rule<class enum_finder_id, std::vector<std::string>>{"enum_finder"}
                       = *(("enum" >> identifier) | any);

When I parse a string with this enum_finder into a std::vector, the vector also contains a lot of empty strings. Why does this parser put empty strings into the vector?

This is **not** a MCVE. It's not minimal. It's not complete. It's not verifiable. Do you have sample inputs? — sehe, Jun 11 '16 at 16:19
@sehe [Here](http://melpon.org/wandbox/permlink/51qsx3qf6z8flvi4) is something complete (sadly only using boost 1.60 since I don't have access to my computer and I don't think any of the online compilers has 1.61 available). It seems, not sure if intended or a bug, that the attribute of `a|b` when the attribute of `a` is std::string and `b` is unused is now std::string instead of boost::optional. — llonesmiz, Jun 11 '16 at 16:45
@jv_ I agree that this new behaviour of attribute "collapsing" of alternative parsers when some branches result in `unused_type` seems... less than useful. — sehe, Jun 11 '16 at 16:50

score 2 · Accepted Answer · answered Jun 11 '16 at 16:48

I've assumed you want to parse "enum <identifier>" out of free-form text, ignoring whitespace.

What you really want is for ("enum" >> identifier | any) to synthesize an optional<string>. Sadly, what you get is variant<string, unused_type> or somesuch.

The same happens when you wrap any with x3::omit[any] - it's still the same unused_type.
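To make the effect concrete, here is a rough hand-written model in plain C++ of what a Kleene star over such an alternative appears to do. It is only an illustration, not Spirit's actual code: `parse_alternative` and `model_kleene` are hypothetical names, and the model works on whole whitespace-delimited words rather than single characters. The assumption, consistent with the comments under the question, is that each successful iteration contributes one `std::string` element, and that the `any` branch never writes into it.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the alternative ("enum" >> identifier) | any.
// Consumes one word (or "enum" plus the following word) and writes to
// `attr` only when the enum branch matches.
bool parse_alternative(std::istringstream& in, std::string& attr) {
    std::string word;
    if (!(in >> word))
        return false;              // no input left: the alternative fails
    if (word == "enum") {
        std::string name;
        if (in >> name)
            attr = name;           // enum branch: the attribute is filled in
    }
    return true;                   // any branch: attr is left untouched
}

// Hypothetical model of *(alternative) filling a std::vector<std::string>.
std::vector<std::string> model_kleene(std::string const& text) {
    std::istringstream in(text);
    std::vector<std::string> out;
    for (;;) {
        std::string element;       // default-constructed, i.e. ""
        if (!parse_alternative(in, element))
            break;
        out.push_back(std::move(element)); // pushed even if still empty
    }
    return out;
}
```

In this model, `model_kleene("enum one bogus enum two")` gives `{"one", "", "two"}`: the bogus word still produces an element, it is just empty. With the original char-by-char `any`, you would get one such empty element per skipped character.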

Plan B: Since you're really just parsing repeated enum-ids separated by "anything", why not use the list operator:

     ("enum" >> identifier) % any

This works a little. Now some tweaking: let's avoid eating "any" character by character. In fact, we can likely just consume whole whitespace-delimited words (note that +~space is equivalent to +graph):

auto const any = x3::rule<class any_id>{"any"}
               = x3::lexeme [+x3::graph];

Next, to allow multiple bogus words in a row, there's a trick: make the list's subject parser optional:

       -("enum" >> identifier) % any;

This parses correctly. See a full demo:

DEMO

Live On Coliru

#include <boost/spirit/home/x3.hpp>
namespace x3 = boost::spirit::x3;

namespace parser {
    using namespace x3;
    auto any         = lexeme [+~space];
    auto identifier  = lexeme [char_("A-Za-z_") >> *char_("A-Za-z_0-9")];
    auto enum_finder = -("enum" >> identifier) % any;
}

#include <iostream>
#include <string>
#include <vector>
int main() {

    for (std::string input : {
            "",
            "  ",
            "bogus",
            "enum one",
            "enum one enum two",
            "enum one bogus bogus more bogus enum two !@#!@#Yay",
        })
    {
        auto f = input.begin(), l = input.end();
        std::cout << "------------ parsing '" << input << "'\n";

        std::vector<std::string> data;
        if (phrase_parse(f, l, parser::enum_finder, x3::space, data))
        {
            std::cout << "parsed " << data.size() << " elements:\n";
            for (auto& el : data)
                std::cout << "\t" << el << "\n";
        } else {
            std::cout << "Parse failure\n";
        }

        if (f!=l)
            std::cout << "Remaining unparsed: '" << std::string(f,l) << "'\n";
    }

}

Prints:

------------ parsing ''
parsed 0 elements:
------------ parsing '  '
parsed 0 elements:
------------ parsing 'bogus'
parsed 0 elements:
------------ parsing 'enum one'
parsed 1 elements:
    one
------------ parsing 'enum one enum two'
parsed 1 elements:
    one
------------ parsing 'enum one bogus bogus more bogus enum two !@#!@#Yay'
parsed 2 elements:
    one
    two

But in the example "enum one enum two", why does it find only one element? Wouldn't it be better if it parsed the other one as well? — Exagon, Jun 11 '16 at 18:45
@Exagon I think using `auto any = lexeme[*(~space - "enum")];` should work, but I can't test it. — llonesmiz, Jun 11 '16 at 19:37
Yes that should work (not tested). I don't think it expresses intent very nicely though — sehe, Jun 11 '16 at 19:39
Then you didn't try it correctly, I guess. Did you use `*` instead of `+` somewhere? I gave you the SSCCE, please edit it with the failing input... — sehe, Jun 11 '16 at 20:51
Oh blimey. You tried _that_. Yeah, obviously needs to be at least 1 character (`+` instead of `*`) — sehe, Jun 11 '16 at 20:52
If you use `+` then it has the same original problem, it does not accept two `enum whatever` one after the other (which is a lesser problem compared with infinite recursion...). — llonesmiz, Jun 11 '16 at 20:54
