
I have defined a rule for an identifier: start with an alpha character, followed by any number of alpha-numeric characters. I have differing results when I parse directly into a std::string versus an adapted struct containing a single std::string.

If the attribute for my grammar is std::string, Qi will correctly adapt the sequence of characters into it. But with the struct, only the first character is stored. I'm not quite sure why this is. (Note that it makes no difference if the struct is "truly" adapted, or if it was defined by Fusion inline.)

Here's an SSCCE, with options to enable debugging:

// Options:
//#define DEFINE_STRUCT_INLINE
//#define DEBUG_RULE

#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/phoenix.hpp>

#include <boost/fusion/adapted/struct/define_struct_inline.hpp>
#include <boost/fusion/include/define_struct_inline.hpp>

#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include <iostream>
#include <string>

namespace qi = boost::spirit::qi;

#ifdef DEFINE_STRUCT_INLINE
    namespace example
    {
        BOOST_FUSION_DEFINE_STRUCT_INLINE(
            identifier_result,
            (std::string, name)
            )
    }
#else
    namespace example
    {
        struct identifier_result
        {
            std::string name;
        };
    }

    BOOST_FUSION_ADAPT_STRUCT(
        example::identifier_result,
        (std::string, name)
        )
#endif

namespace example
{
    typedef std::string identifier_result_str;

    template <typename Iterator, typename Result>
    struct identifier_parser : qi::grammar<Iterator, Result()>
    {
        identifier_parser() :
        identifier_parser::base_type(identifier, "identifier_parser")
        {
            identifier %=
                qi::alpha >>
                *qi::alnum
                ;

            identifier.name("identifier");

            #ifdef DEBUG_RULE
                debug(identifier);
            #endif
        }

        qi::rule<Iterator, Result()> identifier;
    };
}

std::string strip(example::identifier_result identifier)
{
    return identifier.name;
}

std::string strip(std::string str)
{
    return str;
}

template <typename Result>
void test_parse(const std::string& input)
{
    using namespace example;

    auto&& first = input.cbegin();
    auto&& last = input.cend();

    auto&& parser = identifier_parser<std::string::const_iterator, Result>();
    auto&& skipper = qi::space;

    Result result;
    qi::phrase_parse(first, last, parser, skipper, result);

    std::cout << "Result of the parse is: \'"
              << strip(result) << "\'" << std::endl;
}

int main()
{
    using namespace example;

    test_parse<identifier_result>(" validId1 ");
    test_parse<identifier_result>(" %error1% ");

    test_parse<identifier_result_str>(" validId2 ");
    test_parse<identifier_result_str>(" %error2% ");
}

The output is:

Result of the parse is: 'v'
Result of the parse is: ''
Result of the parse is: 'validId2'
Result of the parse is: ''

As expected, both error cases don't match. But in the first case, my struct only captures the first character. I'd like to keep the struct for organization purposes.

If I debug the node, I get this output:

<identifier>
  <try>validId1 </try>
  <success> </success>
  <attributes>[[[v]]]</attributes>
</identifier>

[ ... ]

<identifier>
  <try>validId2 </try>
  <success> </success>
  <attributes>[[v, a, l, i, d, I, d, 2]]</attributes>
</identifier>

So I can see the rule is consuming the entire identifier, it just isn't storing it correctly. The only "hint" I have at the difference is that the v in the first case is nested within [[[.]]], while the correct case is only [[.]]. But I don't know what to do with it. :)

Why does this behavior occur?

GManNickG

1 Answer


Just to get you going, you have to wrap your string in an extra rule.

I don't know the exact explanation, but what you are doing is parsing a string with a sequence of char parsers. When the attribute type is std::string, Qi can treat the attribute as a container and store each char in it; with a struct, it doesn't know how to do this. It might help to give the struct container properties, but I have no experience with that, and for just parsing a string it might be overkill.

Just altering your parser helps here:

namespace example
{
    typedef std::string identifier_result_str;

    template <typename Iterator, typename Result>
    struct identifier_parser : qi::grammar<Iterator, Result()>
    {
        identifier_parser() :
        identifier_parser::base_type(identifier, "identifier_parser")
        {
            string %=
                qi::alpha >>
                *qi::alnum
                ;

            identifier = string;
            identifier.name("identifier");

            #ifdef DEBUG_RULE
                debug(identifier);
            #endif
        }

        qi::rule<Iterator, Result()> identifier;
        qi::rule<Iterator, std::string()> string;
    };
}
Mike M
  • Yeah, I came up with this work-around as well. Good to have listed here, but I am still curious why this indirection is needed. I don't see a way the struct value can be filled without going through `std::string` first. I'll give this a +1 once a full answer appears, but for now it's only supplementary information. – GManNickG Aug 10 '13 at 23:44
  • As said, you are parsing a sequence of chars and not a string in the spirit sense, and so your attribute has to have container properties. – Mike M Aug 10 '13 at 23:52
  • Sorry, I don't buy that. The attribute should be `tuple<char, vector<char>>`, which is convertible to `vector<char>`, which is convertible to `string`. How is Qi getting from `tuple<char, vector<char>>` to my struct without going through `string`? It won't (or shouldn't) just discard the second half of the tuple. Not that you're necessarily wrong (maybe I'm just dense), but I'm looking for formal reasons, not heuristics. – GManNickG Aug 11 '13 at 00:07
  • @GManNickG This comment is not based on knowledge of the spirit code base but on experience and experimentation, so I can't be 100% sure it is right. `tuple<char, vector<char>>` is convertible to `vector<char>` when the attribute you pass is a "container". The attribute you pass here is basically `tuple<std::string>`. Another of the problems with spirit is that it has no problem assigning a tuple to a shorter one, discarding all the elements past the length of the latter. You can see an example of that [here](http://stackoverflow.com/a/16001652/2417774). – llonesmiz Aug 11 '13 at 05:08
  • @GManNickG I think the workaround in this answer is fine as it is, but you could also use `identifier=qi::as_string[qi::alpha >> *qi::alnum];`. – llonesmiz Aug 11 '13 at 05:18
  • @GManNickG You can see an example of how to "give the struct container properties" [here](http://www.boost.org/libs/spirit/example/qi/custom_string.cpp). [A simplified version of your code running on coliru.](http://coliru.stacked-crooked.com/view?id=36688088ca4f8fad9d2b99bdff9de097-6b9d769ec29bb06adbb586cdcfd7611b) – llonesmiz Aug 11 '13 at 10:43
  • @cv_and_he: Thanks for the backing support to the answer. I'm just really surprised Qi allows (to me) data loss, but if it does it does. And since it does, the premise that it has to get to string before struct is broken. Thanks all! – GManNickG Aug 11 '13 at 14:49
  • @GManNickG It's not by design. I remember that there was a thread in the mailing list some time ago that explained the problem, but I haven't been able to find it. – llonesmiz Aug 11 '13 at 14:54
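
The truncation described in the comments above can be illustrated with a toy model in plain C++. This is only a sketch under stated assumptions: `std::tuple` stands in for Fusion sequences, and `store`, `truncating_assign`, `collect_as_container`, `sequence_attribute`, and `identifier_result_model` are hypothetical names that do not exist in Spirit. It does not reproduce Qi's actual attribute-handling code; it only shows how an element-wise assignment into a shorter sequence could end up keeping just the first character.

```cpp
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Toy model only: std::tuple stands in for Fusion sequences, and every name
// below is hypothetical. This is not how Spirit is implemented internally.

// Conceptual attribute of `qi::alpha >> *qi::alnum`.
using sequence_attribute = std::tuple<char, std::vector<char>>;

// Stand-in for the adapted struct: a sequence with one std::string slot.
using identifier_result_model = std::tuple<std::string>;

// Store one parsed element into a string slot.
inline void store(std::string& slot, char c) { slot.push_back(c); }
inline void store(std::string& slot, const std::vector<char>& cs)
{
    slot.append(cs.begin(), cs.end());
}

template <typename Dst, typename Src, std::size_t... I>
void truncating_assign_impl(Dst& dst, const Src& src, std::index_sequence<I...>)
{
    // Expand store(get<I>(dst), get<I>(src)) for each index, in order.
    int expand[] = {0, (store(std::get<I>(dst), std::get<I>(src)), 0)...};
    (void)expand;
}

// Element-wise assignment that visits only as many elements as the
// destination has; trailing source elements are silently dropped.
template <typename Dst, typename Src>
void truncating_assign(Dst& dst, const Src& src)
{
    static_assert(std::tuple_size<Dst>::value <= std::tuple_size<Src>::value,
                  "destination must not be longer than source");
    truncating_assign_impl(dst, src,
        std::make_index_sequence<std::tuple_size<Dst>::value>{});
}

// Container-style handling: every element is appended to one string.
inline std::string collect_as_container(const sequence_attribute& src)
{
    std::string out;
    store(out, std::get<0>(src));
    store(out, std::get<1>(src));
    return out;
}
```

Given an attribute holding `'v'` followed by the characters of `alidId1`, `truncating_assign` leaves the one-slot target holding only `v`, while `collect_as_container` yields the whole identifier. That mirrors the two results in the question, and it suggests why routing the sequence through a `std::string` attribute first (the extra rule, or `qi::as_string`) avoids the loss.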