3

Update

This question touches on two issues (as shown by the accepted answer), both of which are present in the version of Boost Spirit X3 that ships with Boost 1.64, but both of which are now fixed (or at least detected at compile time) in develop branch at the time of writing (2017-04-30).

I have updated the mcve project to reflect the changes that I made to use the develop branch instead of the latest boost release, in the hopes that it might help out others who face a similar issues.


The original question

I am trying to learn how to break up Spirit X3 parsers into separate reusable grammars, as encouraged by the example code (rexpr_full and calc in particular) and the presentations from CppCon 2015 and BoostCon.

I have a symbol table (essentially mapping different types to a enum class of the types I am supporting), which I would like to reuse in several parsers. The only example of symbol tables I could find is the roman numerals example, which is in a single source file.

When I try to move the symbol table into its own cpp/h file in the style of the more structured examples my parser will segfault if I try to parse any string which is not in the symbol table. If the symbol table is defined in the same compilation unit as the parsers that use it throws an expectation exception instead (which is what I would expect it to do).

With BOOST_SPIRIT_X3_DEBUG defined I get the following output:

<FruitType>
  <try>GrannySmith: Mammals</try>
  <Identifier>
    <try>GrannySmith: Mammals</try>
    <success>: Mammals</success>
    <attributes>[[
Process finished with exit code 11

I have made a small project which shows what I am trying to achieve and is available here: https://github.com/sigbjornlo/spirit_fruit_mcve

My questions:

  • Why does moving the symbol parser to a separate compilation unit cause a segmentation fault in this case?
  • What is the recommended way of making a symbol table reusable in multiple parsers? (In the MCVE I obviously only use the fruit parser in one other parser, but in my full project I want to use it in several other parsers.)

Below is the code for the MCVE project:

main.cpp

#include <iostream>

#define BOOST_SPIRIT_X3_DEBUG
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include "common.h"
#include "fruit.h"

namespace ast {
    struct FruitType {
        std::string identifier;
        FRUIT fruit;
    };
}

BOOST_FUSION_ADAPT_STRUCT(ast::FruitType, identifier, fruit);

namespace parser {
    // Identifier
    using identifier_type = x3::rule<class identifier, std::string>;
    const auto identifier = identifier_type {"Identifier"};
    const auto identifier_def = x3::raw[x3::lexeme[(x3::alpha | '_') >> *(x3::alnum | '_')]];
    BOOST_SPIRIT_DEFINE(identifier);

    // FruitType
    struct fruit_type_class;
    const auto fruit_type = x3::rule<fruit_type_class, ast::FruitType> {"FruitType"};

    // Using the sequence operator creates a grammar which fails gracefully given invalid input.
    // const auto fruit_type_def = identifier >> ':' >> make_fruit_grammar();

    // Using the expectation operator causes EXC_BAD_ACCESS exception with invalid input.
    // Instead, I would have expected an expectation failure exception.
    // Indeed, an expectation failure exception is thrown when the fruit grammar is defined here in this compilation unit instead of in fruit.cpp.
    const auto fruit_type_def = identifier > ':' > make_fruit_grammar();

    BOOST_SPIRIT_DEFINE(fruit_type);
}

int main() {
    std::string input = "GrannySmith: Mammals";
    parser::iterator_type it = input.begin(), end = input.end();

    const auto& grammar = parser::fruit_type;
    auto result = ast::FruitType {};

    bool successful_parse = boost::spirit::x3::phrase_parse(it, end, grammar, boost::spirit::x3::ascii::space, result);
    if (successful_parse && it == end) {
        std::cout << "Parsing succeeded!\n";
        std::cout << result.identifier << " is a kind of " << to_string(result.fruit) << "!\n";
    } else {
        std::cout << "Parsing failed!\n";
    }

    return 0;
}

std::string to_string(FRUIT fruit) {
    switch (fruit) {
        case FRUIT::APPLES:
            return "apple";
        case FRUIT::ORANGES:
            return "orange";
    }
}

common.h

#ifndef SPIRIT_FRUIT_COMMON_H
#define SPIRIT_FRUIT_COMMON_H

namespace x3 = boost::spirit::x3;

enum class FRUIT {
    APPLES,
    ORANGES
};

std::string to_string(FRUIT fruit);

namespace parser {
    using iterator_type = std::string::const_iterator;
    using context_type = x3::phrase_parse_context<x3::ascii::space_type>::type;
}

#endif //SPIRIT_FRUIT_COMMON_H

fruit.h

#ifndef SPIRIT_FRUIT_FRUIT_H
#define SPIRIT_FRUIT_FRUIT_H

#include <boost/spirit/home/x3.hpp>

#include "common.h"

namespace parser {
    struct fruit_class;
    using fruit_grammar = x3::rule<fruit_class, FRUIT>;

    BOOST_SPIRIT_DECLARE(fruit_grammar)

    fruit_grammar make_fruit_grammar();
}


#endif //SPIRIT_FRUIT_FRUIT_H

fruit.cpp

#include "fruit.h"

namespace parser {
    struct fruit_symbol_table : x3::symbols<FRUIT> {
        fruit_symbol_table() {
            add
                    ("Apples", FRUIT::APPLES)
                    ("Oranges", FRUIT::ORANGES);
        }
    };

    struct fruit_class;
    const auto fruit = x3::rule<fruit_class, FRUIT> {"Fruit"};
    const auto fruit_def = fruit_symbol_table {};
    BOOST_SPIRIT_DEFINE(fruit);

    BOOST_SPIRIT_INSTANTIATE(fruit_grammar, iterator_type, context_type);

    fruit_grammar make_fruit_grammar() {
        return fruit;
    }
}
Community
  • 1
  • 1
sigbjornlo
  • 1,033
  • 8
  • 20
  • 1
    I'd also suggest moving `-DBOOST_SPIRIT_X3_DEBUG=1 into the makefile, because otherwise you risk ODR violations across translation units – sehe Apr 29 '17 at 13:18
  • I rolled back the title to what it originally was, as the segfault caused by turning on debug output is not what I am asking about (though it definetly was a good catch!), it is the segfault caused by moving the symbol table parser into a separate translation unit. – sigbjornlo Apr 29 '17 at 16:36
  • Oh well, you know I think it's better to have both fixes searchable for the internet. The old question [was indeed answered before](http://stackoverflow.com/questions/41770375/strange-semantic-behaviour-of-boost-spirit-x3-after-splitting/41785268#41785268) – sehe Apr 29 '17 at 18:52
  • 1
    @sehe Yes, I agree, but I wanted to verify that on my end before revamping the question for posterity. Will add a postmortem section to the question. One thing I have started wondering about is whether it is good practice to have rules are namespace scoped variables at all. Is there any reason not to limit the lifetime of these objects as one would with most other object types? – sigbjornlo Apr 30 '17 at 10:05

1 Answers1

4

Very good work on the reproducer. This reminded me of my PR https://github.com/boostorg/spirit/pull/229 (see analysis here Strange semantic behaviour of boost spirit x3 after splitting).

The problem would be Static Initialization Order Fiasco taking copies of the debug-names of rules before they became initialized.

In fact, indeed disabling the debug information does remove the crash, and correctly throws the expectation failure.

The same happens with the develop branch¹, so either there is another similar thing, or I missed a spot. For now, know you can disable the debug output. I'll post an update if I find the spot.

UPDATE:

I didn't miss a spot. There's a separate issue in call_rule_definition where it parameterizes the context_debug<> helper class with the actual attribute type instead of the transformed one:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                typedef typename make_attribute::type dbg_attribute_type;
                context_debug<Iterator, dbg_attribute_type>
                dbg(rule_name, first, last, dbg_attribute_type(attr_), ok_parse);
#endif

The comment seems to suggest that this behaviour is as desired: it tries to print the attribute before transformation. However, it totally doesn't work unless the synthesized attribute type matches the actual attribute type. In this case, it makes context_debug take a dangling reference to a temporary converted attribute, leading to Undefined Behaviour.

It's in fact also undefined behaviour in the working cases. I can only assume things happen to pan out nicely in the inline-definition case, making it seem like things work as intended.

To the best of my knowledge this would be a clean fix, preventing any unwarranted conversions and the temporaries that come with them:

#if defined(BOOST_SPIRIT_X3_DEBUG)
                context_debug<Iterator, transform_attr>
                dbg(rule_name, first, last, attr_, ok_parse);
#endif

I've created pull request for this: https://github.com/boostorg/spirit/pull/232


¹ develop branch doesn't seem merged into the 1.64 release

Community
  • 1
  • 1
sehe
  • 374,641
  • 47
  • 450
  • 633
  • Found the culprit. Created a [new PR](https://github.com/boostorg/spirit/pull/232) with a proposed fix. The maintainers will have to review whether my fix is sound. – sehe Apr 28 '17 at 23:12
  • Hey, and thank you for the awesome response. I'd mark it as accepted, but I suspect it actually solves a different bug than the one that caused me to ask the question. I only turned on debug output (and showed it in the post) in a later edit because I thought it might help someone knowledgable see what I (or Spirit) was doing wrong, and this may have triggered an additional bug. – sigbjornlo Apr 29 '17 at 16:26
  • I've just tried building and running my MCVE varying 3 different factors (Debug or Release build & sources listed as "fruit.cpp main.cpp" or "main.cpp fruit.cpp" & BOOST_SPIRIT_X3_DEBUG defined or not). Debug/Release config has no effect. Listing the sources in a different order will either cause my code to throw an expectation error or segfault, regardless of whether BOOST_SPIRIT_X3_DEBUG is defined or not. It does seem to be a Static Initialization Fiasco, as the order of source files matters, but I do suspect it is a separate issue from the debug output segfault. – sigbjornlo Apr 29 '17 at 16:33
  • Btw, I will be trying to build against the develop branch next, as you mention that when removing the debug output it doesn't crash. Which might be that the issue is already fixed or that our compilers are just handling the static initialization order differently. – sigbjornlo Apr 29 '17 at 17:16
  • That's the issue that the older PR solved, indeed you should use the develop branch to fix that. (IIRC it's possible you get an assertion indicating that the initialization order is not ok, but at least the debug name should not be a stopping problem there.) – sehe Apr 29 '17 at 18:49
  • Oh, I just remember I also made `fruit_grammar const& make_fruit_grammar();` return a reference so initialization order isn't as deadly. – sehe Apr 29 '17 at 18:51