
I don't have a whole lot of code to show for this one because I haven't managed to get anything to work, but the high-level problem is that I am trying to create a series of parsers for a family of related languages. By this I mean that the languages share many of the same constructs, but the overlap is not complete. As a simple example, say I have an AST that is parameterized by some (completely contrived in this example) 'leaf' type:

template <typename t>
struct fooT {
  std::string name;
  t leaf;
};

One language may instantiate t as int and another as double. What I wanted was a templated class (or something similar) that I could instantiate with different t's and their corresponding parser rules, so that I could generate a series of composed parsers.

In my real example, I have a bunch of nested structures that are the same across the languages, with only a couple of small variations at the very edges of the AST. If I cannot compose the parsers in a good way, I will end up duplicating a lot of parse rules, AST nodes, etc. I have actually gotten it to work without a class, by very carefully arranging my header files and includes so that I can have 'dangling' parser rules with special names that can be assembled. A big downside is that I cannot include parsers for several different languages in the same program, precisely because of the resulting name conflicts.

Does anybody have any ideas how I could approach this?


The nice thing about X3 is that you can generate parsers just as easily as you define them in the first place.

E.g.

template <typename T> struct AstNode {
    std::string name;
    T leaf;
};

Now let's define a generic parser maker:

namespace Generic {
    template <typename T> auto leaf = x3::eps(false);

    template <> auto leaf<int>
        = "0x" >> x3::uint_parser<uintmax_t, 16>{};
    template <> auto leaf<std::string>
        = x3::lexeme['"' >> *~x3::char_('"') >> '"'];

    auto no_comment = x3::space;
    auto hash_comments = x3::space |
        x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    auto c_style_comments = x3::space |
        "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
    auto cxx_style_comments = c_style_comments |
        x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);

    auto name = leaf<std::string>;

    template <typename T> auto parseNode(auto heading, auto skipper) {
        return x3::skip(skipper)[
            x3::as_parser(heading) >> name >> ":" >> leaf<T>
        ];
    }
}

This allows us to compose various grammars with various leaf types and skipper styles:

namespace Language1 {
    static auto const grammar =
        Generic::parseNode<int>("value", Generic::no_comment);
}

namespace Language2 {
    static auto const grammar =
        Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}

Let's demo:

Live On Coliru

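// Requires C++20: parseNode() and test() use abbreviated function templates (auto parameters)
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <cstdint>
#include <iomanip>
#include <iostream>
#include <string>
#include <string_view>
namespace x3 = boost::spirit::x3;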

template <typename T> struct AstNode {
    std::string name;
    T leaf;
};

BOOST_FUSION_ADAPT_TPL_STRUCT((T), (AstNode)(T), name, leaf)

namespace Generic {
    // Primary template: eps(false) never matches, so unsupported leaf types fail to parse
    template <typename T> auto leaf = x3::eps(false);

    template <> auto leaf<int>
        = "0x" >> x3::uint_parser<uintmax_t, 16>{};
    template <> auto leaf<std::string>
        = x3::lexeme['"' >> *~x3::char_('"') >> '"'];

    auto no_comment = x3::space;
    auto hash_comments = x3::space |
        x3::lexeme['#' >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);
    auto c_style_comments = x3::space |
        "/*" >> x3::lexeme[*(x3::char_ - "*/")] >> "*/";
    auto cxx_style_comments = c_style_comments |
        x3::lexeme["//" >> *(x3::char_ - x3::eol)] >> (x3::eol | x3::eoi);

    auto name = leaf<std::string>;

    template <typename T> auto parseNode(auto heading, auto skipper) {
        return x3::skip(skipper)[
            x3::as_parser(heading) >> name >> ":" >> leaf<T>
        ];
    }
}

namespace Language1 {
    static auto const grammar =
        Generic::parseNode<int>("value", Generic::no_comment);
}

namespace Language2 {
    static auto const grammar =
        Generic::parseNode<std::string>("line", Generic::cxx_style_comments);
}

void test(auto const& grammar, std::string_view text, auto ast) {
    auto f = text.begin(), l = text.end();
    std::cout << "\nParsing: " << std::quoted(text, '\'') << "\n";
    if (parse(f, l, grammar, ast)) {
        std::cout << " -> {name:" << ast.name << ",value:" << ast.leaf << "}\n";
    } else {
        std::cout << " -- Failed " << std::quoted(text, '\'') << "\n";
    }
}

int main() {
    test(Language1::grammar, R"(value "one": 0x01)", AstNode<int>{});
    test(
        Language2::grammar,
        R"(line "Hamlet": "There is nothing either good or bad, but thinking makes it so.")",
        AstNode<std::string>{});

    test(
        Language2::grammar,
        R"(line // rejected: "Hamlet": "To be ..."
        "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods")",
        AstNode<std::string>{});
}

Prints

Parsing: 'value "one": 0x01'
 -> {name:one,value:1}

Parsing: 'line "Hamlet": "There is nothing either good or bad, but thinking makes it so."'
 -> {name:Hamlet,value:There is nothing either good or bad, but thinking makes it so.}

Parsing: 'line // rejected: "Hamlet": "To be ..."
        "King Lear": /*hopefully less trite:*/"As flies to wanton boys are we to the gods"'
 -> {name:King Lear,value:As flies to wanton boys are we to the gods}

Advanced

For advanced scenarios (where rule declarations and definitions are separated across translation units, and/or you need dynamic switching), you can use the x3::any_rule<> holder.

sehe
  • I do have it split across translation units, so I will look into any_rule. This is exactly what I needed. I was trying to embed in a class and everything was terrible. Thank you! – Timothy Braje May 14 '21 at 18:44
  • thank you again for this answer! I did manage to get my parser refactored to use this style. Unfortunately, compile times have gone through the roof! It now takes about 5 minutes and 10GB of memory to compile the parser. Does anyone here know if there are good ways to fix this? My particular language has a big recursive variant several of whose components use the above templating feature. – Timothy Braje May 20 '21 at 20:05
  • Yes. The usual fix here is to reintroduce BOOST_SPIRIT_DEFINE. It appears that without it, the context type gets encumbered with the entire parser expression, which can lead to an explosion of instantiations for different context types, especially with recursion and/or skipper changes. – sehe May 20 '21 at 20:23
  • If you show the code somehow, maybe I can try it as an ~exorcism~ exercise. I have to learn to thread that needle one day. – sehe May 20 '21 at 20:23
  • I can try and get you code. I am not sure I am technically allowed to post it, and it is also split across a large number of files. I did see some of your comments in other threads and added back the BOOST_SPIRIT_DEFINE calls where I could, but it didn't make much difference. The problem is that with all of the templated parsers, I don't know how to define them with BOOST_SPIRIT_DEFINE. I have to admit I am not a very advanced C++ developer. – Timothy Braje May 20 '21 at 21:21
  • I'm not quite sure myself, which is basically why I'm interested in figuring it out. It's gonna involve that any_rule, I think. (BTW whoever uses X3 is advanced. By any reasonable standard. And I see more than one sign you're not just cargo-culting it, so you're fine :)) – sehe May 20 '21 at 21:24
  • Actually, it may take me a couple of days with my schedule but I bet I can create a minimal example I could post... – Timothy Braje May 20 '21 at 21:25