4

I'm still pretty new to Boost and C++ (only a week in learning this). With the great help from Sehe from my other post Boost Spirit question here I'm trying to extend the Simple class to parse and accept more syntax but what I have doesn't seem to work right.

Currently, Simple class can handle this input:

Class Simple caption;
Class Simple columns "Column Name";

This is the original Simple class from the above link

using Id = std::string;
using Literal = std::string;
using Enum = std::vector<Id>;

struct Base {
    Id      id;
    Literal literal;
};

struct Simple : Base {
    Enum enumeration;
};
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration);

And the grammar rule

id_ = raw[alpha >> *('_' | alnum)];
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" ")));

simple_ = lit("Simple") >> id_ >> optlit >> enum_;
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');

And the expected outcome is:

<class>
    <simple>
      <identifier>caption:caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns:columns</identifier>
      <literal>"Column Name"</literal>
    </simple>
  </class>

I'm trying to extend this rule so the Simple class can accept the following:

Class Simple caption;
Class Simple columns "Column Name";
Class Simple some_value datatype restriction;
Class Simple other_value Default 0;

And the new expected outcome should be:

<class>
    <simple>
      <identifier>caption:caption</identifier>
      <literal>" "</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns:columns</identifier>
      <literal>"Column Name"</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>some_value:some_value</identifier>
      <literal>" "</literal>
      <word>restriction</word>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>other_value:other_value</identifier>
      <literal>" "</literal>
      <default>
            <num>0</num>
      </default>
    </simple>
  </class>

This is what I did but Simple class did not correctly parse the above.

using Id = std::string;
using Literal = std::string;
using Datatype = std::string;
using Default = std::string;
using Enum = std::vector<Id>;

struct Base {
    Id      id;
    Literal literal;
};

struct Simple : Base {
    Enum enumeration;
    Datatype datatype;
        Default def;
};
BOOST_FUSION_ADAPT_STRUCT(Ast::Simple, id, literal, enumeration, datatype, def);

And for the grammar rule, I did this:

id_ = raw[alpha >> *('_' | alnum)];
id_ = raw[alpha >> *('_' | alnum)];
literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';
auto optlit = copy(literal_ | attr(std::string(" ")));

simple_ = lit("Simple") >> id_ >> optlit >> (enum_ | datatype_ | default_);
enum_ = -(lit("enumeration") >> '(' >> (id_ % ',') > ')');
datatype_ = -(lit("datatype") >> id_);
default_ = -(lit("Default") >> alnum)

qi::rule<It, Ast::Simple(), Skipper>    simple_;
qi::rule<It, Ast::Enum(), Skipper>      enum_;
qi::rule<It, Ast::Datatype(), Skipper>  datatype_;
qi::rule<It, Ast::Default(), Skipper>  default_;

I keep getting parse error

 -> EXPECTED ";" in line:3
       Class Simple my_value datatype restriction;
                             ^--- here

It doesn't seem to accept and parse the datatype. What did I miss or do wrong here?
Thanks for your help

Full Demo here: Live on Compiler Explorer

Dylan
  • 121
  • 7
  • Could be related to: https://stackoverflow.com/q/73322232/4641116 . Sehe is awesome. – Eljay Jun 06 '23 at 16:49
  • From the `BOOST_SPIRIT_DEBUG` output, I see that the `enum_` matched, because it is optional, so the `(enum_ | datatype_ | default_)` is done, but then there is still "datatype restriction;" to parse, and the `> ';'` fails because there's nothing to parse the "datatype restriction" before the ";". – Eljay Jun 06 '23 at 17:17
  • @Eljay. Thanks for looking into this. ```datatype_ and default_``` are also optional. So Simple class can have additional enum or datatype or default. That's what I'm trying to extend. Currently only enum_ work but datatype_ and default_ don't work. – Dylan Jun 06 '23 at 17:27
  • 1
    No time to look, but from the comments it seems like you want `(-a >> -b >> -c)` instead of `-(a|b|c)` – sehe Jun 06 '23 at 17:42
  • @Sehe. Echoing what Eljay said: "Sehe you are awesome". I was under impression that my individual rule already said parse zero or one so I'm using the | rule. Thanks again both of you Eljay and Sehe. – Dylan Jun 06 '23 at 17:48
  • 1
    So I'm back :) Just went through the thing and reviewed. Long story short: don't settle for less. Zooming out, I think, instead of just diving in, you should spend the time to get to know the AST/Domain and describe it fully. This prevents you from redoing a lot of work multiple times and/or running into impossibilities. (Cheers @Eljay thanks for the kind words :)) – sehe Jun 06 '23 at 21:45

1 Answers1

1

One problem I already addressed in the comment:

it seems like you want (-a >> -b >> -c) instead of -(a|b|c)

Indeed, running your program prints

 -> EXPECTED ";" in line:3
       Class Simple my_value datatype restriction;
                             ^--- here

On the one hand, I'm thrilled that now my VeryPretty™ error reporting got a chance to shine, but clearly you wanted to change

simple_    = lit("Simple") >> id_ >> optlit >> (enum_ | datatype_ | default_);

To

simple_    = lit("Simple") >> id_ >> optlit >> enum_ >> datatype_ >> default_;

Indeed, changing that around prints:

<task>
  <class>
    <simple>
      <identifier>caption</identifier>
      <literal> </literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>columns</identifier>
      <literal>Column Name</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>my_value</identifier>
      <literal> </literal>
      <word>restriction</word>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>def_value</identifier>
      <literal> </literal>
      <default>
        <num>0</num>
      </default>
    </simple>
  </class>
</task>

However several things seem out of order. For one the appearance of "word" in the XML seems odd, since Simple does not contain such a member. It does seem like the implementation details are leaking out in the XML.

I can guess that this is caused by the lack of control that is, itself, caused by ambiguous types:

using Id       = std::string;
using Literal  = std::string;
using Datatype = std::string;
using Default  = std::string;

Choosing identical types for domain datatypes is fine, but it breaks down when you try to use the C++ type system to enforce domain logic. Specifically, we use overload resolution to pick what XML to generate for an AST node.

You may note that I had already realized this conundrum in my original sketch, which is why you see the overload generating for std::string take a disambiguator kind:

void apply(Node parent, std::string const& s, char const* kind) const {

And there it is that we run into a different code smell:

if (s != "") {
    named_child(parent, kind).text().set(s.c_str());
}

I know I didn't have the conditional there. And the conditional suffers the same problem as I just described: it is blind to the domain semantics of the "thing" being serialized. Some strings might be "present" and need to be serialized even if empty (e.g. consider when a default value is the empty string, dropping the node entirely may change the meaning).

Instead, you want to express intent/semantics in the AST itself.

Expressive AST

Let's create distinct string types.

I showed one approach in the first Live Demo before. I "cleverly" omitted that hack from the answer because it wasn't essential to the final solution, and also because that solution using inheritance violates Liskov Substitution Principle as well as risks surprises (e.g.).

So let's do another approach. Still using a hack to avoid writing lots of code, but without the issues described:

template <typename> struct flavour_hack : std::char_traits<char> {};
template <typename Tag>
    using String = std::basic_string<char, flavour_hack<Tag> >;

using Id       = String<struct TagId>;
using Literal  = String<struct TagLiteral>;
using Datatype = String<struct TagDatatype>;
using Default  = String<struct TagDefault>;

Yes these are fully standard C++ strings, but distinct from eachother. Nice. Now the above overload can be split into different overloads, dropping the kind parameter since that information is now part of the AST type:

void apply(Node parent, Ast::Id const& id) const {
    named_child(parent, "identifier").text().set(id.c_str());
}

void apply(Node parent, Ast::Literal const& literal) const {
    named_child(parent, "literal").text().set(literal.c_str());
}

void apply(Node parent, Ast::Datatype const& datatype) const {
    named_child(parent, "datatype").text().set(datatype.c_str());
}

void apply(Node parent, Ast::Default const& default_) const {
    named_child(parent, "default").text().set(default_.c_str());
}

What about the optional strings? Well I'm glad you asked. And I'm glad hypothetical you phrased the question accurately: optional strings are just strings, that are optional. It's easy to express the semantics in C++:

struct Simple : Base {
    Enum                    enumeration;
    std::optional<Datatype> datatype;
    std::optional<Default>  def;
};

What's more, now I can probably stop being lazy about the enumeration as well. Before, I just decided/assumed on your behalf that empty enumerations couldn't occur. Let's not, just to drive home the point about domain semantics:

struct Simple : Base {
    std::optional<Enum>     enumeration;
    std::optional<Datatype> datatype;
    std::optional<Default>  def;
};

And fixing the rule for enumeration:

enum_ = lit("enumeration") >> '(' >> -(id_ % ',') > ')';

Moving the optionality to the parent rule smooths out the mapping of rules to their bound attributes:

simple_    = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
enum_      = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
datatype_  = lit("datatype") >> id_;
default_   = lit("Default") >> alnum;

With these tweaks, we get Live Demo for

Class Simple A;
Class Simple B "Column Name";
Class Simple C datatype restriction;
Class Simple D Default 0;
Class Simple E enumeration() Default 0;
Class Simple F enumeration() Default "not_a_number";

The output:

<task>
  <class>
    <simple>
      <identifier>A</identifier>
      <literal> </literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>B</identifier>
      <literal>Column Name</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>C</identifier>
      <literal> </literal>
      <datatype>restriction</datatype>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>D</identifier>
      <literal> </literal>
      <default>0</default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>E</identifier>
      <literal> </literal>
      <default>0</default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>F</identifier>
      <literal> </literal>
      <default>"not_a_number"</default>
    </simple>
  </class>
</task>

Remaining Loose Ends

I've ignored this bit:

// TODO FIXME
//void apply(Node parent, Default const& d) const {
    //if (d != "") {
        //auto def_ = named_child(parent, "default");
        //named_child(def_, "num").text().set(d.c_str());
    //}
//}

I did so, because it opens an entirely new can of worms. The "arbitrary" appearance of a num XML element tells me that your language has a type system with ditto literals. And that the parser expression

default_ = -(lit("Default") >> alnum)

was likely woefully underspecified. Do yourself a favor and don't settle for sloppy a solution. Instead, just model what you know to be true and enjoy a life without unnecessary complexity and hard-to-maintain code. E.g.:

using Number = double;
using Value  = boost::variant<Literal, Number, Id>;

struct Simple : Base {
    std::optional<Enum>     enumeration;
    std::optional<Datatype> datatype;
    std::optional<Value>    def;
};

With ditto

value_     = literal_ | number_ | id_;
number_    = double_;
default_   = lit("Default") >> value_;

And some tweaks to XML generator:

Live On Compiler Explorer

// #define BOOST_SPIRIT_DEBUG 1
#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iomanip>
#include <optional>
namespace qi = boost::spirit::qi;

namespace Ast {
    using boost::recursive_wrapper;

    template <typename> struct flavour_hack : std::char_traits<char> {};
    template <typename Tag>
        using String = std::basic_string<char, flavour_hack<Tag> >;

    using Id       = String<struct TagId>;
    using Literal  = String<struct TagLiteral>;
    using Datatype = String<struct TagDatatype>;
    struct Base {
        Id      id;
        Literal literal;
    };

    using Enum = std::vector<Id>;

    using Number = double;
    using Value  = boost::variant<Literal, Number, Id>;

    struct Simple : Base {
        std::optional<Enum>     enumeration;
        std::optional<Datatype> datatype;
        std::optional<Value>    default_;
    };

    struct Complex;
    struct Container;

    using Class = boost::variant<   
        Simple,                     
        recursive_wrapper<Complex>, 
        recursive_wrapper<Container>
    >;

    using Classes = std::vector<Class>;
    struct Container : Base { Class   element; };
    struct Complex   : Base { Classes members; };

    using Task = std::vector<Class>;
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::Simple,    id, literal, enumeration, datatype, default_);
BOOST_FUSION_ADAPT_STRUCT(Ast::Complex,   id, literal, members)
BOOST_FUSION_ADAPT_STRUCT(Ast::Container, id, literal, element)

namespace Parser {
    template <typename It> struct Task : qi::grammar<It, Ast::Task()> {
        Task() : Task::base_type(start) {
            using namespace qi;

            start = skip(space)[task_];

            // lexemes:
            id_      = raw[alpha >> *('_' | alnum)];
            literal_ = '"' > *('\\' >> char_ | ~char_('"')) > '"';

            auto optlit = copy(literal_ | attr(std::string(" "))); // weird, but okay

            task_      = *class_ > eoi;
            type_      = simple_ | complex_ | container_;
            class_     = lit("Class") > type_ > ';';
            simple_    = lit("Simple") >> id_ >> optlit >> -enum_ >> -datatype_ >> -default_;
            complex_   = lit("Complex") >> id_ >> optlit >> '(' >> *type_ >> ')';
            container_ = lit("Container") >> id_ >> optlit >> '(' >> type_ > ')';
            enum_      = lit("enumeration") >> '(' >> -(id_ % ',') > ')';
            datatype_  = lit("datatype") >> id_;
            value_     = literal_ | number_ | id_;
            number_    = double_;
            default_   = lit("Default") >> value_;

            BOOST_SPIRIT_DEBUG_NODES(
                (task_)(class_)(type_)(simple_)(complex_)(container_)(enum_)(datatype_)
                (default_)(id_)(literal_)(value_)(number_)
            )
        }

    private:
        qi::rule<It, Ast::Task()> start;

        using Skipper = qi::space_type;
        qi::rule<It, Ast::Task(), Skipper>      task_;
        qi::rule<It, Ast::Class(), Skipper>     class_, type_;
        qi::rule<It, Ast::Simple(), Skipper>    simple_;
        qi::rule<It, Ast::Complex(), Skipper>   complex_;
        qi::rule<It, Ast::Container(), Skipper> container_;
        qi::rule<It, Ast::Enum(), Skipper>      enum_;
        qi::rule<It, Ast::Datatype(), Skipper>  datatype_;
        qi::rule<It, Ast::Value(), Skipper>     default_;

        // lexemes:
        qi::rule<It, Ast::Id()>      id_;
        qi::rule<It, Ast::Literal()> literal_;
        qi::rule<It, Ast::Value()>   value_;
        qi::rule<It, Ast::Number()>  number_;
    };
}

#include <pugixml.hpp>
namespace Generate {
    using namespace Ast;

    struct XML {
        using Node = pugi::xml_node;

        // callable for variant visiting:
        template <typename T> void operator()(Node parent, T const& node) const { apply(parent, node); }

    private:
        template <typename... Ts>
        void apply(Node parent, boost::variant<Ts...> const& v) const {
            using std::placeholders::_1;
            boost::apply_visitor(std::bind(*this, parent, _1), v);
        }

        void apply(Node parent, Ast::Number const& num) const {
            named_child(parent, "num").text().set(num);
        }

        void apply(Node parent, Ast::Id const& id) const {
            named_child(parent, "identifier").text().set(id.c_str());
        }

        void apply(Node parent, Ast::Literal const& literal) const {
            named_child(parent, "literal").text().set(literal.c_str());
        }

        void apply(Node parent, Ast::Datatype const& datatype) const {
            named_child(parent, "datatype").text().set(datatype.c_str());
        }

        template <typename T> void apply(Node parent, std::optional<T> const& opt) const {
            if (opt)
                apply(parent, *opt);
        }

        void apply(Node parent, Simple const& s) const {
            auto simple = named_child(parent, "simple");
            apply(simple, s.id);
            apply(simple, s.literal);
            apply(simple, s.enumeration);
            apply(simple, s.datatype);
            if (s.default_.has_value()) {
                apply(named_child(simple, "default"), *s.default_);
            }
        }

        void apply(Node parent, Enum const& e) const {
            auto enum_ = named_child(parent, "enumeration");
            for (auto& v : e)
                named_child(enum_, "word").text().set(v.c_str());
        }

        void apply(Node parent, Complex const& c) const {
            auto complex_ = named_child(parent, "complex");
            apply(complex_, c.id);
            apply(complex_, c.literal);
            for (auto& m : c.members)
                apply(complex_, m);
        }

        void apply(Node parent, Container const& c) const {
            auto cont = named_child(parent, "container");
            apply(cont, c.id);
            apply(cont, c.literal);
            apply(cont, c.element);
        }

        void apply(Node parent, Task const& t) const {
            auto task = named_child(parent, "task");
            for (auto& c : t)
                apply(task.append_child("class"), c);
        }

    private:
        Node named_child(Node parent, std::string const& name) const {
            auto child = parent.append_child();
            child.set_name(name.c_str());
            return child;
        }
    };
} // namespace Generate

int main() { 
    using It = std::string_view::const_iterator;
    static const Parser::Task<It> p;
    static const Generate::XML to_xml;

    for (std::string_view input :
    {
        R"(Class Simple A;
    Class Simple B "Column Name";
    Class Simple C datatype restriction;
    Class Simple D enumeration() Default NaN;
    Class Simple E Default -Inf;
    Class Simple F Default 42;
    Class Simple G Default 7.8e-10;
    Class Simple H Default "not a number";
    Class Simple I Default SomeIdentifier_3;
        )"
    }) //
    {
        try {
            Ast::Task t;

            if (qi::parse(begin(input), end(input), p, t)) {
                pugi::xml_document doc;
                to_xml(doc.root(), t);
                doc.print(std::cout, "  ", pugi::format_default);
                std::cout << std::endl;
            } else {
                std::cout << " -> INVALID" << std::endl;
            }
        } catch (qi::expectation_failure<It> const& ef) {
            auto f    = begin(input);
            auto p    = ef.first - input.begin();
            auto bol  = input.find_last_of("\r\n", p) + 1;
            auto line = std::count(f, f + bol, '\n') + 1;
            auto eol  = input.find_first_of("\r\n", p);

            std::cerr << " -> EXPECTED " << ef.what_ << " in line:" << line << "\n"
                << input.substr(bol, eol - bol) << "\n"
                << std::setw(p - bol) << ""
                << "^--- here" << std::endl;
        }
    }
}

Printing

<task>
  <class>
    <simple>
      <identifier>A</identifier>
      <literal> </literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>B</identifier>
      <literal>Column Name</literal>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>C</identifier>
      <literal> </literal>
      <datatype>restriction</datatype>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>D</identifier>
      <literal> </literal>
      <enumeration />
      <default>
        <num>nan</num>
      </default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>E</identifier>
      <literal> </literal>
      <default>
        <num>-inf</num>
      </default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>F</identifier>
      <literal> </literal>
      <default>
        <num>42</num>
      </default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>G</identifier>
      <literal> </literal>
      <default>
        <num>7.7999999999999999e-10</num>
      </default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>H</identifier>
      <literal> </literal>
      <default>
        <literal>not a number</literal>
      </default>
    </simple>
  </class>
  <class>
    <simple>
      <identifier>I</identifier>
      <literal> </literal>
      <default>
        <identifier>SomeIdentifier_3</identifier>
      </default>
    </simple>
  </class>
</task>
sehe
  • 374,641
  • 47
  • 450
  • 633
  • I'm enjoy reading your explanations. It helps me a lot in exploring and learning C++ and boost. With my limited knowledge on this subject, I can only explore or do a certain task. You seem to know these things inside out so it every easy for you to come up with alternative routes when problem occurs. I do appreciate your willingness of taking time share your wisdom, and teach us newbie in this forum. I'm very sure I'll have more question once I get the full document of the code definition. Hopefully with these lessons, I can move along and getting it done for my team. :) – Dylan Jun 06 '23 at 22:11
  • (I just noticed I forgot to remove the `.empty()` check on generating XML for enumerations) – sehe Jun 06 '23 at 22:13
  • 1
    _"You seem to know these things inside out so it every easy for you to come up with alternative routes when problem occurs"_ - The very definition of experience. It's why I feel I can permit myself to put guidance in somewhat opinionated terms ("do yourself a favour"). I know because I've learnt that lesson the hard way, many times over. I try to back it up with reasoning and links to the principles involved, so the seed can fall on fertile grounds. That's how we learn. Cheers! – sehe Jun 06 '23 at 22:16
  • What if `enum_ >> datatype_ >> default_` are three mutually exclusive arms (and all are optional)? What if each can appear zero-or-once, and they can appear in any order? – Eljay Jun 07 '23 at 13:09
  • 1
    @Eljay it's all [documented](https://www.boost.org/doc/libs/1_82_0/libs/spirit/doc/html/spirit/qi/reference/operator.html) :) So to get zero-once in any order you can say `-(a ^ b ^ c)`. Three mutually exclusives is almost what OP accidentally wrote (`-(a|b|c)` instead of `(-a|-b|-c)`) – sehe Jun 07 '23 at 13:12
  • @sehe. Since I'm using the latest boost version (1.82,0), do you recommend using QI or X3 for this project? They both new for me to learn anyway. I just don't want to spend time upgrading this to X3 in the future. – Dylan Jun 08 '23 at 15:18
  • I think Qi is more mature and has fewer surprises (especially surrounding automatic propagation). However, X3 is a lot quicker to compile. If you are prepared to do a lot manual massaging with semantic actions anyways, X3 might be easier, since they're bit more natural now. – sehe Jun 08 '23 at 16:22