
What is the algorithm for developing a string parser to create a geometry? The geometry is generated in two steps: first we create primitives, then we combine the primitives into objects.

The syntax is presented in the string below.

string str="[GEOMETRY]    
    PRIMITIVE1=SPHERE(RADIUS=5.5);  
    PRIMITIVE2=BOX(A=-5.2, B=7.3);  
    //...  
    OBJECT1=PRIMITIVE2*(-PRIMITIVE1);  
    //..."

class PRIMITIVE{
    int number;
public:
    Primitive& operator+ (Primitive& primitive) {}; //overloading arithmetic operations
    Primitive& operator* (Primitive& primitive) {};
    Primitive& operator- (Primitive& primitive) {};
    virtual bool check_in_point_inside_primitive = 0;
};

class SPHERE:public PRIMITIVE{
    double m_radius;
public:
    SPHERE(double radius): m_radius(radius) {};  //In which part of the parser to create objects?
    bool check_in_point_inside_sphere(Point& point){};
};

class BOX:public PRIMITIVE{
    double m_A;
    double m_B;
public:
    BOX(double A, double B): m_A(A), m_B(B) {};
    bool check_in_point_inside_box(Point& point){};
};

class OBJECT{
    int number;
    PRIMITIVE& primitive;
public:
    OBJECT(){};
    bool check_in_point_inside_object(Primitive& PRIMITIVE1, Primitive& PRIMITIVE2, Point& point){
        //>How to construct a function from an expression 'PRIMITIVE2*(-PRIMITIVE1)' when parsing?
    }
};
  1. How can I parse the string PRIMITIVE1=SPHERE(RADIUS=5.5) and pass the parameter to the constructor of SPHERE()? How can I associate the resulting object with the name PRIMITIVE1 so that I can refer to it in OBJECT? Is it possible to create a pair<PRIMITIVE1,SPHERE(5.5)> and store all primitives in a map?

  2. How can I parse the OBJECT1 line and construct a function from the expression PRIMITIVE2*(-PRIMITIVE1) inside OBJECT1? This expression will be evaluated many times when determining the position of each point relative to the object.

  3. How can boost::spirit be used for this task? Should I tokenize the string with boost::spirit::lex and then write rules with boost::spirit::qi?

Adrian Mole
  • Your language (grammar) seems to be regular. In that case life would be easy and you would need no parser, just a DFA or `std::regex`. For the late instantiation of objects you should use an abstract factory. So, can you give more examples and explain what your input strings look like? And what is the meaning of "-" for a primitive? – A M Oct 30 '21 at 17:23
  • Do you really need such a complex grammar? Something like a WaveFront OBJ or, failing that, even JSON would make your life much easier. OBJ parsers are trivial, but might not easily capture the primitive types you're looking for, as they focus on vertices & faces rather than CSG. A JSON format, however, would give you all of that, and parsers are not only simple to implement but already freely available. – 3Dave Oct 30 '21 at 17:44
  • We can write a grammar, and imagine a whole world of semantics behind the piece of syntax shown. However, I have problems with the type hierarchy. As given, the classes will not compile and will not work out (the operators are returning references? If they were to return copies, how would it not slice to the abstract base class?). I think you need to think your design through a whole lot better before building the tooling to parse it all as well. – sehe Oct 30 '21 at 20:21

1 Answer


As a finger exercise, and despite the serious problems I see with the chosen virtual type hierarchy, let's try to make a value-oriented container of Primitives that can be indexed by their id (ById):

Live On Coliru

#include <boost/intrusive/set.hpp>
#include <boost/poly_collection/base_collection.hpp>
#include <functional> // std::less
#include <iostream>
#include <typeinfo>   // typeid
namespace bi = boost::intrusive;

struct Point {
};

using IndexHook = bi::set_member_hook<bi::link_mode<bi::auto_unlink>>;

class Primitive {
    int _id;

  public:
    struct ById {
        bool operator()(auto const&... oper) const { return std::less<>{}(access(oper)...); }

      private:
        static int access(int id) { return id; }
        static int access(Primitive const& p) { return p._id; }
    };

    IndexHook _index;

    Primitive(int id) : _id(id) {}
    virtual ~Primitive() = default;
    int id() const { return _id; }

    Primitive& operator+= (Primitive const& primitive) { return *this; } //overloading arithmetic operations
    Primitive& operator*= (Primitive const& primitive) { return *this; }
    Primitive& operator-= (Primitive const& primitive) { return *this; }
    virtual bool check_in_point_inside(Point const&) const = 0;
};

using Index =
    bi::set<Primitive, bi::constant_time_size<false>,
            bi::compare<Primitive::ById>,
            bi::member_hook<Primitive, IndexHook, &Primitive::_index>>;

class Sphere : public Primitive {
    double _radius;

  public:
    Sphere(int id, double radius)
        : Primitive(id)
        , _radius(radius) {} // In which part of the parser to create objects?
    bool check_in_point_inside(Point const& point) const override { return false; }
};

class Box : public Primitive {
    double _A;
    double _B;

  public:
    Box(int id, double A, double B) : Primitive(id), _A(A), _B(B) {}
    bool check_in_point_inside(Point const& point) const override { return false; }
};

class Object{
    int _id;
    Primitive& _primitive;

  public:
    Object(int id, Primitive& p) : _id(id), _primitive(p) {}

    bool check_in_point_inside_object(Primitive const& p1, Primitive const& p2,
                                      Point const& point) const
    {
        //>How to construct a function from an expression
        //'PRIMITIVE2*(-PRIMITIVE1)' when parsing?
        return false;
    }
};

using Primitives = boost::poly_collection::base_collection<Primitive>;

int main() {
    Primitives test;
    test.insert(Sphere{2, 4.0});
    test.insert(Sphere{4, 4.0});
    test.insert(Box{2, 5, 6});
    test.insert(Sphere{1, 4.0});
    test.insert(Box{3, 5, 6});

    Index idx;
    for (auto& p : test)
        if (not idx.insert(p).second)
            std::cout << "Duplicate id " << p.id() << " not indexed\n";

    for (auto& p : idx)
        std::cout << typeid(p).name() << " " << p.id() << "\n";

    std::cout << "---\n";

    for (auto& p : test)
        std::cout << typeid(p).name() << " " << p.id() << "\n";
}

Prints

Duplicate id 2 not indexed
6Sphere 1
3Box 2
3Box 3
6Sphere 4
---
3Box 2
3Box 3
6Sphere 2
6Sphere 4
6Sphere 1

So far so good. This is an important building block to prevent all manner of pain when dealing with virtual types in Spirit grammars¹

PS: I've since dropped the idea of using an intrusive set. It doesn't work because base_collection moves items around on reallocation, and that unlinks the items from their intrusive set.

Instead, see below for an approach that doesn't try to resolve ids during the parse.


Parsing primitives

The id comes from the name itself, e.g. the 1 in PRIMITIVE1. We can stash it before parsing the primitive's definition, and then assign it to the primitive on commit.

Let's start with defining a State object for the parser:

struct State {
    Ast::Id         next_id;
    Ast::Primitives primitives;
    Ast::Objects    objects;

    template <typename... T> void commit(boost::variant<T...>& val) {
        boost::apply_visitor([this](auto& obj) { commit(obj); }, val);
    }

    template <typename T> void commit(T& primitiveOrExpr) {
        auto id = std::exchange(next_id, 0);
        if constexpr (std::is_base_of_v<Ast::Primitive, T>) {
            primitiveOrExpr.id = id;
            primitives.insert(std::move(primitiveOrExpr));
        } else {
            objects.push_back(Ast::Object{id, std::move(primitiveOrExpr)});
        }
    }
};

As you can see, we just have a place to store the primitives and the objects. And then there is the temporary storage for our next_id while we're still parsing the next entity.

The commit function helps sort the products of the parser rules. As it happens, they can be variants, which is why we have the apply_visitor dispatch for commit on a variant.

Again, as the footnote¹ explains, Spirit's natural attribute synthesis favors static polymorphism.
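
The draft/commit idea does not depend on Spirit. As a minimal standard-library sketch, assuming std::variant in place of boost::variant: the names MiniState, Expr and the simplified Primitive/Sphere/Box below are hypothetical stand-ins, not the types from the listing, and treating id 0 as "no draft pending" is an assumption as well.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <type_traits>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the Ast types; only their shape matters here.
struct Primitive { std::uint32_t id = 0; };
struct Sphere : Primitive { double radius = 0; };
struct Box : Primitive { double a = 0, b = 0; };
struct Expr { std::string text; }; // placeholder for an object expression

struct MiniState {
    std::uint32_t next_id = 0; // assumption: real ids start at 1, 0 means "none pending"
    std::vector<std::variant<Sphere, Box>> primitives;
    std::vector<std::pair<std::uint32_t, Expr>> objects;

    // Called once the number in "PRIMITIVE<n>" or "OBJECT<n>" has been read.
    void draft(std::uint32_t id) { next_id = id; }

    // Variant overload: unwrap and forward to the overload for the held type.
    template <typename... T> void commit(std::variant<T...> val) {
        std::visit([this](auto& held) { this->commit(std::move(held)); }, val);
    }

    // Generic overload: primitives receive the drafted id, anything else
    // is stored as an object expression under that id.
    template <typename T> void commit(T val) {
        assert(next_id != 0 && "commit without a preceding draft");
        auto id = std::exchange(next_id, 0);
        if constexpr (std::is_base_of_v<Primitive, T>) {
            val.id = id;
            primitives.emplace_back(std::move(val));
        } else {
            objects.emplace_back(id, std::move(val));
        }
    }
};
```

Calling draft(1) followed by commit on a variant holding a Sphere stores the sphere with id 1 and resets next_id, which is the same hand-off the semantic actions perform.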

The semantic actions we need are now:

static inline auto& state(auto& ctx) { return get<State>(ctx); }
auto draft = [](auto& ctx) { state(ctx).next_id = _attr(ctx); };
auto commit = [](auto& ctx) { state(ctx).commit(_attr(ctx)); };

Now let's jump ahead to the primitives:

auto sphere = as<Ast::Sphere>(eps >> "sphere" >>'(' >> param("radius") >> ')');
auto box    = as<Ast::Box>(eps >> "box" >> '(' >> param('a') >> ',' >> param('b') >> ')');
auto primitive =
    ("primitive" >> uint_[draft] >> '=' >> (sphere | box)[commit]) > ';';

That's still cheating a little, as I've used the param helper to reduce typing:

auto number = as<Ast::Number>(double_, "number");
auto param(auto name, auto p) { return eps >> omit[name] >> '=' >> p; }
auto param(auto name) { return param(name, number); }

As you can see, I've already assumed that most parameters will be numeric.

What Are Objects Really?

Looking at it for a while, I concluded that really an Object is defined as an id number (OBJECT1, OBJECT2...) which is tied to an expression. The expression can reference primitives and have some unary and binary operators.

Let's sketch an AST for that:

using Number = double;
struct RefPrimitive { Id id; };
struct Binary;
struct Unary;

using Expr = boost::variant<         //
    Number,                          //
    RefPrimitive,                    //
    boost::recursive_wrapper<Unary>, //
    boost::recursive_wrapper<Binary> //
    >;

struct Unary { char op; Expr oper; };
struct Binary { Expr lhs; char op; Expr rhs; };
struct Object { Id   id; Expr expr; };
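
Once such a tree exists, it can be turned into a reusable point-membership function, which is what the question asks for. The following is only a sketch under loud assumptions: the question never defines the operators, so the set semantics used here (unary '-' as complement, '*' as intersection, '+' as union, binary '-' as difference) are guesses, std::variant with std::shared_ptr replaces boost::variant with recursive_wrapper, the Number alternative is left out, and compile, Compiler, Inside and Index are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <variant>

struct Point { double x, y, z; };

// A point-membership test: true when the point lies inside the shape.
using Inside = std::function<bool(Point const&)>;

struct Unary;
struct Binary;
struct RefPrimitive { std::uint32_t id; };
// shared_ptr plays the role of boost::recursive_wrapper in this sketch.
using Expr = std::variant<RefPrimitive, std::shared_ptr<Unary>, std::shared_ptr<Binary>>;
struct Unary { char op; Expr oper; };
struct Binary { Expr lhs; char op; Expr rhs; };

// Membership tests of the parsed primitives, looked up by id.
using Index = std::map<std::uint32_t, Inside>;

Inside compile(Expr const& e, Index const& prims);

// ASSUMED semantics, not stated in the question:
//   unary '-' complement, '*' intersection, '+' union, binary '-' difference.
struct Compiler {
    Index const& prims;

    Inside operator()(RefPrimitive r) const { return prims.at(r.id); }

    Inside operator()(std::shared_ptr<Unary> const& u) const {
        Inside in = compile(u->oper, prims);
        if (u->op == '-') return [in](Point const& p) { return !in(p); };
        return in; // unary '+' leaves the operand unchanged
    }

    Inside operator()(std::shared_ptr<Binary> const& b) const {
        Inside l = compile(b->lhs, prims), r = compile(b->rhs, prims);
        switch (b->op) {
        case '*': return [l, r](Point const& p) { return l(p) && r(p); };
        case '+': return [l, r](Point const& p) { return l(p) || r(p); };
        case '-': return [l, r](Point const& p) { return l(p) && !r(p); };
        }
        assert(false && "operator not covered by this sketch");
        return {};
    }
};

// Walk the tree once; the returned function can then be called for every point.
Inside compile(Expr const& e, Index const& prims) {
    return std::visit(Compiler{prims}, e);
}
```

The tree is walked only once per object, so the repeated per-point checks mentioned in the question reduce to calling the returned function.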

Now To Parse Into That Expression AST

It's really 1:1 rules for each Ast node type. E.g.:

auto ref_prim = as<Ast::RefPrimitive>(lexeme["primitive" >> uint_]);

Now many of the expression rules can recurse, so we need declared rules with definitions via BOOST_SPIRIT_DEFINE:

// object expression grammar
rule<struct simple_tag, Ast::Expr>  simple{"simple"};
rule<struct unary_tag,  Ast::Unary> unary{"unary"};
rule<struct expr_tag,   Ast::Expr>  expr{"expr"};
rule<struct term_tag,   Ast::Expr>  term{"term"};
rule<struct factor_tag, Ast::Expr>  factor{"factor"};

As you can tell, some of these are not 1:1 with the Ast nodes, mainly because of the recursion and the difference in operator precedence (term vs factor vs. simple). It's easier to see with the rule definition:

auto unary_def  = char_("-+") >> simple;
auto simple_def = ref_prim | unary | '(' >> expr >> ")";
auto factor_def = simple;
auto term_def   = factor[assign] >> *(char_("*/") >> term)[make_binary];
auto expr_def   = term[assign] >> *(char_("-+") >> expr)[make_binary];

Because none of the rules actually expose a Binary, automatic attribute propagation is not convenient there². Instead, we use assign and make_binary semantic actions:
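
A comment under this answer reports that recursing into expr inside the repetition groups chained operators to the right, and proposes repeating over term instead. The effect of that choice can be seen without Spirit in a small hand-written sketch; Folding and group are hypothetical names, and operands are restricted to single letters to keep it short.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Tiny recursive-descent parser, unrelated to Spirit. It returns a fully
// parenthesized string so that the grouping of the operators is visible.
struct Folding {
    std::string in;
    std::size_t pos = 0;
    bool right_recursive = false; // true mimics `>> expr`, false mimics `>> term`

    std::string operand() {
        assert(pos < in.size() && std::isalpha(static_cast<unsigned char>(in[pos])));
        return std::string(1, in[pos++]);
    }

    std::string expr() {
        std::string val = operand();              // corresponds to term[assign]
        while (pos < in.size() && (in[pos] == '-' || in[pos] == '+')) {
            char op = in[pos++];
            // Recursing here swallows the rest of the chain into rhs.
            std::string rhs = right_recursive ? expr() : operand();
            val = "(" + val + op + rhs + ")";     // corresponds to make_binary
        }
        return val;
    }
};

std::string group(std::string text, bool right_recursive) {
    Folding f{std::move(text), 0, right_recursive};
    return f.expr();
}
```

With the recursive variant, group("a-b-c", true) yields (a-(b-c)); repeating over a single operand, group("a-b-c", false) yields ((a-b)-c), which is the grouping subtraction needs.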

auto assign = [](auto& ctx) { _val(ctx) = _attr(ctx); };
auto make_binary = [](auto& ctx) {
    using boost::fusion::at_c;
    auto& attr = _attr(ctx);
    auto  op   = at_c<0>(attr);
    auto& rhs  = at_c<1>(attr);
    _val(ctx)  = Ast::Binary { _val(ctx), op, rhs };
};

Finally, let's tie the definitions to the declared rules (using their tag types):

BOOST_SPIRIT_DEFINE(simple, unary, expr, term, factor)

All that's left is a rule for object, similar to the one for primitive:

auto object =
    ("object" >> uint_[draft] >> '=' >> (expr)[commit]) > ';';

And we can finish up by defining each line as a primitive|object:

auto line = primitive | object;
auto file = no_case[skip(ws_comment)[*eol >> "[geometry]" >> (-line % eol) >> eoi]];

At the top level we expect the [GEOMETRY] header, specify that we want to be case insensitive and ... that ws_comment is to be skipped³:

auto ws_comment = +(blank | lexeme["//" >> *(char_ - eol) >> (eol | eoi)]);

This allows us to ignore the // comments as well.
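
To make the skipper's job concrete, here is a hand-written equivalent as a sketch. The function name skip_ws_comment is hypothetical, only '\n' is treated as a line end (x3's eol also accepts "\r\n"), and a comment that runs to the end of the input is accepted.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Advance past blanks and `//` comments, as a skipper does between tokens.
// Newlines outside comments are not skipped, because the grammar itself
// uses eol to separate lines.
std::size_t skip_ws_comment(std::string_view s, std::size_t pos) {
    assert(pos <= s.size());
    for (;;) {
        if (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t')) {
            ++pos;                                   // a blank
        } else if (s.substr(pos, 2) == "//") {
            pos = s.find('\n', pos);                 // comment runs to the line end
            if (pos == std::string_view::npos)
                return s.size();                     // ... or to the end of input
            ++pos;                                   // the comment consumes its eol
        } else {
            return pos;                              // next significant character
        }
    }
}
```

For example, skip_ws_comment("// hi\nfoo", 0) returns the index of the f, while a position that already points at a significant character is returned unchanged.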

Live Demo Time

Live On Compiler Explorer

//#define BOOST_SPIRIT_X3_DEBUG
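#include <boost/core/demangle.hpp> // boost::core::demangle
#include <boost/fusion/adapted.hpp>
#include <boost/poly_collection/base_collection.hpp>
#include <boost/spirit/home/x3.hpp>
#include <cstdint>    // uint32_t
#include <functional> // std::reference_wrapper
#include <iostream>
#include <list>
#include <map>
#include <string>
#include <utility>    // std::exchange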
namespace x3 = boost::spirit::x3;

namespace Ast {
    using Id     = uint32_t;
    struct Point { }; // ?? where does this belong?
    struct Primitive {
        Id id;
        virtual ~Primitive() = default;
    };
    struct Sphere : Primitive { double radius; };
    struct Box : Primitive { double a, b; };

    using Number = double;
    struct RefPrimitive { Id id; };
    struct Binary;
    struct Unary;

    using Expr = boost::variant<         //
        Number,                          //
        RefPrimitive,                    //
        boost::recursive_wrapper<Unary>, //
        boost::recursive_wrapper<Binary> //
        >;

    struct Unary { char op; Expr oper; };
    struct Binary { Expr lhs; char op; Expr rhs; };
    struct Object { Id   id; Expr expr; };
    using Primitives = boost::poly_collection::base_collection<Primitive>;
    using Objects    = std::list<Object>;
    using Index      = std::map<Id, std::reference_wrapper<Primitive const>>;

    std::ostream& operator<<(std::ostream& os, Primitive const& p) {
        return os << boost::core::demangle(typeid(p).name()) << " "
                  << "(id: " << p.id << ")";
    }
    std::ostream& operator<<(std::ostream& os, Object const& o) {
        return os << "object(id:" << o.id << ", expr:" << o.expr << ")";
    }
    std::ostream& operator<<(std::ostream& os, RefPrimitive ref) {
        return os << "reference(prim:" << ref.id << ")";
    }
    std::ostream& operator<<(std::ostream& os, Binary const& b) {
        return os << '(' << b.lhs << b.op << b.rhs << ')';
    }
    std::ostream& operator<<(std::ostream& os, Unary const& u) {
        return os << '(' << u.op << u.oper << ')';
    }
} // namespace Ast

BOOST_FUSION_ADAPT_STRUCT(Ast::Primitive, id)
BOOST_FUSION_ADAPT_STRUCT(Ast::Sphere, radius)
BOOST_FUSION_ADAPT_STRUCT(Ast::Box, a, b)
BOOST_FUSION_ADAPT_STRUCT(Ast::Object, id)
BOOST_FUSION_ADAPT_STRUCT(Ast::RefPrimitive, id)
BOOST_FUSION_ADAPT_STRUCT(Ast::Unary, op, oper)

namespace Parser {
    using namespace x3;

    struct State {
        Ast::Id         next_id;
        Ast::Primitives primitives;
        Ast::Objects    objects;

        template <typename... T> void commit(boost::variant<T...>& val) {
            boost::apply_visitor([this](auto& obj) { commit(obj); }, val);
        }

        template <typename T> void commit(T& val) {
            auto id = std::exchange(next_id, 0);
            if constexpr (std::is_base_of_v<Ast::Primitive, T>) {
                val.id = id;
                primitives.insert(std::move(val));
            } else {
                objects.push_back(Ast::Object{id, std::move(val)});
            }
        }
    };

    static inline auto& state(auto& ctx) { return get<State>(ctx); }
    auto draft = [](auto& ctx) { state(ctx).next_id = _attr(ctx); };
    auto commit = [](auto& ctx) { state(ctx).commit(_attr(ctx)); };

    template <typename T>
    auto as = [](auto p, char const* name = "as") {
        return rule<struct _, T>{name} = p;
    };

    auto ws_comment = +(blank | lexeme["//" >> *(char_ - eol) >> (eol | eoi)]);

    auto number = as<Ast::Number>(double_, "number");
    auto param(auto name, auto p) { return eps >> omit[name] >> '=' >> p; }
    auto param(auto name) { return param(name, number); }

    auto sphere = as<Ast::Sphere>(eps >> "sphere" >>'(' >> param("radius") >> ')');
    auto box    = as<Ast::Box>(eps >> "box" >> '(' >> param('a') >> ',' >> param('b') >> ')');
    auto primitive =
        ("primitive" >> uint_[draft] >> '=' >> (sphere | box)[commit]) > ';';
    
    auto ref_prim = as<Ast::RefPrimitive>(lexeme["primitive" >> uint_], "ref_prim");

    // object expression grammar
    rule<struct simple_tag, Ast::Expr>  simple{"simple"};
    rule<struct unary_tag,  Ast::Unary> unary{"unary"};
    rule<struct expr_tag,   Ast::Expr>  expr{"expr"};
    rule<struct term_tag,   Ast::Expr>  term{"term"};
    rule<struct factor_tag, Ast::Expr>  factor{"factor"};

    auto assign = [](auto& ctx) { _val(ctx) = _attr(ctx); };
    auto make_binary = [](auto& ctx) {
        using boost::fusion::at_c;
        auto& attr = _attr(ctx);
        auto  op   = at_c<0>(attr);
        auto& rhs  = at_c<1>(attr);
        _val(ctx)  = Ast::Binary { _val(ctx), op, rhs };
    };

    auto unary_def  = char_("-+") >> simple;
    auto simple_def = ref_prim | unary | '(' >> expr >> ")";
    auto factor_def = simple;
    auto term_def   = factor[assign] >> *(char_("*/") >> term)[make_binary];
    auto expr_def   = term[assign] >> *(char_("-+") >> expr)[make_binary];

    BOOST_SPIRIT_DEFINE(simple, unary, expr, term, factor)

    auto object =
        ("object" >> uint_[draft] >> '=' >> (expr)[commit]) > ';';
    auto line = primitive | object;
    auto file = no_case[skip(ws_comment)[*eol >> "[geometry]" >> (-line % eol) >> eoi]];
} // namespace Parser

int main() {
    for (std::string const input :
         {
             R"(
[geometry]    
    primitive1=sphere(radius=5.5);  
    primitive2=box(a=-5.2, b=7.3);  
    //...  
    object1=primitive2*(-primitive1);  
    //...)",
             R"(
[GEOMETRY]    
    PRIMITIVE1=SPHERE(RADIUS=5.5);  
    PRIMITIVE2=BOX(A=-5.2, B=7.3);  
    //...  
    OBJECT1=PRIMITIVE2*(-PRIMITIVE1);  
    //...)",
         }) //
    {
        Parser::State state;

        bool ok = parse(begin(input), end(input),
                        x3::with<Parser::State>(state)[Parser::file]);
        std::cout << "Parse success? " << std::boolalpha << ok << "\n";

        Ast::Index index;

        for (auto& p : state.primitives)
            if (auto[it,ok] = index.emplace(p.id, p); not ok) {
                std::cout << "Duplicate id " << p
                          << " (conflicts with existing " << it->second.get()
                          << ")\n";
            }

        std::cout << "Primitives by ID:\n";
        for (auto& [id, prim] : index)
            std::cout << " - " << prim << "\n";

        std::cout << "Objects in definition order:\n";
        for (auto& obj: state.objects)
            std::cout << " - " << obj << "\n";
    }
}

Prints

Parse success? true
Primitives by ID:
 - Ast::Sphere (id: 1)
 - Ast::Box (id: 2)
Objects in definition order:
 - object(id:1, expr:(reference(prim:2)*(-reference(prim:1))))
Parse success? true
Primitives by ID:
 - Ast::Sphere (id: 1)
 - Ast::Box (id: 2)
Objects in definition order:
 - object(id:1, expr:(reference(prim:2)*(-reference(prim:1))))

¹ How can I use polymorphic attributes with boost::spirit::qi parsers?

² and insisting on that leads to the classic inefficiency of rules that cause a lot of backtracking

³ outside of lexemes

sehe
  • Added radius/A/B printing using a virtual method: https://compiler-explorer.com/z/ccW8jnKTP – sehe Oct 31 '21 at 03:13
  • Thanks for the extensive answer! The RHS of the "sphere" rule synthesizes the attribute `fusion::vector` and the LHS has the ``, because an instance of `Sphere(radius)` is created in situ. Suppose the Sphere struct holds a `matrix mat {3,3,0}` container instead of `double radius`, so that the attribute of `param("radius")` must be assigned to the element `mat(3,3)`. Is it possible to do this by creating new semantic rules, or is it necessary to create an adapter between the fusion and matrix containers? – 1604C6F229V Nov 24 '21 at 15:53
  • I would always separate concerns, not even in relation to this particular example. Parsing and producing your geometry are fundamentally different tasks. By separating concerns you are avoiding tight coupling and maintenance cost. It will always be easy to replace one part if the need arises. – sehe Nov 24 '21 at 16:03
  • To prove that you *can* do anything you want (obviously): https://compiler-explorer.com/z/47Kj6oK8e I don't think this is a good way to go, as this is most likely only the tip of the iceberg. 110% chance you're going to want to set the other elements of the array as well. Just keep it simple :) – sehe Nov 24 '21 at 16:15
  • Better example using an actual (3,3) matrix and not using libfmt: https://compiler-explorer.com/z/bnod8a9Ms – sehe Nov 24 '21 at 16:21
  • After six months of using this code, I noticed that these rules do not produce left-associative trees. Subtraction is left-associative, but parsing `"object1=primitive1-primitive2-primitive3"` returns the tree `Binary<primitive1,'-',Binary<primitive2,'-',primitive3>>` instead of the required `Binary<Binary<primitive1,'-',primitive2>,'-',primitive3>`. How can these rules be corrected so that left-associative operators are resolved properly? – 1604C6F229V Jul 01 '22 at 17:06
  • It is necessary to replace the line `auto expr_def = term[assign] >> *(char_("-+") >> expr)[make_binary];` with the line `auto expr_def = term[assign] >> *(char_("-+") >> term)[make_binary];` – 1604C6F229V Jul 03 '22 at 17:50