
I'm trying to parse into something of the form

enum class shape { ellipse, circle };
enum class other_shape { square, rectangle };
enum class position { top, left, right, bottom, center };
struct result
{
    std::variant<shape, other_shape, std::string> bla;
    position pos;
    std::vector<double> bloe;
};

I know this doesn't make much sense (why not merge shape and other_shape, right?), but I tried to simplify the result types into something that resembles the buildup of the real result. But the form of the input is somewhat flexible, such that I seem to need extra alternatives that do not properly map onto the above struct definition, resulting in "unexpected attribute size" static assertions.

The real problem is the comma between the bla+pos and bloe parts in the input, due to both being possibly omitted. Example inputs

circle at center, 1, 2, 3
at top, 1, 2, 3
at bottom
circle, 1, 2 3
1, 2, 3
my_fancy_shape at right, 1

Each time some part is omitted, it gets a default value (let's say the first value of the enum and the first type in the variant).

My grammar looks somewhat like this

( circle
| ellipse
| square
| rectangle
| x3::attr(shape::circle)
) >> ( "at" >> position
     | x3::attr(position::center)
     ) >> -x3::lit(',')
  >> x3::double_ % ','

As you can see, the first alternative set maps directly to the variant (and includes a default value if it's completely omitted), the second alternative set provides a default value if the at portion is missing. Next is the vector of comma-separated values.

The issue I have here is that the above grammar will match both these invalid inputs:

, 1, 2, 3
circle 1, 2, 3

So the result, although somewhat elegant, is sloppy.

How can I, without altering the form of the result, write a grammar that has the required comma only if the first part is not empty?

I can think of grammars that do this by joining the two alternative sets into one set of all mixed possibilities, with the comma where it actually should appear, but then Spirit.X3 cannot map this alternative parser onto two members (a variant and a value). E.g. a very inefficient baseline with all the possibilities listed:

( circle >> x3::attr(position::center) >> ','
| ellipse >> x3::attr(position::center) >> ','
| square >> x3::attr(position::center) >> ','
| rectangle >> x3::attr(position::center) >> ','
| circle >> "at" >> position >> ','
| ellipse >> "at" >> position >> ','
| square >> "at" >> position >> ','
| rectangle >> "at" >> position >> ','
| x3::attr(shape::circle) >> "at" >> position >> ','
| x3::attr(shape::circle) >> x3::attr(position::center)
) >> x3::double_ % ','

Here the last option omits the comma. Aside from this being quite excessive, X3 refuses to map it onto the result struct.
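To restate the requirement compactly, here is the rule I'm after written as a plain predicate. This is only a sketch with made-up names (separator_ok and its parameters are not part of my real code), and the treatment of a trailing comma without any values is an assumption on my part:

```cpp
// Hypothetical helper: restates the comma rule, it is not a parser.
//   head_present  - a shape and/or an "at <position>" part was parsed
//   comma_present - a ',' follows the head
//   data_present  - at least one number follows
constexpr bool separator_ok(bool head_present, bool comma_present, bool data_present) {
    if (!head_present) return !comma_present;  // ", 1, 2, 3" must be rejected
    if (data_present)  return comma_present;   // "circle 1, 2, 3" must be rejected
    return !comma_present;                     // assumption: "circle," is rejected too
}
```

So "circle at center, 1, 2, 3" corresponds to separator_ok(true, true, true), and "1, 2, 3" to separator_ok(false, false, true).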

rubenvb

1 Answer


I'd model the grammar more simply: top-down, and matching the AST.

Simplifying the AST types:

namespace AST {
    enum class shape       { ellipse, circle                  } ;
    enum class other_shape { square, rectangle                } ;
    enum class position    { top, left, right, bottom, center } ;

    using any_shape = std::variant<shape, other_shape, std::string>;
    using data = std::vector<double>;

    struct result {
        any_shape bla;
        position  pos;
        data      bloe;
    };
}

BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)

I'd write the parser as:

auto const data = as<AST::data>(double_ % ',');
auto const position = kw("at") >> position_sym;

auto const custom_shape =
        !(position|data) >> kw(as<std::string>(+identchar));
auto const any_shape = as<AST::any_shape>(
        ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);

auto const shape_line = as<AST::result>(
        -any_shape >> -position >> (','|&EOL) >> -data);
auto const shapes     = skip(blank) [ shape_line % eol ];

This is using a few helper shorthand functions as you know I often do:

////////////////
// helpers - attribute coercion
template <typename T>
auto as  = [](auto p) {
    return rule<struct _, T> {typeid(T).name()} = p;
};

// keyword boundary detection
auto identchar = alnum | char_("-_.");
auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
auto ikw = [](auto p) { return no_case[kw(p)]; };

auto const EOL = eol|eoi;
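To make the keyword-boundary helper a bit less magical: kw(p) only accepts p when the next character is not an identifier character, so "at" does not match the start of "attic". Here is a minimal sketch of the same check without Spirit. The names is_identchar and match_keyword are hypothetical (they are not part of X3), and unlike lexeme[] under skip(blank) this version does no pre-skipping:

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Same character class as `identchar` above: alnum | char_("-_.")
inline bool is_identchar(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0
        || c == '-' || c == '_' || c == '.';
}

// Hypothetical stand-in for kw("..."): succeeds only if `word` occurs at
// `pos` and is not immediately followed by another identifier character.
// On success `pos` is advanced past the word; on failure it is unchanged.
inline bool match_keyword(std::string_view input, std::size_t& pos,
                          std::string_view word) {
    if (pos > input.size()) return false;        // guard: substr would throw
    if (input.substr(pos, word.size()) != word) return false;
    std::size_t const end = pos + word.size();
    if (end < input.size() && is_identchar(input[end]))
        return false;                            // e.g. "at" inside "attic"
    pos = end;
    return true;
}
```

With this, matching "at" against "at top" succeeds and advances the position by 2, while matching "at" against "attic" fails and leaves the position where it was.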

This already lands you in a better spot than the situation you reported:

Live On Coliru

 ==== "circle at center, 1, 2, 3"
Parsed 1 shapes
shape:circle at center, 1, 2, 3
 ==== "at top, 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "at bottom"
Parsed 1 shapes
shape:ellipse at bottom
 ==== "1, 2, 3"
Parse failed
Remaining unparsed input: "1, 2, 3"
 ==== "my_fancy_shape at right, 1"
Parsed 1 shapes
custom:"my_fancy_shape" at right, 1
 ==== "circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1"
Parsed 4 shapes
shape:circle at center, 1, 2, 3
shape:ellipse at top, 1, 2, 3
shape:ellipse at bottom
shape:circle at top, 1, 2, 3
Remaining unparsed input: "
               1, 2, 3
               my_fancy_shape at right, 1"
 ==== "circle, 1, 2 3"
Parsed 1 shapes
shape:circle at top, 1, 2
Remaining unparsed input: " 3"
 ==== ", 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "circle 1, 2, 3"
Parse failed
Remaining unparsed input: "circle 1, 2, 3"

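As you can see, the inputs with a missing comma ("circle, 1, 2 3" and "circle 1, 2, 3") are not parsed in full, as they're supposed to, although the leading comma in ", 1, 2, 3" is still accepted at this point. There's also one input that you'd like to succeed, which doesn't: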

 ==== "1, 2, 3"
Parse failed
Remaining unparsed input: "1, 2, 3"
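A side note on the defaults that show up as shape:ellipse and "at top" in the output: that is what you'd expect if the omitted fields are simply left value-initialized. A default-constructed std::variant holds its first alternative, value-initialized, and a value-initialized enum is 0, which is the first enumerator here. A small standalone check of that (untouched_result is a hypothetical name, used only for illustration):

```cpp
#include <string>
#include <variant>
#include <vector>

namespace AST {
    enum class shape       { ellipse, circle };
    enum class other_shape { square, rectangle };
    enum class position    { top, left, right, bottom, center };

    using any_shape = std::variant<shape, other_shape, std::string>;

    struct result {
        any_shape           bla;
        position            pos;
        std::vector<double> bloe;
    };
}

// Hypothetical helper: a value-initialized result, i.e. what the fields
// look like when no parser ever assigned to them.
inline AST::result untouched_result() { return AST::result{}; }
```

Here untouched_result().bla holds AST::shape::ellipse and untouched_result().pos is AST::position::top. If you want different defaults (say circle and center), you'd have to supply them explicitly.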

HACKING

This is tricky to get out of without writing an explosion of parsers. Notice that the trick to get ',' parsing correctly between the shape/position part and the data was ','|&EOL.

What we'd actually need to be able to write is &BOL|','|&EOL. But there is no such thing as BOL. Let's emulate it!

// hack for BOL state
struct state_t {
    bool at_bol = true;

    struct set_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
            s.at_bol = true;
        }
    };

    struct reset_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
            s.at_bol = false;
        }
    };

    struct is_at_bol {
        template <typename Ctx> void operator()(Ctx& ctx) const {
            auto& s = get<state_t>(ctx);
            //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
            _pass(ctx) = s.at_bol;
        }
    };
};
auto const SET_BOL   = eps[ state_t::set_bol{} ];
auto const RESET_BOL = eps[ state_t::reset_bol{} ];
auto const AT_BOL    = eps[ state_t::is_at_bol{} ];

Now we can mix in the appropriate epsilons here and there:

template <typename T>
auto opt = [](auto p, T defval = {}) {
    return as<T>(p >> RESET_BOL | attr(defval));
};

auto const shape_line = as<AST::result>(
        with<state_t>(state_t{}) [
            SET_BOL >>
            opt<AST::any_shape>(any_shape) >>
            opt<AST::position>(position) >>
            (AT_BOL|','|&EOL) >> -data
        ]);

It's ugly, but it works:

 ==== "circle at center, 1, 2, 3"
Parsed 1 shapes
shape:circle at center, 1, 2, 3
 ==== "at top, 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "at bottom"
Parsed 1 shapes
shape:ellipse at bottom
 ==== "1, 2, 3"
Parsed 1 shapes
shape:ellipse at top, 1, 2, 3
 ==== "my_fancy_shape at right, 1"
Parsed 1 shapes
custom:"my_fancy_shape" at right, 1
 ==== "circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1"
Parsed 6 shapes
shape:circle at center, 1, 2, 3
shape:ellipse at top, 1, 2, 3
shape:ellipse at bottom
shape:circle at top, 1, 2, 3
shape:ellipse at top, 1, 2, 3
custom:"my_fancy_shape" at right, 1
 ==== "circle, 1, 2 3"
Parsed 1 shapes
shape:circle at top, 1, 2
Remaining unparsed input: " 3"
 ==== ", 1, 2, 3"
Parsed 1 shapes
shape:ellipse at top
Remaining unparsed input: ", 1, 2, 3"
 ==== "circle 1, 2, 3"
Parse failed
Remaining unparsed input: "circle 1, 2, 3"
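If the flag juggling is hard to follow in X3 terms, the same idea reads fairly naturally in a hand-written parser. The following is only a sketch under simplifying assumptions, not a replacement for the grammar: it handles a single line, stores shape and position as plain strings, treats any word starting with a letter (other than "at") as a shape, and uses made-up names (line_result, line_parser, parse_line). The at_bol_ member plays the role of state_t::at_bol:

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <utility>
#include <vector>

struct line_result {
    std::string shape = "ellipse";  // defaults mirror the output above
    std::string pos   = "top";
    std::vector<double> data;
};

class line_parser {
  public:
    explicit line_parser(std::string text) : in_(std::move(text)) {}

    bool parse(line_result& out) {
        at_bol_ = true;                                  // SET_BOL
        std::string w = peek_word();
        if (!w.empty() && std::isalpha(static_cast<unsigned char>(w[0])) && w != "at") {
            out.shape = w;
            i_ += w.size();
            at_bol_ = false;                             // RESET_BOL
        }
        if (peek_word() == "at") {
            i_ += 2;
            std::string p = peek_word();
            if (!is_position(p)) return false;
            out.pos = p;
            i_ += p.size();
            at_bol_ = false;                             // RESET_BOL
        }
        skip_blanks();
        if (!at_bol_) {                                  // AT_BOL | ',' | &EOL
            if (i_ < in_.size() && in_[i_] == ',') ++i_;
            else if (i_ != in_.size()) return false;
        }
        parse_data(out.data);                            // -data
        skip_blanks();
        return i_ == in_.size();                         // require the whole line
    }

  private:
    static bool identchar(char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '-' || c == '_' || c == '.';
    }
    static bool is_position(std::string const& s) {
        return s == "top" || s == "left" || s == "right" || s == "bottom" || s == "center";
    }
    void skip_blanks() {
        while (i_ < in_.size() && (in_[i_] == ' ' || in_[i_] == '\t')) ++i_;
    }
    // Skips blanks, then returns the next identifier-like word without consuming it.
    std::string peek_word() {
        skip_blanks();
        std::size_t j = i_;
        while (j < in_.size() && identchar(in_[j])) ++j;
        return in_.substr(i_, j - i_);
    }
    // double_ % ',' : a ',' is only consumed when another number follows it.
    void parse_data(std::vector<double>& out) {
        std::size_t last_good = i_;
        for (;;) {
            skip_blanks();
            char const* b = in_.c_str() + i_;
            char* e = nullptr;
            double v = std::strtod(b, &e);
            if (e == b) { i_ = last_good; return; }      // no number: undo a dangling ','
            out.push_back(v);
            i_ += static_cast<std::size_t>(e - b);
            last_good = i_;
            skip_blanks();
            if (i_ < in_.size() && in_[i_] == ',') ++i_;
            else return;
        }
    }

    std::string in_;
    std::size_t i_ = 0;
    bool at_bol_ = true;
};

inline bool parse_line(std::string const& text, line_result& out) {
    return line_parser(text).parse(out);
}
```

With this sketch, parse_line accepts "1, 2, 3" and "circle at center, 1, 2, 3", and rejects ", 1, 2, 3", "circle 1, 2, 3" and "circle, 1, 2 3". The separator step is the only place that looks at the flag, which is exactly what (AT_BOL|','|&EOL) does in the grammar.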

Oh, you might add eoi to the shapes parser rule so that you get slightly less confusing output when only part of the input is parsed, but that's up to you to decide.

Full Demo

Live On Wandbox¹

//#define BOOST_SPIRIT_X3_DEBUG
#include <boost/config/warning_disable.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>

#include <iostream>
#include <iomanip>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

namespace AST {
    enum class shape       { ellipse, circle                  } ;
    enum class other_shape { square, rectangle                } ;
    enum class position    { top, left, right, bottom, center } ;

    using any_shape = std::variant<shape, other_shape, std::string>;
    using data = std::vector<double>;

    struct result {
        any_shape bla;
        position  pos;
        data      bloe;
    };

    static inline std::ostream& operator<<(std::ostream& os, shape const& v) {
        switch(v) {
            case shape::circle:  return os << "circle";
            case shape::ellipse: return os << "ellipse";
        }
        throw std::domain_error("shape");
    }
    static inline std::ostream& operator<<(std::ostream& os, other_shape const& v) {
        switch(v) {
            case other_shape::rectangle: return os << "rectangle";
            case other_shape::square:    return os << "square";

        }
        throw std::domain_error("other_shape");
    }
    static inline std::ostream& operator<<(std::ostream& os, position const& v) {
        switch(v) {
            case position::top:    return os << "top";
            case position::left:   return os << "left";
            case position::right:  return os << "right";
            case position::bottom: return os << "bottom";
            case position::center: return os << "center";

        }
        throw std::domain_error("position");
    }

    template <typename... F> struct overloads : F... {
        overloads(F... f) : F(f)... {}
        using F::operator()...;
    };

    static inline std::ostream& operator<<(std::ostream& os, any_shape const& v) {
        std::visit(overloads{
            [&os](shape v)       { os << "shape:" << v;               },
            [&os](other_shape v) { os << "other_shape:" << v;         },
            [&os](auto const& v) { os << "custom:" << std::quoted(v); },
        }, v);
        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(AST::result, bla, pos, bloe)

namespace parser {
    using namespace boost::spirit::x3;

    struct shape_t : symbols<AST::shape> {
        shape_t() { add
            ("ellipse", AST::shape::ellipse)
            ("circle", AST::shape::circle)
            ;
        }
    } shape_sym;

    struct other_shape_t : symbols<AST::other_shape> {
        other_shape_t() { add
            ("square", AST::other_shape::square)
            ("rectangle", AST::other_shape::rectangle)
            ;
        }
    } other_shape_sym;

    struct position_t : symbols<AST::position> {
        position_t() { add
            ("top", AST::position::top)
            ("left", AST::position::left)
            ("right", AST::position::right)
            ("bottom", AST::position::bottom)
            ("center", AST::position::center)
            ;
        }
    } position_sym;

    // hack for BOL state
    struct state_t {
        bool at_bol = true;

        struct set_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "set_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = true;
            }
        };

        struct reset_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "reset_bol (from " << s.at_bol << ")" << std::endl;
                s.at_bol = false;
            }
        };

        struct is_at_bol {
            template <typename Ctx> void operator()(Ctx& ctx) const {
                auto& s = get<state_t>(ctx);
                //std::clog << std::boolalpha << "is_at_bol (" << s.at_bol << ")" << std::endl;
                _pass(ctx) = s.at_bol;
            }
        };
    };
    auto const SET_BOL   = eps[ state_t::set_bol{} ];
    auto const RESET_BOL = eps[ state_t::reset_bol{} ];
    auto const AT_BOL    = eps[ state_t::is_at_bol{} ];

    ////////////////
    // helpers - attribute coercion
    template <typename T>
    auto as  = [](auto p) {
        return rule<struct _, T, true> {typeid(T).name()} = p;
    };
    template <typename T>
    auto opt = [](auto p, T defval = {}) {
        return as<T>(p >> RESET_BOL | attr(defval));
    };

    // keyword boundary detection
    auto identchar = alnum | char_("-_.");
    auto kw  = [](auto p) { return lexeme[p >> !identchar]; };
    auto ikw = [](auto p) { return no_case[kw(p)]; };

    auto const EOL = eol|eoi;
    ////////////////

    auto const data = as<AST::data>(double_ % ',');
    auto const position = kw("at") >> position_sym;

    auto const custom_shape =
            !(position|data) >> as<std::string>(kw(+identchar));
    auto const any_shape = as<AST::any_shape>(
            ikw(shape_sym) | ikw(other_shape_sym) | custom_shape);

    auto const shape_line = as<AST::result>(
            with<state_t>(state_t{}) [
                SET_BOL >>
                opt<AST::any_shape>(any_shape) >>
                opt<AST::position>(position) >>
                (AT_BOL|','|&EOL) >> -data
            ]);
    auto const shapes = skip(blank) [ shape_line % eol ]/* >> eoi*/;
}

int main() {
    for (std::string const input : {
            "circle at center, 1, 2, 3",
            "at top, 1, 2, 3",
            "at bottom",
            "1, 2, 3",
            "my_fancy_shape at right, 1",
            R"(circle at center, 1, 2, 3
               at top, 1, 2, 3
               at bottom
               circle, 1, 2, 3
               1, 2, 3
               my_fancy_shape at right, 1)",

            // invalids:
            "circle, 1, 2 3",
            ", 1, 2, 3",
            "circle 1, 2, 3",
            })
    {
        std::cout << " ==== " << std::quoted(input) << std::endl;
        std::vector<AST::result> r;
        auto f = begin(input), l = end(input);
        if (parse(f, l, parser::shapes, r)) {
            std::cout << "Parsed " << r.size() << " shapes" << std::endl;
            for (auto const& s : r) {
                std::cout << s.bla << " at " << s.pos;
                for (auto v : s.bloe)
                    std::cout << ", " << v;
                std::cout << std::endl;
            }
        } else {
            std::cout << "Parse failed" << std::endl;
        }

        if (f!=l) {
            std::cout << "Remaining unparsed input: " << std::quoted(std::string(f,l)) << std::endl;
        }
    }
}

¹ Wandbox has a more recent version of Boost than Coliru, making with<> directive states mutable as intended.

sehe
  • Thanks for the expansive answer. It'll take me some time to parse it. I'm also glad it is indeed hard to get it correct with that stupid conditionally required comma. – rubenvb Sep 07 '19 at 14:49
  • FYI: this is what I need to parse CSS's [`radial-gradient`](https://developer.mozilla.org/en-US/docs/Web/CSS/radial-gradient). The current, but sloppy, version can be found [here](https://github.com/skui-org/skui/blob/6c79124afd/css/grammar/radial_gradient.h%2B%2B). I just find it quite baffling that doing the same thing for the `linear-gradient` seems so trivial (see [here](https://github.com/skui-org/skui/blob/6c79124/css/grammar/linear_gradient.h%2B%2B)). This is of course due to the relative simplicity of the first part, which is a lot more complex in the radial gradient case. – rubenvb Sep 07 '19 at 14:50
  • The key is that the "first part" can be empty. I think you can untie the knot a little making this problem more like the other. If you can change the AST and combine `shape` and `position` into a unit: http://coliru.stacked-crooked.com/a/83f453e2f6885f6d. This is ugly in the sense that it bends the AST for technical reasons, but it does simplify grammar, no need for the stateful hack like before and no unreasonable explosion in the parser. (I would love to have `operator||` back from Qi) – sehe Sep 08 '19 at 13:05
  • Small update: I managed to get the leading and missing comma test cases working as I wanted by a [small modification of the parser](https://github.com/skui-org/skui/blob/41f0672/css/grammar/radial_gradient.h%2B%2B). I also somewhat tried to improve the representation of [the result](https://github.com/skui-org/skui/blob/41f067/css/property/radial_gradient.h%2B%2B). I thought about using the `SET_BOL`/`AT_BOL` trick (I thought of something like that but lacked the skill to quickly prototype it), but I'm more satisfied with the current result. I might discover other issues later though... – rubenvb Sep 12 '19 at 18:30
  • Thanks again for the extensive reply! – rubenvb Sep 12 '19 at 18:30