How to write binary operator with two post operands syntax with Boost Spirit x3?

Question

I am following this example: https://github.com/boostorg/spirit/blob/develop/example/x3/calc/calc9/expression_def.hpp

I am trying to write a rule that parses, and generates code for, input like min{x}{y}. The existing code mostly uses an infix expression grammar like x + y, but now I want both operands to appear on the right-hand side of the operator.

I added the following code to the expression_def.hpp file:

    ...
    x3::symbols<ast::optoken> additive_op;
    x3::symbols<ast::optoken> multiplicative_op;
    x3::symbols<ast::optoken> binarypost_op;
    x3::symbols<ast::optoken> unary_op;
    x3::symbols<> keywords;
    ...

    binarypost_op.add
        ("min", ast::op_divide) // Dummy operation usage for now
        ;
    ...
    struct binarypost_expr_class;
    struct unary_expr_class; 
    ...
    typedef x3::rule<binarypost_expr_class, ast::expression> 
    binarypost_expr_type;
    ...
    binarypost_expr_type const binarypost_expr = "binarypost_expr";
    ... 

    auto const multiplicative_expr_def =
    binarypost_expr
    >> *(multiplicative_op > binarypost_expr)
    ;
    auto const binarypost_expr_def =           // See the chaining operation
    ('{' > unary_expr > '}')
    >> *(binarypost_op > ('{' > unary_expr > '}'))
    ;
    auto const unary_expr_def =
        primary_expr
    |   (unary_op > primary_expr)
    ;

This works fine, but it can only parse something like {x} min {y}. I want to be able to parse min {x} {y}. I tried many combinations, such as:

binarypost_op >> ('{' > unary_expr > '}') > ('{' > unary_expr > '}')

but I can't figure out the right way to write this. Any suggestions or comments?

Why the funny braces? It makes the grammar inconsistent for no apparent reason, making it hard to integrate. Also, what is the intended precedence rule? Would you envision a prefix operator with more than 2 arguments? Why not `min(x,y)` and `min(x,y,z)`, and `max(a)` at the same time? Consistency and flexibility. — sehe, May 26 '17 at 10:03
I could require braces for all operators, which would make it more consistent; however, let's just say it is needed only by a single operator for now. I am using the braces for a grander scheme in the future. As for the precedence, since it could also be a custom function, I would like it close to the position where I have currently placed it, just above unary_expr. Thanks for your comment and interest. — Syed Alam Abbas, May 26 '17 at 14:14
I feel it would be very error prone at that level. I'll see whether I can make it work as a function call like `min(a,b)` — sehe, May 26 '17 at 14:16

sehe · Accepted Answer · 2017-05-26T15:50:32.937

OK, here are the changes. The hard part is actually code-generating the builtin function.

Parsing

Step 1: extend AST

Always start with the AST. We want operands that can be function calls:

In ast.hpp:

struct function_call;  // ADDED LINE

// ...

struct operand :
    x3::variant<
        nil
      , unsigned int
      , variable
      , x3::forward_ast<unary>
      , x3::forward_ast<expression>
      , x3::forward_ast<function_call> // ADDED LINE
    >
{
    using base_type::base_type;
    using base_type::operator=;
};

// ...

enum funtoken
{
    fun_min,
    fun_max,
};

// ...

struct function_call : x3::position_tagged
{
    funtoken fun;
    std::list<operand> args;
};

In ast_adapted.hpp:

BOOST_FUSION_ADAPT_STRUCT(client::ast::function_call,
    fun, args
)

Step 2: extend grammar

(This is all in expression_def.hpp)

Let's be generic, so parse function name tokens using a symbol table:

x3::symbols<ast::funtoken> functions;

Which we have to initialize in add_keywords:

functions.add
    ("min", ast::fun_min)
    ("max", ast::fun_max)
    ;

Now declare a rule for function calls:

struct function_call_class;
typedef x3::rule<function_call_class, ast::function_call>    function_call_type;
function_call_type const function_call = "function_call";

That's all red-tape. The "interesting thing" is the rule definition:

auto const function_call_def =
        functions
    >>  '(' >> expression % ',' >> ')'
    ;

Well. That's underwhelming. Let's integrate into our primary expression rule:

auto const primary_expr_def =
        uint_
    |   bool_
    |   function_call
    |   (!keywords >> identifier)
    |   ('(' > expression > ')')
    ;

Note the ordering. If you want to be able to add function names that collide with a keyword, you'll need to add precautions.

Also, let's make AST annotation work for our node:

struct function_call_class : x3::annotate_on_success {};

Code generation

It's easy to find where to add support for the new AST node:

In compiler.hpp:

 bool operator()(ast::function_call const& x) const;

Now comes the hard part.

What's really required for a general n-ary function is an accumulator. Since we don't have registers, this would need to be a temporary (local). However, since the VM implementation doesn't support such temporaries, I've limited the implementation to a fixed binary function call only.

Note that the VM already has support for function calls. Functions can have locals. So, if you code-gen a variable-argument built-in function you can implement a left-fold recursive solution.
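One way to picture the left fold is as a rewrite on the AST, before any code is generated: `f(a, b, c)` becomes `f(f(a, b), c)`, so only the binary case ever reaches the code generator. The sketch below is a minimal illustration under that assumption. It uses a hypothetical mini-AST (`Node`, `literal`, `call`, `fold_call`, `eval` are all made up for this example and are not the calc9 `ast::` types), and it folds at the AST level rather than inside a generated VM function as suggested above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical mini-AST (NOT the calc9 ast:: types): a node is either an
// integer literal or a builtin call with exactly two operands.
enum class Fun { min, max };

struct Node;
using NodePtr = std::shared_ptr<Node const>;

struct Node {
    bool is_call = false;
    int value = 0;       // meaningful when !is_call
    Fun fun = Fun::min;  // meaningful when is_call
    NodePtr lhs, rhs;    // meaningful when is_call
};

NodePtr literal(int v) {
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

NodePtr call(Fun f, NodePtr a, NodePtr b) {
    auto n = std::make_shared<Node>();
    n->is_call = true;
    n->fun = f;
    n->lhs = std::move(a);
    n->rhs = std::move(b);
    return n;
}

// Left fold: f(a, b, c, d) becomes f(f(f(a, b), c), d).
// A single argument is returned unchanged.
NodePtr fold_call(Fun f, std::vector<NodePtr> const& args) {
    assert(!args.empty());
    NodePtr acc = args.front();
    for (std::size_t i = 1; i < args.size(); ++i)
        acc = call(f, acc, args[i]);
    return acc;
}

// Reference evaluator, used only to check the rewrite.
int eval(Node const& n) {
    if (!n.is_call)
        return n.value;
    int const a = eval(*n.lhs);
    int const b = eval(*n.rhs);
    return n.fun == Fun::min ? (a < b ? a : b) : (a > b ? a : b);
}
```

For example, `eval(*fold_call(Fun::min, {literal(5), literal(2), literal(9)}))` evaluates to 2. Note that if the folded tree were fed to a code generator that emits each operand twice (once for the comparison and once for the selected branch), the accumulated left operand would be duplicated at every level, so code size would grow quickly with the number of arguments. That is one reason a temporary would be the better long-term solution.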

In compiler.cpp:

bool compiler::operator()(ast::function_call const& x) const
{
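    // Emits: a b <cmp> jump_if(else) a jump(end) else: b end:
    // Explicit return type: before C++14, lambda return type deduction
    // requires the body to be a single return statement.
    auto choice = [&](int opcode) -> bool {
        BOOST_ASSERT(x.args.size() == 2); // TODO FIXME hardcoded binary builtin
        auto it = x.args.begin();

        auto& a = *it++;
        if (!boost::apply_visitor(*this, a))
            return false;

        auto& b = *it++;
        if (!boost::apply_visitor(*this, b))
            return false;
        program.op(opcode); // the binary fold operation

        program.op(op_jump_if, 0); // offset is backpatched below
        std::size_t const branch = program.size()-1;

        if (!boost::apply_visitor(*this, a)) // comparison held: result is a
            return false;
        program.op(op_jump, 0); // skips the b branch; backpatched below
        std::size_t const continue_ = program.size()-1;

        program[branch] = int(program.size()-branch); // op_jump_if lands here
        if (!boost::apply_visitor(*this, b)) // comparison failed: result is b
            return false;

        program[continue_] = int(program.size()-continue_);
        return true;
    };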

    switch (x.fun) {
        case ast::fun_min: return choice(op_lt);
        case ast::fun_max: return choice(op_gt);
        default: BOOST_ASSERT(0); return false;
    }
    return true;
}

I've just taken inspiration from the surrounding code on how to generate the jump labels.
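The backpatching pattern used above can be seen in isolation in the following sketch. It is a toy stack machine, not the calc9 VM: the opcode names are borrowed from the answer, but `emit_choice` and `run` are hypothetical helpers, operands are restricted to integer constants, and `op_jump_if` is assumed to jump when the popped condition is false (which is what the assembler listings in the next section suggest). Jump offsets are relative to the position of the offset slot itself, mirroring `program[branch] = int(program.size()-branch)`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy opcodes; names mirror the answer, but this is NOT the calc9 VM.
enum Op { op_int, op_lt, op_gt, op_jump_if, op_jump };

// Emit code that leaves a (if "a cmp b" holds) or b (otherwise) on the stack.
// cmp == op_lt gives min(a, b); cmp == op_gt gives max(a, b).
std::vector<int> emit_choice(Op cmp, int a, int b) {
    std::vector<int> code;
    auto push_const = [&](int v) { code.push_back(op_int); code.push_back(v); };

    push_const(a);
    push_const(b);
    code.push_back(cmp);

    code.push_back(op_jump_if);
    code.push_back(0);                          // placeholder offset
    std::size_t const branch = code.size() - 1;

    push_const(a);                              // taken when the comparison holds
    code.push_back(op_jump);
    code.push_back(0);                          // placeholder offset
    std::size_t const cont = code.size() - 1;

    code[branch] = int(code.size() - branch);   // backpatch: start of b branch
    push_const(b);                              // taken when the comparison fails

    code[cont] = int(code.size() - cont);       // backpatch: end of the choice
    return code;
}

// Minimal interpreter for the toy opcodes above.
int run(std::vector<int> const& code) {
    std::vector<int> stack;
    std::size_t pc = 0;
    while (pc < code.size()) {
        switch (code[pc++]) {
        case op_int:
            stack.push_back(code[pc++]);
            break;
        case op_lt: {
            int const rhs = stack.back(); stack.pop_back();
            stack.back() = stack.back() < rhs;
            break;
        }
        case op_gt: {
            int const rhs = stack.back(); stack.pop_back();
            stack.back() = stack.back() > rhs;
            break;
        }
        case op_jump_if: {                      // assumed: jump when false
            int const cond = stack.back(); stack.pop_back();
            if (!cond) pc += code[pc]; else ++pc;
            break;
        }
        case op_jump:
            pc += code[pc];
            break;
        }
    }
    assert(!stack.empty());
    return stack.back();
}
```

With these definitions, `run(emit_choice(op_lt, 1, 3))` evaluates to 1 and `run(emit_choice(op_gt, 27, 62))` evaluates to 62. The key design point is that each forward jump is emitted with a dummy offset first and fixed up once the target position is known.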

Trying It Out

A simplistic example would be: var x = min(1,3);

Assembler----------------

local       x, @0
start:
      op_stk_adj  1
      op_int      1
      op_int      3
      op_lt
      op_jump_if  13
      op_int      1
      op_jump     15
13:
      op_int      3
15:
      op_store    x
end:
-------------------------
Results------------------

    x: 1
-------------------------

Running it with some random contrived input:

./test <<< "var a=$(($RANDOM % 100)); var b=$(($RANDOM % 100)); var contrived=min(max(27,2*a), 100+b);"

Prints e.g.:

Assembler----------------

local       a, @0
local       b, @1
local       contrived, @2
start:
      op_stk_adj  3
      op_int      31
      op_store    a
      op_int      71
      op_store    b
      op_int      27
      op_int      2
      op_load     a
      op_mul
      op_gt
      op_jump_if  24
      op_int      27
      op_jump     29
24:
      op_int      2
      op_load     a
      op_mul
29:
      op_int      100
      op_load     b
      op_add
      op_lt
      op_jump_if  58
      op_int      27
      op_int      2
      op_load     a
      op_mul
      op_gt
      op_jump_if  51
      op_int      27
      op_jump     56
51:
      op_int      2
      op_load     a
      op_mul
56:
      op_jump     63
58:
      op_int      100
      op_load     b
      op_add
63:
      op_store    contrived
end:
-------------------------
Results------------------

    a: 31
    b: 71
    contrived: 62
-------------------------

Added code+Makefile in https://gist.github.com/sehe/bbf7c39ee0d5d4994189f45f6d08697c — sehe, May 26 '17 at 15:48
I made only minor changes, since I couldn't compile the `bool compiler::operator()(ast::function_call const& x) const` function. 1. `auto const function_call_def = functions >> '{' >> expression % "}{" >> '}' ;` 2. `switch (x.fun) { case ast::fun_min: program.op(fun_min); break; case ast::fun_max: program.op(fun_max); break; case ast::fun_frac: program.op(op_div); break; default: BOOST_ASSERT(0); return false; } return true;` 3. `case fun_min: --stack_ptr; if (!bool(stack_ptr[-1] <= stack_ptr[0])) { stack_ptr[-1] = stack_ptr[0]; } break;` — Syed Alam Abbas, May 26 '17 at 16:53
Thank you very much. I will tell you the reason I was using curly brackets. When you format in latex, you write the division operation like this : \frac { x} { y }. I wanted my parser to have the same format and then be able to evaluate using the same parser. Now I can do the following: a = \frac { 44 } { 11 } and I can get the desired result = 4; About the fixing of arguments to 2 I saw that the whole vm only operates on two indices of stack -1 and 0, so I think that whole thing may have to change to accommodate more arguments. Thanks for your help and effort. — Syed Alam Abbas, May 26 '17 at 17:01
It's a stack. Indices are incidental. In fact, local variables are also on the stack, and they are definitely not SP-1/SP-2. Stack machines are well known and versatile, and do not need adjustment for more arguments (look at the RPN expression for the contrived example, there's no problem there). What is needed is a good paradigm for temporary variables. The only thing needed there is an address translation that uses the frameptr (that already exists) but with downward offsets. The current mechanism is restricted to named & declared variables, which doesn't allow the codegen to create temporaries dynamically. — sehe, May 26 '17 at 22:15
If you want to parse latex, just use that grammar. It's not very useful to start with a completely different grammar and then "subvert" it. When you wrote a comment _"// Dummy operation usage for now"_ you might have guessed that it was hacking. Now, if you didn't really want min, but only `\frac` then, by all means you can use it as a prefix-operator alias for the division operation. However, keep in mind that it does not magically make the VM aware of rational numbers. — sehe, May 26 '17 at 22:17
You are right. I will gradually fully "subvert" it, but this is just a start. Also, the grammar for latex formatting need not be very consistent as we would desire from a theoretical standpoint. For instance, we write multiplication between two variables in latex like this: x \times y - just like x * y grammar, then when it comes to division we write, \frac {x} {y} not at all like x / y. This gives you pretty formatting- very useful for higher math expressions (like partial derivatives etc) in latex where it truly shines. So subversion in this context may be a matter of perspective. — Syed Alam Abbas, May 27 '17 at 00:53
I said nothing about LaTeX. It's fine! It's just not an expression grammar. Instead it's a markup language. All I'm saying is that the old grammar from calc9 just gets in the way, and so does the AST. Can you tell us what the real goal is? Right now this seems like [an X/Y problem](https://meta.stackexchange.com/a/66378/159703) to me. — sehe, May 27 '17 at 01:31
I thought of not adding more comments here, but since you asked and I only saw this today, I am replying. The real goal at this point is just to try to implement all the calc functionality using the LaTeX-like grammar, which, as you rightly say, conflicts with the calc9 grammar, which is very simple and easy to understand. Anyway, thanks for your help and interest. — Syed Alam Abbas, May 28 '17 at 20:29