Is there any way of accessing the tokens from a token tree or expression in Rust (without stringifying and having to parse)?

Question

My question is two-fold. I'm trying to make a computer algebra system similar to SymPy or Math.NET Symbolics.

My idea is to use some sort of macro to allow this syntax:

let symbolic!(fn function(x, a) -> 2/4*x^2 + a*x + 4)
function.derive(x) // x + a

What I'm looking for is a way to access the tokens from the token tree so that x, a become Symbolic::Symbol("x"), Symbolic::Symbol("a") and 2/4, 4 become Symbolic::Rational(2, 4), Symbolic::Rational(4, 1). I can then use operator overloading to construct the abstract syntax tree, e.g.:

impl Add<Self> for Symbolic {
    type Output = Node;
    fn add(self, rhs: Self) -> Self::Output {
        use Node::*;
        BinaryExpr { op: '+', lhs: Box::new(Symb(self)), rhs: Box::new(Symb(rhs)) }
    }
}

where the enums Symbolic and Node are:

pub enum Symbolic{
    Symbol(String),
    Rational(isize, usize),
    Operator(char)
}

pub enum Node{
    Symb(Symbolic),
    UnaryExpr{
        op: char,
        child: Box<Node>
    },
    BinaryExpr{
        op: char,
        lhs: Box<Node>,
        rhs: Box<Node>
    }

}
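
A minimal sketch of how that operator-overloading idea plays out, under some assumptions that are not in the code above: the operators are implemented on Node rather than Symbolic so that expressions can be chained, the `binary` and `sym` helpers are hypothetical, and the derives are added only so trees can be compared. It also shows a caveat: Rust's own precedence still applies, and `^` binds more loosely than `+`.

```rust
use std::ops::{Add, BitXor, Mul};

#[derive(Debug, Clone, PartialEq)]
pub enum Symbolic {
    Symbol(String),
    Rational(isize, usize),
    Operator(char),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Symb(Symbolic),
    UnaryExpr { op: char, child: Box<Node> },
    BinaryExpr { op: char, lhs: Box<Node>, rhs: Box<Node> },
}

// Hypothetical helper: build a binary node from two subtrees.
fn binary(op: char, lhs: Node, rhs: Node) -> Node {
    Node::BinaryExpr { op, lhs: Box::new(lhs), rhs: Box::new(rhs) }
}

// Hypothetical helper: build a symbol leaf.
pub fn sym(name: &str) -> Node {
    Node::Symb(Symbolic::Symbol(name.to_string()))
}

// Implementing the operators on Node (an assumption) lets `a + b * c` chain.
impl Add for Node {
    type Output = Node;
    fn add(self, rhs: Node) -> Node { binary('+', self, rhs) }
}

impl Mul for Node {
    type Output = Node;
    fn mul(self, rhs: Node) -> Node { binary('*', self, rhs) }
}

// `^` is Rust's bitwise XOR operator; here it is reused for exponentiation.
impl BitXor for Node {
    type Output = Node;
    fn bitxor(self, rhs: Node) -> Node { binary('^', self, rhs) }
}
```

With these impls, `sym("a") + sym("b") ^ sym("c")` builds the tree for `(a + b) ^ c`, because Rust parses `^` with lower precedence than `+`. Any macro that emits such expressions has to add its own parentheses to get the mathematical meaning.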

What should be the actual Rust code emitted by the macro invocation you have there? — PitaJ, Nov 30 '22 at 21:34
Do you only intend to support Rust operators with the same order of operations as Rust? — PitaJ, Nov 30 '22 at 21:37
good point, I had not considered this, I may need to stringify the expression after all — Lyndon Alcock, Nov 30 '22 at 21:52
You may not need to stringify the expression, but you will at least need to use a proc macro. — PitaJ, Nov 30 '22 at 21:55
I was intending on having the macro emit either the Node tree or some wrapper around the node tree? — Lyndon Alcock, Nov 30 '22 at 21:55
Proc macros run rust code at compile time. Unlike `macro_rules!` declarative macros, they can do pretty much anything. https://doc.rust-lang.org/reference/procedural-macros.html — PitaJ, Nov 30 '22 at 22:05
[Here's a version using declarative macros](https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=e1b7f2716c9a2d1cd38029600a798a2a). It works if you accept the limited number of Rust operators and the order of operations that come with them — PitaJ, Nov 30 '22 at 22:19
And here I've modified it a bit to make exponentiation with `^` highest priority: https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=778dec2f6204f2877d4f7fffc2e3ec7c — PitaJ, Nov 30 '22 at 23:09
How did you modify it? I was wondering if I could try creating my own operator of sorts, for example if I was to use Python's `**` notation — Lyndon Alcock, Nov 30 '22 at 23:23
I won't lie it will likely take some time for me to understand the code you've written. Thank you so much by the way — Lyndon Alcock, Nov 30 '22 at 23:24
It is confusing, but if that kind of macro will work for you, I can submit an answer explaining more of how it works. — PitaJ, Dec 01 '22 at 03:10
That'd be great, as far as I can tell this is what I was looking for — Lyndon Alcock, Dec 01 '22 at 11:07

PitaJ · Accepted Answer · 2022-12-01T22:47:55.333

There is certainly a way to accomplish this using a declarative macro_rules! macro.

Here is what I came up with:

/// Macro for declaring a symbolic `Function`
///
/// # Example
///
/// ```
/// let function = symbolic!(fn (a, b) -> a + b^2);
/// let expected = Function {
///     parameters: vec![String::from("a"), String::from("b")],
///     expression: Node::Symb(Symbolic::Symbol(String::from("a"))) + (Node::Symb(Symbolic::Symbol(String::from("b"))) ^ Node::Symb(Symbolic::Rational(2, 1))),
/// };
/// assert_eq!(function, expected);
/// ```
macro_rules! symbolic {
    // Main entry point
    ( fn ($($params:ident),* $(,)?) -> $($expression:tt)* ) => {
        Function {
            // Extract parameters
            parameters: vec![
                $( String::from(stringify!($params)), )*
            ],
            // Pass expression to tt muncher
            // Starting with an empty accumulator
            expression: symbolic!(@munch ( $($expression)* ) -> ()),
        }
    };
    // Handle exponentiation with ^ as highest priority
    //
    // Capture the left (base) and right (exponent) sides as raw token trees,
    // which helpfully captures parenthesized expressions
    ( @munch ($base:tt^$exponent:tt $($rest:tt)*) -> ($($accum:tt)*) ) => {
        // Pass the rest of the expression to continue parsing recursively
        symbolic!(@munch ( $($rest)* ) -> (
            // Append the exponentiation wrapped in parens to the accumulator
            $($accum)*
            // Parse the base and exponent as expressions
            (symbolic!(@munch ($base) -> ()) ^ symbolic!(@munch ($exponent) -> ())) 
        ))
    };
    // Handle parenthesized expressions directly
    // 
    // Unwrap the parenthesis
    ( @munch (($($expression:tt)*) $($rest:tt)*) -> ($($accum:tt)*) ) => {
        // Pass the rest of the expression to continue parsing recursively
        symbolic!(@munch ( $($rest)* ) -> (
            // Append the expression parse invocation to the accumulator
            $($accum)*
            // Parse the inner expression
            // This is wrapped in parens by the final munch case
            symbolic!(@munch ( $($expression)* ) -> ())
        ))
    };
    // Handle division of two literal integers as a single rational
    //
    // Capture the left (numerator) and right (denominator) sides,
    // and pass them through as a rational
    ( @munch ($numerator:literal/$denominator:literal $($rest:tt)*) -> ($($accum:tt)*) ) => {
        // Pass the rest of the expression to continue parsing recursively
        symbolic!(@munch ( $($rest)* ) -> (
            // Append the rational to the accumulator
            $($accum)*
            Node::Symb(Symbolic::Rational($numerator, $denominator))
        ))
    };
    // Handle a single literal number as a rational with denominator of 1
    ( @munch ($num:literal $($rest:tt)*) -> ($($accum:tt)*) ) => {
        // Pass the rest of the expression to continue parsing recursively
        symbolic!(@munch ( $($rest)* ) -> (
            // Append the rational to the accumulator
            $($accum)*
            Node::Symb(Symbolic::Rational($num, 1))
        ))
    };
    // Handle a parameter name
    ( @munch ($param:ident $($rest:tt)*) -> ($($accum:tt)*) ) => {
        // Pass the rest of the expression to continue parsing recursively
        symbolic!(@munch ( $($rest)* ) -> (
            // Append the parameter symbol to the accumulator
            $($accum)*
            Node::Symb(Symbolic::Symbol(String::from(stringify!($param))))
        ))
    };
    // Pass through operators directly
    //
    // For better ergonomics, you may want to handle each operator separately,
    // as this will allow literally any token through
    ( @munch ($op:tt $($rest:tt)*) -> ($($accum:tt)*) ) => {
        symbolic!(@munch ( $($rest)* ) -> ( $($accum)* $op ))
    };
    // Handle the final output when all tokens have been handled
    ( @munch () -> ($($accum:tt)*) ) => {
        // Just return the final accumulated Rust expression
        ( $($accum)* )
    };
}

This macro is a lot. The first thing you might pick out is a lot of patterns that look like this:

( @munch ($first:thing $($rest:tt)*) -> ($($accum:tt)*) )

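This is a combination of two patterns described in The Little Book of Rust Macros (https://danielkeep.github.io/tlborm/book/): incremental TT munchers and push-down accumulation.

I highly recommend reading through that macro book, especially the pages on those two patterns. Here is a quick explanation.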

Incremental TT munchers allow the declarative macro to handle a token stream bit by bit. You give a pattern for the bit you want matched, and then the $($rest:tt)* pattern matches any sequence of tokens after it. By passing through $($rest)* to a subsequent invocation, the whole token sequence can be handled.
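
As a small illustration of the munching idea on its own, here is a sketch of a macro that counts token trees. The `count_tokens!` name is made up for this example and is not part of the answer's macro.

```rust
// A minimal incremental TT muncher (hypothetical `count_tokens!` macro).
macro_rules! count_tokens {
    // Base case: no tokens left to munch.
    () => { 0usize };
    // Recursive case: take one token tree off the front,
    // then recurse on whatever remains.
    ($first:tt $($rest:tt)*) => {
        1usize + count_tokens!($($rest)*)
    };
}
```

`count_tokens!(a + b ^ 2)` sees five token trees, while `count_tokens!((a + b) * 2)` sees only three, because a parenthesized group is captured as a single `tt`. That is the same property the `@munch` rules above rely on to pick up parenthesized sub-expressions in one step.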

Push-down accumulation is a way of working around the fact that a macro must always expand to a complete syntax element, such as an expression or an item. Essentially, a macro can't return something like 1, 2 because that's not a valid expression on its own. Instead, we have to pass through an accumulated value and append to it with each subsequent macro invocation, emitting the result only in the final case.
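
A stripped-down sketch of accumulation, assuming a hypothetical `names!` macro that turns a list of identifiers into a vector of strings. Each step appends one piece to the bracketed accumulator, and only the final rule turns the accumulated tokens into a real expression.

```rust
// Hypothetical `names!` macro showing push-down accumulation.
macro_rules! names {
    // Munch one identifier and append its stringified form
    // (plus a trailing comma) to the accumulator.
    ( @munch ( $head:ident $($rest:ident)* ) -> [ $($accum:tt)* ] ) => {
        names!(@munch ( $($rest)* ) -> [ $($accum)* stringify!($head), ])
    };
    // Nothing left to munch: emit the accumulated tokens as one expression.
    ( @munch () -> [ $($accum:tt)* ] ) => {
        vec![ $($accum)* ]
    };
    // Entry point: start munching with an empty accumulator.
    ( $($input:ident)* ) => {
        names!(@munch ( $($input)* ) -> [])
    };
}
```

`names!(x a b)` expands step by step to `vec![stringify!(x), stringify!(a), stringify!(b),]`. No intermediate expansion ever has to stand alone as a fragment like `"x", "a"`, which is exactly the problem accumulation avoids.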

However, I don't think a macro combined with operator overloading is really what would serve you best. I think a better way would be to have a simpler macro that just converts tokens, acting as a kind of lexer, and then parse the output of that yourself.

That way, you could completely control operator precedence, etc.

Here's a macro that will do the simpler lexing instead:

fn parse_symbolic_expression(symbols: Vec<Symbolic>) -> Node {
    todo!()
}

macro_rules! symbolic2 {
    ( fn ($($params:ident),* $(,)?) -> $($expression:tt)* ) => {
        Function {
            // Extract parameters
            parameters: vec![
                $( String::from(stringify!($params)), )*
            ],
            expression: parse_symbolic_expression(
                symbolic2!(@munch ( $($expression)* ) -> [])
            ),
        }
    };

    ( @munch ($num:literal $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::Integer($num), ])
    };
    ( @munch ($param:ident $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::Symbol(String::from(stringify!($param))), ])
    };
    ( @munch (($($expression:tt)*) $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::Parenthesized(symbolic2!(@munch ( $($expression)* ) -> [])), ])
    };
    ( @munch (+ $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::BinaryOperator('+'), ])
    };
    ( @munch (- $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::BinaryOperator('-'), ])
    };
    ( @munch (/ $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::BinaryOperator('/'), ])
    };
    ( @munch (* $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::BinaryOperator('*'), ])
    };
    ( @munch (^ $($rest:tt)*) -> [$($accum:tt)*] ) => {
        symbolic2!(@munch ( $($rest)* ) -> [ $($accum)* Symbolic::BinaryOperator('^'), ])
    };
    ( @munch () -> [$($accum:tt)*] ) => {
        vec![ $($accum)* ]
    };
}
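
The `parse_symbolic_expression` function is left as `todo!()` above. One possible way to fill it in is a small precedence-climbing parser, sketched below. Everything here is an assumption for illustration: the shape of the extended Symbolic enum (with `Integer` holding an `i64` and `Parenthesized` holding a `Vec<Symbolic>`), the binding-power table, the choice to make `^` right-associative, and panicking on malformed input.

```rust
use std::iter::Peekable;

// Assumed shape of the token type produced by `symbolic2!`.
#[derive(Debug, Clone, PartialEq)]
pub enum Symbolic {
    Integer(i64),
    Symbol(String),
    Parenthesized(Vec<Symbolic>),
    BinaryOperator(char),
}

#[derive(Debug, Clone, PartialEq)]
pub enum Node {
    Symb(Symbolic),
    BinaryExpr { op: char, lhs: Box<Node>, rhs: Box<Node> },
}

// (left, right) binding powers; higher binds tighter.
// right < left makes `^` right-associative.
fn binding_power(op: char) -> Option<(u8, u8)> {
    match op {
        '+' | '-' => Some((1, 2)),
        '*' | '/' => Some((3, 4)),
        '^' => Some((6, 5)),
        _ => None,
    }
}

fn parse_expr<I>(tokens: &mut Peekable<I>, min_bp: u8) -> Node
where
    I: Iterator<Item = Symbolic>,
{
    // Every expression starts with an operand.
    let mut lhs = match tokens.next() {
        Some(Symbolic::Parenthesized(inner)) => parse_symbolic_expression(inner),
        Some(Symbolic::BinaryOperator(op)) => panic!("expected operand, found {op:?}"),
        Some(atom) => Node::Symb(atom),
        None => panic!("expected operand, found end of input"),
    };
    loop {
        // Peek at the next operator without consuming it yet.
        let op = match tokens.peek() {
            Some(Symbolic::BinaryOperator(op)) => *op,
            Some(other) => panic!("expected operator, found {other:?}"),
            None => break,
        };
        let (left_bp, right_bp) =
            binding_power(op).unwrap_or_else(|| panic!("unknown operator {op:?}"));
        // An operator that binds too loosely belongs to an outer call.
        if left_bp < min_bp {
            break;
        }
        tokens.next();
        let rhs = parse_expr(tokens, right_bp);
        lhs = Node::BinaryExpr { op, lhs: Box::new(lhs), rhs: Box::new(rhs) };
    }
    lhs
}

pub fn parse_symbolic_expression(symbols: Vec<Symbolic>) -> Node {
    let mut tokens = symbols.into_iter().peekable();
    parse_expr(&mut tokens, 0)
}
```

Because precedence lives entirely in `binding_power`, changing the order of operations or adding a new operator means editing that table (and adding one lexer rule to the macro) rather than restructuring the macro itself.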

Hi Peter, sorry about the late response (busy week). I was wondering if the macro could make the parameters refer to previously established values. For example, if I declare some symbolics with `symbolic!{ let x; f(x) -> x^2 //... g(x) -> x+2 //... }`, I would like these two x's to be the same so that the equations could be solved simultaneously. What is your advice on this, and would it require proc macros? Thanks, Lyndon — Lyndon Alcock, Dec 04 '22 at 15:58
What do you mean by "the same"? I'm struggling to understand why you can't just check for equality. — PitaJ, Dec 04 '22 at 19:06
I may need to restructure, as I am reevaluating my data model. If I have some expressions a*x^2 + b*x + c and d*x^2 + e*x + f, with my new data model these would be expressed as one big sum of 3 different expressions: `pub enum Node<'a>{ Symbolic(Token<'a>), Expression{ op: Token<'a>, buf: Vec<&'a Self> } }`. Then, if I want an easy way of accessing just the coefficients of the x's, I could simply set the symbolic x to be 1, and without simplifying, the buf for the coefficients would provide all the data I need to create a matrix of the coefficients — Lyndon Alcock, Dec 04 '22 at 19:46
I'm sure it can be done, but why not instead just have a function that you pass an array of functions and the variable name? — PitaJ, Dec 05 '22 at 00:56
Just out of interest @PitaJ, how did you discover this push-down accumulation technique? I've been working on the proc-macro equivalent of this (see https://stackoverflow.com/questions/74688174/how-to-deal-convert-recurrsive-block-types-in-rust-macro-to-rpn-also-an-error-q/74689121#74689121), and it feels much more readable and way easier to work with in my opinion. However, it is a very interesting solution, and I was wondering if there are any advantages or disadvantages of using declarative macros? — Lyndon Alcock, Dec 05 '22 at 14:50
I learned from the [book I linked in the answer](https://danielkeep.github.io/tlborm/book/). As for declarative vs procedural macros: generally, if it can be done with a declarative macro, you should use a declarative macro. This is because procedural macros have a large impact on compilation time, because they are full Rust programs that must be compiled before they can be called during compilation of your actual code. — PitaJ, Dec 05 '22 at 17:14
Ah, thank you for the insight. I'm hoping my proc macro does not affect the compile time too much :), because I am doing this all for an interview, so I want a strong emphasis on readability. Thank you so much. I was mainly asking to see if there was a thought process behind the googling, because that feels like maybe a niche design pattern. Super cool though :) can't beat a plain old interest in the subject — Lyndon Alcock, Dec 05 '22 at 18:02
