
I want to write a macro in Rust that generates a custom slice pattern to use in a match:

The match in its basic form:

    match &self.text[self.offset..] {
        [b'f', b'o', b'r', b' ', ..] => {
            Some(CToken::new(CTokenKind::For, self.line, self.column))
        }
        _ => None,
    }

I would like the macro to generate the slice for the match branch, as follows:

    match &self.text[self.offset..] {
        mm!(b"for") => {
            Some(CToken::new(CTokenKind::For, self.line, self.column))
        }
        _ => None,
    }

I would like this macro because the keywords can get very long, and spelling them out byte by byte hurts the readability of the code a lot.

I have tried to implement a macro but can't get it right.

I managed to generate the array, but my macro takes individual u8 elements rather than a whole string:

    macro_rules! mm {
        ($($ch:literal),*) => {
            [$($ch,)* b' ', ..]
        }
    }

With this macro I can write:

    mm!(b'f', b'o', b'r') => ...

However, this doesn't improve readability at all, so I would like the macro to take a whole byte string such as b"my string here".

Chayim Friedman
Matrix22
  • Assuming this is intended to create a parser, you probably want a lexer (tokenizer, scanner). This will simplify this code and also remove bugs (e.g. this code doesn't accept other whitespace, such as tabs or newlines, as a separator, which you probably want). – Chayim Friedman Jul 29 '23 at 19:15
  • Hi, thanks for the advice. The code in the snippet is from the lexer; I also have a function that skips spaces, newlines, tabs and carriage-return characters, and it is called before this function that generates the next token. – Matrix22 Jul 30 '23 at 19:36

2 Answers


You could create a procedural macro to do this, but match expressions of this form don't generate good assembly anyway. It's better to use starts_with.

match slice {
    s if s.starts_with(b"for ") => true,
    _ => false,
}
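If you want to check that switching to starts_with doesn't change which inputs are accepted, you can write both forms side by side. This is a minimal sketch; the function names are hypothetical and not from the question:

```rust
// The question's original style: every byte of the keyword spelled out.
fn is_for_pattern(text: &[u8]) -> bool {
    matches!(text, [b'f', b'o', b'r', b' ', ..])
}

// The starts_with form: the keyword stays a readable byte string.
fn is_for_starts_with(text: &[u8]) -> bool {
    text.starts_with(b"for ")
}

fn main() {
    // Explicit element type so every byte-string literal coerces to &[u8].
    let inputs: [&[u8]; 4] = [b"for (;;)", b"for", b"format", b""];
    for input in inputs {
        // Both versions should agree on every input.
        assert_eq!(is_for_pattern(input), is_for_starts_with(input));
    }
}
```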

Unfortunately, a single macro can't expand into both a pattern and an if-guard, so something that looks like mm!("for") => true can't do this. However, you can put the entire match inside a macro, which is nicer when this kind of pattern is all you need in the match.

macro_rules! slice_match {
    ($slice:ident {
        $($pat:literal => $e:expr,)*
        _ => $else:expr $(,)?
    }) => {
        match $slice {
            $(s if s.starts_with($pat.as_bytes()) => $e,)*
            _ => $else,
        }
    };

    ($slice:ident {
        $($pat:literal => $e:expr,)*
        $i:ident => $else:expr $(,)?
    }) => {
        match $slice {
            $(s if s.starts_with($pat.as_bytes()) => $e,)*
            $i => $else,
        }
    };
}

// With _ as catch-all
slice_match!(slice {
    "for " => true,
    _ => false,
})

// With identifier as catch-all
slice_match!(slice {
    "for " => true,
    _ident => false,
})

The two variations of the macro only differ in whether the catch-all pattern is an identifier or the _ pattern. It also requires you to include the space in each string.
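To see the macro in a lexer-like setting, here is a small program that uses it unchanged. The Kind enum is a hypothetical stand-in for the question's CTokenKind, and the keyword helper is illustrative only:

```rust
// The answer's macro, copied unchanged.
macro_rules! slice_match {
    ($slice:ident {
        $($pat:literal => $e:expr,)*
        _ => $else:expr $(,)?
    }) => {
        match $slice {
            $(s if s.starts_with($pat.as_bytes()) => $e,)*
            _ => $else,
        }
    };

    ($slice:ident {
        $($pat:literal => $e:expr,)*
        $i:ident => $else:expr $(,)?
    }) => {
        match $slice {
            $(s if s.starts_with($pat.as_bytes()) => $e,)*
            $i => $else,
        }
    };
}

// Hypothetical stand-in for the question's CTokenKind.
#[derive(Debug, PartialEq)]
enum Kind {
    For,
    While,
    Return,
}

// Hypothetical helper: returns the keyword at the start of `text`, if any.
// Note the trailing space inside each string, as the answer requires.
fn keyword(text: &[u8]) -> Option<Kind> {
    slice_match!(text {
        "for " => Some(Kind::For),
        "while " => Some(Kind::While),
        "return " => Some(Kind::Return),
        _ => None,
    })
}

fn main() {
    assert_eq!(keyword(b"for (i = 0; i < n; i++)"), Some(Kind::For));
    assert_eq!(keyword(b"while (x)"), Some(Kind::While));
    assert_eq!(keyword(b"format"), None);
}
```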

This macro requires every branch to end with a comma, even if its body is a block in braces, which is different from real match expressions. You could also change this to allow other kinds of branches, but you'll need to introduce some kind of signal (such as a keyword or symbol) to indicate which branches will transform into starts_with and which are left unchanged.
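The "signal" idea can be sketched with a tt-muncher that walks the arms one at a time. This is a minimal sketch, not part of the original answer: the `prefix` marker keyword, the macro name slice_match_mixed and the Tok enum are all hypothetical. The input is written as `expr; arms...` because macro_rules does not allow an expr fragment to be followed directly by `{`.

```rust
// Hypothetical extension: arms written as `prefix "lit" => expr,` become
// starts_with guards; every other `pattern => expr,` arm is copied as-is.
macro_rules! slice_match_mixed {
    // All arms consumed: emit the real match.
    (@acc $s:expr; [$($arms:tt)*];) => {
        match $s { $($arms)* }
    };
    // A `prefix` arm turns into a guard on a fresh binding.
    (@acc $s:expr; [$($arms:tt)*]; prefix $lit:literal => $e:expr, $($rest:tt)*) => {
        slice_match_mixed!(@acc $s; [$($arms)* x if x.starts_with($lit.as_bytes()) => $e,]; $($rest)*)
    };
    // Any other arm is passed through unchanged.
    (@acc $s:expr; [$($arms:tt)*]; $p:pat => $e:expr, $($rest:tt)*) => {
        slice_match_mixed!(@acc $s; [$($arms)* $p => $e,]; $($rest)*)
    };
    // Entry point: `slice_match_mixed!(expr; arms...)`.
    ($s:expr; $($body:tt)*) => {
        slice_match_mixed!(@acc $s; []; $($body)*)
    };
}

// Hypothetical token kinds for the demo.
#[derive(Debug, PartialEq)]
enum Tok {
    For,
    While,
    LParen,
    Other,
}

fn classify(text: &[u8]) -> Tok {
    slice_match_mixed!(text;
        prefix "for " => Tok::For,
        prefix "while " => Tok::While,
        [b'(', ..] => Tok::LParen,
        _ => Tok::Other,
    )
}

fn main() {
    assert_eq!(classify(b"for (;;)"), Tok::For);
    assert_eq!(classify(b"(a + b)"), Tok::LParen);
    assert_eq!(classify(b"format"), Tok::Other);
}
```

Like the answer's macro, every arm still has to end with a comma, and the arms are checked in the order they are written.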

drewtato
  • Hi, thank you for the solution, it does solve my problem. At first I was using `starts_with`, but I thought this kind of pattern matching would be more performant. Could you point me to some references where I can learn more about this? Thank you again for your response. – Matrix22 Jul 29 '23 at 10:20
  • I've mostly learned this from experience, but you can compare them yourself https://godbolt.org/z/f7Yd8jsW5 For larger cases, there's phf. This is a benchmark of phf vs match https://github.com/lmammino/mega-match-vs-phf – drewtato Jul 29 '23 at 18:35

To expand on the procedural macro solution, it's relatively simple to create one using the quote and syn crates:

/*
[lib]
proc-macro = true

[dependencies]
quote = "1.0.32"
syn = "2.0.27"
*/

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, LitByteStr};

#[proc_macro]
pub fn mm(input: TokenStream) -> TokenStream {
    let bytes = parse_macro_input!(input as LitByteStr).value();
    quote!([#( #bytes, )* b' ', ..]).into()
}

This way, mm!(b"for") => ... works as expected, and you get the exhaustiveness and duplicate-pattern checking provided by the compiler. However, it requires the macro to be defined in a separate crate, and it noticeably increases compile time, so it may be preferable to use a more lightweight solution.
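For reference, this is roughly the pattern the proc macro above should produce for mm!(b"for"), written out by hand so it runs without a separate proc-macro crate. quote! renders each u8 from value() as a suffixed integer literal, so b'f' shows up as 102u8. The starts_with_for function name is hypothetical:

```rust
// Hand-written equivalent of what `mm!(b"for")` should expand to:
// quote! emits each byte of b"for" as a suffixed u8 literal (102u8 is b'f').
fn starts_with_for(text: &[u8]) -> bool {
    match text {
        [102u8, 111u8, 114u8, b' ', ..] => true,
        _ => false,
    }
}

fn main() {
    assert!(starts_with_for(b"for (;;)"));
    assert!(!starts_with_for(b"for")); // no trailing space
    assert!(!starts_with_for(b"fork"));
}
```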

LegionMammal978