
I'm doing some preprocessor meta-programming and I need a way to convert f(a b) to g(a,b) with the C++ preprocessor. Since a and b are two separate tokens in C++, it seems it should be possible to split them apart. However, after hours of work I still have no solution to this problem.

Any third-party library, including boost::preprocessor, is welcome, as long as it works during preprocessing.

Moreover, is there any way to separate arbitrary tokens? E.g., const T& should be converted to const, T, &.

lz96
  • There is no way to do this with the standard preprocessor. – n. m. could be an AI Oct 29 '15 at 16:16
  • `const T&` is already three tokens. How could they be more separated? The only vaguely whitespace-aware operation in the preprocessor is the stringify operator (`#`) and once that's done, whitespace is no longer relevant at all. – rici Oct 29 '15 at 18:09
  • Can you give us more context for why you want to do this? – Charles Ofria Oct 29 '15 at 20:03
  • Also: are a and b arbitrary tokens, or do they come from a pre-defined set of possibilities? – Charles Ofria Oct 29 '15 at 20:06
  • @CharlesOfria Actually I want to implement a function template generator: `GEN(f, (int a,_), void, ({ return a;}))` should expand to `template <class T0> auto f (int a, T0) -> void { return a; }`. For a simple `_` my pattern-matching code works well, but it cannot process a parameter list like `(int a, const _)`, so I'm looking for a way to separate `_` from `const _`. – lz96 Oct 30 '15 at 03:15
  • `#define a a,` would do it, but that's not likely what you're looking for ;) – Potatoswatter Nov 08 '15 at 14:03

1 Answer


This is not possible in general.

For a limited set of tokens it is possible, but you have to have a dictionary of every token you want to handle. The same technique was used in Replace spaces with underscores in a macro?, and here I will rewrite it from scratch, just for fun.

Generally, you can paste a prefix onto the front of the list, as in WORD_##__VA_ARGS__ where __VA_ARGS__ is a b c. Then, with #define WORD_a a, in the dictionary, the paste splits the leading word off from the rest. Repeating this process for every token, you end up with a comma-separated list of the individual tokens.
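
Here is a minimal sketch of that single pasting step, just to illustrate the mechanism (SPLIT_FIRST is an illustrative name, not one of the macros used below):

#define WORD_a a,
#define SPLIT_FIRST(...)  WORD_##__VA_ARGS__

SPLIT_FIRST(a b c)   // pastes to WORD_a b c, which expands to: a, b c

The full version below applies this step repeatedly, dispatching on the number of comma-separated arguments produced so far: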

// dictionary of all words (the bare WORD_ maps an empty argument to nothing)
#define WORD_
#define WORD_a a,
#define WORD_b b,
#define WORD_c c,
#define WORD_d d,

// Each TOKENIZE pass splits one more word off the last (still space-separated) argument;
// TOKENIZE_N picks the right arity by counting the current arguments.
#define TOKENIZE_1(a)      WORD_##a
#define TOKENIZE_2(a,...)  a, TOKENIZE_1(__VA_ARGS__)
#define TOKENIZE_3(a,...)  a, TOKENIZE_2(__VA_ARGS__)
#define TOKENIZE_4(a,...)  a, TOKENIZE_3(__VA_ARGS__)
#define TOKENIZE_5(a,...)  a, TOKENIZE_4(__VA_ARGS__)
#define TOKENIZE_N(_5,_4,_3,_2,_1,N,...)  TOKENIZE##N
#define TOKENIZE(...)  TOKENIZE_N(__VA_ARGS__,_5,_4,_3,_2,_1)(__VA_ARGS__)

// REMOVELAST drops the trailing empty argument left behind by the final WORD_ expansion.
#define REMOVELAST_1(a)
#define REMOVELAST_2(a,...)  a
#define REMOVELAST_3(a,...)  a, REMOVELAST_2(__VA_ARGS__)
#define REMOVELAST_4(a,...)  a, REMOVELAST_3(__VA_ARGS__)
#define REMOVELAST_5(a,...)  a, REMOVELAST_4(__VA_ARGS__)
#define REMOVELAST_N(_5,_4,_3,_2,_1,N,...) REMOVELAST##N
#define REMOVELAST(...)  REMOVELAST_N(__VA_ARGS__,_5,_4,_3,_2,_1)(__VA_ARGS__)

// Four TOKENIZE passes handle up to four words; REMOVELAST then strips the trailing empty argument.
#define SPACES_TO_ARGS(...)  REMOVELAST(TOKENIZE(TOKENIZE(TOKENIZE(TOKENIZE(__VA_ARGS__)))))

#define f(spaceargs)  g(SPACES_TO_ARGS(spaceargs))

f(a b)       // expands to g(a, b)
f(a b c)     // expands to g(a, b, c)
f(a b c d)   // expands to g(a, b, c, d)
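
Traced by hand from the definitions above, SPACES_TO_ARGS(a b c) unfolds roughly like this (innermost TOKENIZE first, one line per pass):

// TOKENIZE(a b c)        ->  a, b c       WORD_a splits "a" off the single argument
// TOKENIZE(a, b c)       ->  a, b, c      WORD_b splits "b" off the last argument
// TOKENIZE(a, b, c)      ->  a, b, c,     WORD_c leaves a trailing empty argument
// TOKENIZE(a, b, c, )    ->  a, b, c,     the empty WORD_ keeps it empty
// REMOVELAST(a, b, c, )  ->  a, b, c      the trailing empty argument is dropped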

Moreover, is there any way to separate arbitrary tokens?

No, that is not possible: the preprocessor cannot split tokens it does not already know about, so some kind of dictionary is unavoidable.

Any third-party library, including boost::preprocessor, is welcome, as long as it works during preprocessing.

I used BOOST_PP_WHILE to make the same idea a bit more generic, so it works for any number of arguments (up to the Boost.Preprocessor limits).

#include <boost/preprocessor/variadic/to_seq.hpp>
#include <boost/preprocessor/control/while.hpp>
#include <boost/preprocessor/seq/elem.hpp>
#include <boost/preprocessor/seq/size.hpp>
#include <boost/preprocessor/seq/to_tuple.hpp>
#include <boost/preprocessor/seq/pop_back.hpp>
#include <boost/preprocessor/seq/for_each.hpp>
#include <boost/preprocessor/facilities/is_empty.hpp>
#include <boost/preprocessor/logical/not.hpp>
#include <boost/preprocessor/arithmetic/dec.hpp>   // for BOOST_PP_DEC used in SEQ_LAST

// dictionary of all words
#define WORD_
#define WORD_a a,
#define WORD_b b,
#define WORD_c c,
#define WORD_d d,

// paste WORD_ onto a seq element to split off its first word
#define ADDWORD2(a)       WORD_##a
#define ADDWORD(r, _, a)  ADDWORD2(a)
// one loop step: run the dictionary over every element and regroup the comma list into a seq
#define TOKENIZE_OP(d, list)  \
        BOOST_PP_VARIADIC_TO_SEQ(BOOST_PP_SEQ_FOR_EACH(ADDWORD, _, list))
#define SEQ_LAST(state) \
        BOOST_PP_SEQ_ELEM(BOOST_PP_DEC(BOOST_PP_SEQ_SIZE(state)), state)
// keep looping while the last seq element is still non-empty
#define TOKENIZE_PRED(d, state) \
        BOOST_PP_NOT(BOOST_PP_IS_EMPTY(SEQ_LAST(state)))
// run the loop, drop the trailing empty element, and turn the seq into a tuple (a, b, ...)
#define SPACES_TO_ARGS(...) \
        BOOST_PP_SEQ_TO_TUPLE(BOOST_PP_SEQ_POP_BACK(  \
            BOOST_PP_WHILE(TOKENIZE_PRED, TOKENIZE_OP, (__VA_ARGS__))  \
        ))
// SPACES_TO_ARGS already yields a parenthesized tuple, so g needs no parentheses of its own
#define f(spaceargs)  g SPACES_TO_ARGS(spaceargs)

f(a b)       // -> g(a, b)
f(a b c)     // -> g(a, b, c)
f(a b c d)   // -> g(a, b, c, d)
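
For f(a b), the BOOST_PP_WHILE state evolves roughly as follows (traced by hand; the exact whitespace in the preprocessed output depends on the compiler):

// initial state:           (a b)        a one-element seq
// TOKENIZE_OP:             (a)(b)       WORD_a splits "a" off
// TOKENIZE_OP:             (a)(b)()     WORD_b leaves a trailing empty element
// TOKENIZE_PRED:           the last element is empty, so the loop stops
// BOOST_PP_SEQ_POP_BACK:   (a)(b)
// BOOST_PP_SEQ_TO_TUPLE:   (a, b)       so f(a b) becomes g (a, b)

To check the expansion yourself, run just the preprocessor, e.g. g++ -E -P file.cpp with the Boost headers on the include path.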
KamilCuk