A little learning went a long way. I added the following to my lexer:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef void type;
};
template <typename Value>
void operator()(Value const& val) const
{
// This modifies the original input string.
typedef boost::iterator_range<std::string::iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
std::for_each(ip.begin(), ip.end(),
[](char& in)
{
in = std::toupper(in);
});
}
};
boost::phoenix::function<normalise_keyword_impl> normalise_keyword;
// The rest...
};
And then used phoenix to bind the action to the keyword token in my constructor, like so:
this->self =
KEYWORD [normalise_keyword(_val)]
// The rest...
;
Although this accomplishes what I was after, It modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
I tried returning an std::string copied from ip.begin() to ip.end() and uppercased using boost::toupper(...), assigning that to _val. Although it compiled and ran, there were clearly some problems with what it was producing:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')
Very peculiar, it appears I have some more learning to do.
Final Solution
Okay, I ended up using this function:
struct normalise_keyword_impl
{
template <typename Value>
struct result
{
typedef std::string type;
};
template <typename Value>
std::string operator()(Value const& val) const
{
// Copy the token and update the attribute value.
typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
iterpair_type const& ip = boost::get<iterpair_type>(val);
auto result = std::string(ip.begin(), ip.end());
result = boost::to_upper_copy(result);
return result;
}
};
And this semantic action:
KEYWORD [_val = normalise_keyword(_val)]
With (and this sorted things out), a modified token_type:
typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;
// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);
// ...
The important addition being boost::mpl::vector<std::string> >
. The result:
Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')
I have no idea why this has corrected the problem so if someone could chime in with their expertise, I'm a willing student.