Case Insensitive String Comparison of Boost::Spirit Token Text in Semantic Action

Question

I've got a tokeniser and a parser. The tokeniser has a special token type, KEYWORD, for keywords (there are ~50). In my parser I want to ensure that the tokens are what I'd expect, so I've got a rule for each. Like so:

KW_A = tok.KEYWORDS[_pass = (_1 == "A")];
KW_B = tok.KEYWORDS[_pass = (_1 == "B")];
KW_C = tok.KEYWORDS[_pass = (_1 == "C")];

This works well enough, but it's not case insensitive (and the grammar I'm trying to handle is!). I'd like to use boost::iequals, but attempts to convert _1 to an std::string result in the following error:

error: no viable conversion from 'const _1_type' (aka 'const actor<argument<0> >') to 'std::string' (aka 'basic_string<char>')

How can I treat these keywords as strings and ensure they're the expected text irrespective of case?

Liam M · Answer 1 · 2014-12-15T04:43:26.510

A little learning went a long way. I added the following to my lexer:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef void type;
    };

    template <typename Value>
    void operator()(Value const& val) const
    {
        // This modifies the original input string.
        typedef boost::iterator_range<std::string::iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);
        std::for_each(ip.begin(), ip.end(),
            [](char& in)
            {
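                // Cast via unsigned char: std::toupper is undefined for negative values.
                in = static_cast<char>(std::toupper(static_cast<unsigned char>(in)));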
            });
    }
};

    boost::phoenix::function<normalise_keyword_impl> normalise_keyword;

    // The rest...
};

And then used phoenix to bind the action to the keyword token in my constructor, like so:

this->self =
    KEYWORD [normalise_keyword(_val)]
    // The rest...
    ;

Although this accomplishes what I was after, it modifies the original input sequence. Is there some modification I could make so that I could use const_iterator instead of iterator, and avoid modifying my input sequence?
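As a rough standard-library sketch of the non-mutating variant: the matched range is read through const_iterator and uppercased into a fresh std::string, leaving the input untouched. The function name upper_copy is hypothetical, and how the copy is then stored in the token is a separate matter that this sketch does not cover.

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Hypothetical helper: returns an uppercased copy of [first, last)
// without writing through the iterators.
std::string upper_copy(std::string::const_iterator first,
                       std::string::const_iterator last)
{
    std::string result;
    result.reserve(static_cast<std::string::size_type>(std::distance(first, last)));
    std::transform(first, last, std::back_inserter(result),
        [](char c)
        {
            // Cast via unsigned char so std::toupper is well defined.
            return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        });
    return result;
}
```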

I tried returning a std::string copied from ip.begin() to ip.end() and uppercased using boost::to_upper(...), assigning that to _val. Although it compiled and ran, the output was clearly wrong:

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
result is SELECT
Token: 0: KEYWORD ('KEYWOR')
Token: 1: REGULAR_IDENTIFIER ('a')
result is FROM
Token: 0: KEYWORD ('KEYW')
Token: 1: REGULAR_IDENTIFIER ('b')

Very peculiar, it appears I have some more learning to do.

Final Solution

Okay, I ended up using this function:

struct normalise_keyword_impl
{
    template <typename Value>
    struct result
    {
        typedef std::string type;
    };

    template <typename Value>
    std::string operator()(Value const& val) const
    {
        // Copy the token and update the attribute value.
        typedef boost::iterator_range<std::string::const_iterator> iterpair_type;
        iterpair_type const& ip = boost::get<iterpair_type>(val);

        auto result = std::string(ip.begin(), ip.end());
        result = boost::to_upper_copy(result);
        return result;
    }
};

And this semantic action:

KEYWORD [_val = normalise_keyword(_val)]

Together with a modified token_type (this is what sorted things out):

typedef std::string::const_iterator base_iterator;
typedef boost::spirit::lex::lexertl::token<base_iterator, boost::mpl::vector<std::string> > token_type;
typedef boost::spirit::lex::lexertl::actor_lexer<token_type> lexer_type;
typedef type_system::Tokens<lexer_type> tokens_type;
typedef tokens_type::iterator_type iterator_type;
typedef type_system::Grammar<iterator_type> grammar_type;

// Establish our lexer and our parser.
tokens_type lexer;
grammar_type parser(lexer);

// ...

The important addition is boost::mpl::vector<std::string>, which makes std::string one of the token's attribute types. The result:

Enter a sequence to be tokenised: select a from b
Input is 'select a from b'.
Token: 0: KEYWORD ('SELECT')
Token: 1: REGULAR_IDENTIFIER ('a')
Token: 0: KEYWORD ('FROM')
Token: 1: REGULAR_IDENTIFIER ('b')

I have no idea why this has corrected the problem, so if someone could chime in with their expertise, I'm a willing student.

Thanks for posting. How did you get the lexer to recognize a keyword regardless of case? — Felix Dombek, Oct 21 '16 at 11:13
@FelixDombek Hi Felix, in `normalise_keyword_impl`, you'll find a call to `to_upper_copy` which normalises each keyword, making the later comparison case insensitive. If I define the keyword SELECT with this semantic action, the word `SeLeCt` will be normalised to `SELECT`. — Liam M, Oct 23 '16 at 08:28
Thanks for answering. I thought the semantic actions were executed *after* the keyword has been matched? So you'll pass the normalised keyword on to the parser, but I was confused as to how you do the initial recognition case-insensitively. I have since found the answer though: `(?i:pattern)` is the regex for case-insensitive matching. — Felix Dombek, Oct 23 '16 at 20:24
@FelixDombek no worries, glad you've found a solution. This was a while ago so I'm a little rusty, you may be correct; I completed and moved on from this project about 12 months ago! — Liam M, Oct 24 '16 at 04:47
