In a foo_def.hpp file for a Boost.Spirit X3 project, I have parsers:

auto const identifier_component_unrestricted =
    lexeme[(alpha | '_') >> *(alnum | '_')];

auto const identifier_component_def =
    ((identifier_component_unrestricted - reserved_words) |
    lexeme['"' >> identifier_component_unrestricted >> '"']);

The identifier_component is parsed as a variant but then collapses to a single std::string.

How can I automatically convert the parsed identifier_component into ALL CAPS when it is unquoted (first type in the variant) but keep the case untouched when it is quoted (second type in the variant)?
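
For illustration, the intended result can be sketched in plain C++ without Spirit. The helper name `normalize_identifier` is hypothetical (it is not part of the project), and the sketch assumes the input is already a single well-formed component:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper, for illustration only: shows the mapping the parser
// should produce. Quoted components lose their quotes and keep their case;
// unquoted components are converted to upper case.
std::string normalize_identifier(std::string const& token) {
    bool const quoted = token.size() >= 2 &&
                        token.front() == '"' && token.back() == '"';
    if (quoted) {
        // Strip the surrounding quotes, leave the case untouched.
        return token.substr(1, token.size() - 2);
    }
    std::string upper = token;
    std::transform(upper.begin(), upper.end(), upper.begin(),
                   [](unsigned char c) {
                       return static_cast<char>(std::toupper(c));
                   });
    return upper;
}
```

With this helper, `foo_bar1` would become `FOO_BAR1`, while `"foo_Bar"` (with quotes) would become `foo_Bar`.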

I've tried using semantic actions but haven't succeeded in getting something that works/compiles.


Edit: Thanks to rmawatson for the following solution.

Add a file to_upper.hpp:

#pragma once

#include <boost/algorithm/string.hpp>
#include <boost/spirit/home/x3.hpp>  // the directive below uses x3 names

namespace parser {

using namespace boost::spirit::x3;

template <typename Subject>
struct ToUpperDirective : unary_parser<Subject, ToUpperDirective<Subject>> {
  using base_type = unary_parser<Subject, ToUpperDirective<Subject>>;
  using attribute_type = typename extension::as_parser<Subject>::value_type;
  static bool const has_attribute = true;
  using subject_type = Subject;

  ToUpperDirective(Subject const& subject) : base_type(subject) {}

  template <typename Iterator, typename Context, typename RContext,
            typename Attribute>
  bool parse(Iterator& first,
             Iterator const& last,
             Context const& context,
             RContext& rcontext,
             Attribute& attr) const {
    auto result = this->subject.parse(first, last, context, rcontext, attr);
    boost::to_upper(attr);
    return result;
  }
};

struct ToUpper {
  template <typename Subject>
  ToUpperDirective<typename extension::as_parser<Subject>::value_type>
      operator[](Subject const& subject) const {
    return {as_parser(subject)};
  }
};

ToUpper const to_upper;

}  // namespace parser

In the original foo_def.hpp, just add #include "to_upper.hpp" and change the definition to:

// Convert unquoted identifier_components to upper case; keep quoted unchanged.
auto const identifier_component_def =
    to_upper[identifier_component_unrestricted - reserved_words] |
    lexeme['"' >> identifier_component_unrestricted >> '"'];

Matt

1 Answer

Both of these could just have a std::string attribute, without the need for the variant.

I think the easiest way is probably to create your own all_caps directive and just wrap the quoted alternative in this.

Something like:

template <typename Subject>
struct all_caps_directive : x3::unary_parser<Subject, all_caps_directive<Subject>>
{
    using base_type = x3::unary_parser<Subject, all_caps_directive<Subject> >;
    using attribute_type = typename x3::extension::as_parser<Subject>::value_type;
    static bool const has_attribute = true;
    using subject_type = Subject;

    all_caps_directive(Subject const& subject)
        : base_type(subject) {}

    template <typename Iterator, typename Context, typename RContext,typename Attribute>
    bool parse(Iterator& first, Iterator const& last
        , Context const& context, RContext& rcontext, Attribute& attr) const
    {
        auto result = this->subject.parse(first, last, context, rcontext, attr);
        boost::to_upper(attr);
        return result;
    }
};

struct all_caps_gen
{
    template <typename Subject>
    all_caps_directive<typename x3::extension::as_parser<Subject>::value_type>
        operator[](Subject const& subject) const
    {
        return { as_parser(subject) };
    }
};

auto const all_caps = all_caps_gen{};

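Then use it like this (wrap whichever alternative should be upper-cased; the edit in the question wraps the unquoted one):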

auto const identifier_component_def =
    (identifier_component_unrestricted |
    all_caps[lexeme['"' >> identifier_component_unrestricted >> '"']]);

In response to your comment asking for something simpler, here is a semantic action version. Personally, I think this is less clear and not quite as nice.

auto all_caps = [](auto& ctx)
{
    boost::to_upper(x3::_attr(ctx));
    x3::_val(ctx) = x3::_attr(ctx);
};

and use it like this:

auto const identifier_component_def =
    (identifier_component_unrestricted |
    lexeme['"' >> identifier_component_unrestricted >> '"'][all_caps]);

rmawatson

  • +1 Thank you, this works perfectly! Gonna wait a bit before accepting an answer in case there is a simpler way to do this. – Matt Nov 11 '19 at 16:48
  • A semantic action would be 'simpler'. Not as nice IMO, though. I'll add an example. – rmawatson Nov 11 '19 at 17:14
  • In the 2nd demo, removing the quotes `std::string str = R"(Hello)";` gives a blank output. But I think I see your point as to why using semantic actions is not as nice - it is because it requires `identifier_component_unrestricted` to be an `x3::rule`, whereas previously it didn't need to be? – Matt Nov 11 '19 at 17:54
  • Semantic actions 'take over' the attribute, and the automatic attribute propagation stuff stops working - which is what you're seeing above. So you'd need a semantic action on the other alternative to manually propagate its attribute too. https://wandbox.org/permlink/dKk6Kzlq2Oo8d3ZP – rmawatson Nov 11 '19 at 18:04
  • (prev comment url should be https://wandbox.org/permlink/z1yFCc945Iuvpexf) "But I think I see your point as to why using semantic actions is not as nice - it is because it requires identifier_component_unrestricted to be an x3::rule" - My reason for saying it wasn't as nice was really more down to the attribute propagation issues. But you're right, it doesn't work in the semantic action case if it's not a rule, because the attribute type is then some intermediate boost::fusion sequence, not a std::string. – rmawatson Nov 11 '19 at 18:15
  • Here is another alternative where you don't have the first rule defined, but it is effectively created inline with the "as" lambda function. All starting to get rather messy to get the same result as the directive. https://wandbox.org/permlink/zZuOSqgU0chwfXHR – rmawatson Nov 11 '19 at 18:25
  • I see the wisdom of your original suggestion. It is a bit daunting at first due to how much code surrounds the essential call to `boost::to_upper()` but I suppose this is the standard way to add new `x3` directives. Thanks again! – Matt Nov 12 '19 at 23:27