I would like to write a Boost.Spirit X3 parser that parses complex numbers in any of the following input formats:

  • "(X+Yi)"
  • "Yj"
  • "X"

My best attempt so far is the following (Open on Coliru):

#include <complex>
#include <iostream>

#include <boost/fusion/include/adapt_struct.hpp>
#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/utility/error_reporting.hpp>

namespace x3 = boost::spirit::x3;

struct error_handler {
    template <typename iterator_t, typename error_t, typename context_t>
    auto on_error(iterator_t& /* iter */, const iterator_t& /* end */, const error_t& error,
                  const context_t& context) {
        namespace x3 = boost::spirit::x3;
        const auto& handler = x3::get<x3::error_handler_tag>(context).get();
        handler(error.where(), "error: expecting: " + error.which());
        return x3::error_handler_result::fail;
    }
};

// -----------------------------------------------------------------------------

namespace ast {
template <typename T>
struct complex_number {
    T real;
    T imag;
    operator std::complex<T>() {
        return {real, imag};
    }
};
}  // namespace ast
BOOST_FUSION_ADAPT_STRUCT(ast::complex_number<double>, real, imag);

// -----------------------------------------------------------------------------

namespace parser {
const auto pure_imag_number = x3::attr(0.) > x3::double_ > x3::omit[x3::char_("ij")];
const auto pure_real_number = x3::double_ > x3::attr(0.);

struct complex_class : error_handler {};
const x3::rule<complex_class, ast::complex_number<double>> complex = "Complex number";
static const auto complex_def = ('(' > (x3::double_ > -(x3::double_ > x3::omit[x3::char_("ij")])) >> ')')
                                | pure_imag_number 
                                | pure_real_number;

BOOST_SPIRIT_DEFINE(complex);
}  // namespace parser

// =============================================================================

void parse(const std::string& str) {
    using iterator_t = std::string::const_iterator;

    auto iter = std::begin(str);
    auto end = std::end(str);

    boost::spirit::x3::error_handler<iterator_t> handler(iter, end, std::cerr);
    const auto parser = boost::spirit::x3::with<boost::spirit::x3::error_handler_tag>(
        std::ref(handler))[parser::complex];

    std::complex<double> result{};
    if (boost::spirit::x3::phrase_parse(iter, end, parser, x3::space, result) && iter == end) {
        std::cout << "Parsing successful for:' " << str << "'\n";
    } else {
        std::cout << "Parsing failed for:' " << str << "'\n";
    }
}

int main() {
    for (const auto& str : {
             "(1+2j)",
             "(3+4.5j)",
             "1.23j",
             "42",
         }) {
        parse(str);
    }
    return 0;
}

Running the compiled code (GCC 12.1.1, Boost 1.79.0) gives the following output:

Parsing successful for:' (1+2j)'
Parsing successful for:' (3+4.5j)'
Parsing successful for:' 1.23j'
In line 1:
error: expecting: N5boost6spirit2x314omit_directiveINS1_8char_setINS0_13char_encoding8standardEcEEEE
42
__^_
Parsing failed for:' 42'

What puzzles me is why the last alternative is not tried when the input string contains only a real number.

wohlstad
Tachikoma
  • I think you are running afoul of `a > b`: "Match a followed by b. If `a` fails, no-match. If `b` fails, throw an `expectation_failure`". – Eljay Aug 11 '22 at 14:45
  • My Spirit-fu is poor, will this work? `const auto pure_imag_number = x3::attr(0.) >> x3::double_ >> x3::omit[x3::char_("ij")];` – Eljay Aug 11 '22 at 14:53
  • @Eljay That is correct, I think I got too fixated on the need for '>' to enable error handling that I forgot about that fact. Thanks for pointing this out. – Tachikoma Aug 11 '22 at 14:56
  • Cool! Thank you for a good question. `:-)` I've been interested in Spirit and it's nice to see a real-world example. – Eljay Aug 11 '22 at 15:01

2 Answers

You already found that expectation points are too strict when you need to allow backtracking.

Beware, though, that your grammar is a bit funny in how it separates the two values: the only separator is the unary sign that the double_ parser itself consumes, so an input such as (1 2j) is accepted as well.

Here's a simplified test that highlights some of the edge cases:

static const auto ij      = x3::omit[x3::char_("ij")];
static const auto implied = x3::attr(0.);

static const auto complex =
    x3::rule<struct complex_, ast::complex_number<double>>{"complex"} //
= ('(' >> x3::double_ >> ((x3::double_ >> ij) | implied) >> ')')      //
    | implied >> x3::double_ >> ij                                    //
    | x3::double_ >> implied;

The complete test (Live On Coliru) prints:

Parsing successful for: '(1+2j)' -> (1,2)
Parsing successful for: '(1 2j)' -> (1,2)
Parsing successful for: '(+1+2j)' -> (1,2)
Parsing successful for: '(+1-2j)' -> (1,-2)
Parsing successful for: '(-1-2j)' -> (-1,-2)
Parsing successful for: '(3+4.5j)' -> (3,4.5)
Parsing successful for: '1.23j' -> (0,1.23)
Parsing successful for: '42' -> (42,0)
Parsing successful for: 'inf' -> (inf,0)
Parsing successful for: '-infj' -> (0,-inf)
Parsing successful for: 'NaNj' -> (0,nan)
Parsing successful for: '(.0e9)' -> (0,0)
Parsing successful for: '(.0e-4)' -> (0,0)
Parsing successful for: '.0e-4i' -> (0,0)
Parsing successful for: '.0e-4j' -> (0,0)
Parsing successful for: '(3-0.e-4j)' -> (3,-0)
Parsing successful for: '(3-.0e-4j)' -> (3,-0)

Note that allowing whitespace in the non-parenthesized versions can easily lead to problems (ambiguous inputs and surprising misparses). I'd suggest skipping blanks only inside the parentheses and parsing the other two forms as lexemes:

static const auto complex =
    x3::rule<struct complex_, ast::complex_number<double>>{"complex"} //
= x3::skip(x3::blank)['(' >> x3::double_ >>
                      ((x3::double_ >> ij) | implied) >> ')'] //
    | x3::lexeme[implied >> x3::double_ >> ij                 //
                 | x3::double_ >> implied];
sehe
  • You're right that my grammar is a little funny with the unary sign. Actually, if parentheses are encountered then I expect the form 'X+Yj' and nothing else, so I should not have had that unary sign to begin with. – Tachikoma Aug 14 '22 at 11:38
  • Good point about disabling skipping whitespaces in the non parentheses versions – Tachikoma Aug 14 '22 at 11:41

So, @Eljay's comment is right...

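The issue stems from using > instead of >>. With >, a failure of the right-hand operand throws an expectation_failure and triggers the error handler, so the remaining alternatives are never tried. With >>, the sequence simply fails and the parser backtracks to the next alternative.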

So to actually succeed, we need to use >> in these places:

const auto pure_imag_number = x3::attr(0.) >> x3::double_ >> x3::omit[x3::char_("ij")];
const auto pure_real_number = x3::double_ >> x3::attr(0.);

And only use > when we really want to abort immediately and report an error.

Tachikoma