
I want to parse a string that can contain a '-', but must neither start nor end with one.

I expected this parser to work:

auto const parser = alnum >> -(*(alnum | char_('-')) >> alnum);

But for my test input "something" it only parses "so" and doesn't consume the rest.

The trouble is that the middle part, *(alnum | char_('-')), is greedy: it consumes everything up to the end of the input, including the last character. The trailing alnum then has nothing left to match, so the whole optional group fails.
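To make the failure concrete, here is a minimal sketch in plain C++ (no Spirit) of how a greedy, non-backtracking star behaves. The names is_body_char and match_naive are made up for illustration, and the sketch tracks only the input position, not X3's attribute handling, so it will not reproduce the exact output quoted above.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helper: true for characters accepted by (alnum | char_('-')).
inline bool is_body_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '-';
}

// Mimics  alnum >> -(*(alnum | char_('-')) >> alnum)  with a greedy star
// that never gives characters back.
// Returns the number of characters consumed, or 0 if the leading alnum fails.
inline std::size_t match_naive(std::string_view in) {
    if (in.empty() || !std::isalnum(static_cast<unsigned char>(in[0])))
        return 0;
    std::size_t const after_first = 1;   // leading alnum consumed

    // Greedy star: consume every alnum or '-' that follows.
    std::size_t star_end = after_first;
    while (star_end < in.size() && is_body_char(in[star_end]))
        ++star_end;

    // The trailing alnum is tried only after the star has finished.
    if (star_end < in.size() &&
        std::isalnum(static_cast<unsigned char>(in[star_end])))
        return star_end + 1;             // optional group matched

    // Optional group failed: the position rolls back to just after
    // the leading alnum.
    return after_first;
}
```

Because every alnum is also accepted by the star, the trailing alnum check can never succeed, so match_naive never consumes more than the first character.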

How and why is explained here and here

What I want to know is: how can I work around this and make the parser work?

See it live: http://coliru.stacked-crooked.com/a/833cc2aac7ba5e27

sehe
matiu

2 Answers


I'd personally write it "positively":

auto const rule = raw [ lexeme [
    alnum >> *('-' >> alnum | alnum) >> !(alnum|'-') 
] ];

This uses

  • lexeme to handle whitespace significance,
  • raw to avoid having to actively match every character that you want as part of the output (you just want all characters).
  • '-' >> alnum positively mandates that any dash be followed by an alnum. Note this also outlaws "--" in the input. See VARIANT below
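The rule can also be read as a small hand-written matcher. The following plain C++ sketch (no Spirit) mirrors alnum >> *('-' >> alnum | alnum) >> !(alnum|'-') step by step. The names is_alnum and match_dashed_word are hypothetical, and the sketch leaves out whitespace skipping and attribute collection, which lexeme and raw take care of in the real rule.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper wrapping std::isalnum safely for plain char.
inline bool is_alnum(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0;
}

// Returns the length of the match at the start of `in`, or nullopt on failure.
inline std::optional<std::size_t> match_dashed_word(std::string_view in) {
    std::size_t pos = 0;

    // alnum
    if (pos >= in.size() || !is_alnum(in[pos]))
        return std::nullopt;
    ++pos;

    // *('-' >> alnum | alnum)
    for (;;) {
        if (pos + 1 < in.size() && in[pos] == '-' && is_alnum(in[pos + 1]))
            pos += 2;   // a dash is consumed only together with the alnum after it
        else if (pos < in.size() && is_alnum(in[pos]))
            ++pos;      // plain alnum
        else
            break;
    }

    // !(alnum | '-'): reject if a dash (or alnum) is left over right here.
    if (pos < in.size() && (is_alnum(in[pos]) || in[pos] == '-'))
        return std::nullopt;

    return pos;
}
```

A dash is never consumed on its own, so the star cannot run past the last alnum, and the final lookahead turns a dangling dash into a failure instead of a partial match.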

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <algorithm>

namespace x3 = boost::spirit::x3;

namespace parser {
    using namespace boost::spirit::x3;

    auto const rule = raw [ lexeme [
        alnum >> *('-' >> alnum | alnum) >> !(alnum|'-') 
    ] ];
}

int main() {
    struct test { std::string input; bool expected; };

    for (auto const& t : {
            test { "some-where", true },
            test { " some-where", true },
            test { "some-where ", true },
            test { "s", true },
            test { " s", true },
            test { "s ", true },
            test { "-", false },
            test { " -", false },
            test { "- ", false },

            test { "some-", false },
            test { " some-", false },
            test { "some- ", false },

            test { "some--where", false },
            test { " some--where", false },
            test { "some--where ", false },
        })
    {
        std::string output;
        bool ok = x3::phrase_parse(t.input.begin(), t.input.end(), parser::rule, x3::space, output);
        if (ok != t.expected)
            std::cout << "FAILURE: '" << t.input << "'\t" << std::boolalpha << ok << "\t'" << output << "'\n";
    }
}

VARIANT

To also allow some--thing and similar inputs, I'd change '-' into +lit('-'):

alnum >> *(+lit('-') >> alnum | alnum) >> !(alnum|'-') 

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <iostream>
#include <string>
#include <algorithm>

namespace x3 = boost::spirit::x3;

namespace parser {
    using namespace boost::spirit::x3;

    auto const rule = raw [ lexeme [
        alnum >> *(+lit('-') >> alnum | alnum) >> !(alnum|'-') 
    ] ];
}

int main() {
    struct test { std::string input; bool expected; };

    for (auto const& t : {
            test { "some-where", true },
            test { " some-where", true },
            test { "some-where ", true },
            test { "s", true },
            test { " s", true },
            test { "s ", true },
            test { "-", false },
            test { " -", false },
            test { "- ", false },

            test { "some-", false },
            test { " some-", false },
            test { "some- ", false },

            test { "some--where", true },
            test { " some--where", true },
            test { "some--where ", true },
        })
    {
        std::string output;
        bool ok = x3::phrase_parse(t.input.begin(), t.input.end(), parser::rule, x3::space, output);
        if (ok != t.expected)
            std::cout << "FAILURE: '" << t.input << "'\t" << std::boolalpha << ok << "\t'" << output << "'\n";
    }
}
sehe

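I fixed it by adding a negative lookahead inside the greedy Kleene star, so that it refuses to consume a character that is immediately followed by eoi (end of input). That leaves the last character for the trailing alnum. A more robust fix also stops before whitespace:

So *(alnum | char_('-')) becomes *((alnum | char_('-')) >> !(eoi | space))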

See it live: http://coliru.stacked-crooked.com/a/79242cdbd2fac947
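As a rough illustration of why the lookahead helps, here is a plain C++ sketch (no Spirit) of alnum >> -(*((alnum | char_('-')) >> !(eoi | space)) >> alnum). All function names are hypothetical, and the sketch assumes no skipper is involved.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical helpers; each checks bounds before looking at a character.
inline bool alnum_at(std::string_view in, std::size_t i) {
    return i < in.size() && std::isalnum(static_cast<unsigned char>(in[i]));
}
inline bool body_at(std::string_view in, std::size_t i) {
    return alnum_at(in, i) || (i < in.size() && in[i] == '-');
}
// (eoi | space): true at end of input or on a whitespace character.
inline bool stop_at(std::string_view in, std::size_t i) {
    return i >= in.size() || std::isspace(static_cast<unsigned char>(in[i]));
}

// Returns the number of characters consumed, or 0 if the leading alnum fails.
inline std::size_t match_with_lookahead(std::string_view in) {
    if (!alnum_at(in, 0))
        return 0;
    std::size_t const after_first = 1;

    // Star body: take a character only if the NEXT position is neither
    // end of input nor whitespace, so the last character is left alone.
    std::size_t p = after_first;
    while (body_at(in, p) && !stop_at(in, p + 1))
        ++p;

    // Trailing alnum closes the optional group.
    if (alnum_at(in, p))
        return p + 1;

    return after_first;   // optional group failed; only the first alnum counts
}
```

In this sketch "something" is consumed in full, while "some-" still succeeds but consumes only the first character, so a caller would have to check separately that the whole token was matched.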

matiu