I am trying to extract all the parameter-value pairs from a string with the following form:

pN  stands for the name of the Nth parameter; it can be composed of the following chars:
    letters, numbers, and any char included in kSuportedNamesCharsRegEx
vNX stands for the Xth component of the value of the Nth parameter
    vNX accepts arithmetic expressions, which is why I constructed kSuportedValuesCharsRegEx. Additionally, the value may be a simple or nested list.

Here is an example of the string to be parsed

p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5

where I should obtain "p1", "p2 = (v21 + v22)", "p3=v31-v32", "p4", "p5=v5"

As can be seen, a parameter may or may not have a value. I am using the C++ Boost libraries (so I think look-behind is not available to me). Until now, I only had to deal with parameters that have a value, so I have been using the following:

static const std::string kSpecialCharsRegEx = "\\.\\{\\}\\(\\)\\\\\\*\\-\\+\\?\\|\\^\\$";
static const std::string kSuportedNamesCharsRegEx = "[A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]+";
static const std::string kSuportedValuesCharsRegEx   = "([\\s\"A-Za-z0-9çÇñÑáÁéÉíÍóÓúÚ@%_:;,<>/"
    + kSpecialCharsRegEx + "]|(==)|(>=)|(<=))+";
static const std::string kSimpleListRegEx    = "\\[" + kSuportedValuesCharsRegEx + "\\]";
static const std::string kDeepListRegEx  = "\\[(" + kSuportedValuesCharsRegEx + "|(" + kSimpleListRegEx + "))+\\]";
// Main idea
//static const std::string stackRegex = "\\w+\\s*=\\s*[\\w\\s]+(?=\\s+\\w+=)"
//          "|\\w+\\s*=\\s*[\\w\\s]+(?!\\w+=)"
//          "|\\w+\\s*=\\s*\\[[\\w\\s]+\\]";
// + deep listing support

    // Main regex
static const std::string kParameterRegEx = 
    "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx + "(?=\\s+\\b" + kSuportedNamesCharsRegEx + "\\b=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*" + kSuportedValuesCharsRegEx +"(?!" + kSuportedNamesCharsRegEx + "=)"
    + "|"
    + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*(" + kDeepListRegEx + ")";

However, now that I also need to deal with parameters that have no value, I am having trouble writing the correct regex.

Could someone help me with this problem? Thanks in advance

  • There are too many unknowns in your question. Please show more examples and show exactly what results you expect. Also, Stack Overflow is not a free code design and writing service. You need to show some effort into doing the work yourself before you can expect any assistance from us. As it stands your question is likely to be voted down and closed as not showing enough research on your part. – AdrianHHH May 28 '14 at 08:15
  • Thanks, @AdrianHHH, I will try to improve my question. – Fernando García Redondo May 28 '14 at 08:19
  • If you can define a grammar, maybe [boost::spirit](http://www.boost.org/doc/libs/1_55_0/libs/spirit/doc/html/index.html) can help you. They have a calculator example which is quite similar to your needs. – mkaes May 28 '14 at 08:22
  • @mkaes I agree. A simplistic grammar could be what I posted as an answer – sehe May 28 '14 at 20:19

2 Answers

Like mkaes suggested, you just need to devise a simple grammar here. Here's the Spirit approach:

op         = char_("-+/*");

name       = +(graph - '='); // excluding `op` is not even necessary here

simple     = +(graph - op);

expression = raw [
             '(' >> expression >> ')'
            | simple >> *(op >> expression)
            ];

value      = expression;

definition = name >> - ('=' > value);
start      = *definition;

See it Live On Coliru.

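The raw[] directive is there so that the whole expression structure can be ignored for the purpose of tokenization/validation: it exposes the matched source text instead of the attributes of the sub-rules. Names simply accept every non-whitespace character except '='; simple operands additionally exclude the operator characters.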

Use it like:

int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It first(input.begin()), last(input.end());

    Definitions defs;
    if (qi::phrase_parse(first, last, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    if (first != last)
        std::cout << "Remaining unparsed input: '" << std::string(first,last) << "'\n";
}

Prints:

Parsed 5 definitions
p1 with no value expression
p2 with value expression '(v21 +  v22)'
p3 with value expression 'v31-v32'
p4 with no value expression
p5 with value expression 'v5'
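
For readers who cannot pull in Spirit, roughly the same grammar can be sketched as a small hand-written recursive-descent scanner using only the standard library. This is an illustration under stated assumptions, not the code from the Coliru demo: `Param`, `ParamScanner` and `operand` are hypothetical names, and the operand rule here also excludes parentheses and '=', which is slightly stricter than the rules shown above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type for this sketch (not the struct used with Spirit).
struct Param {
    std::string name;
    bool hasValue;
    std::string value;  // raw expression text, meaningful only if hasValue
};

class ParamScanner {
public:
    explicit ParamScanner(const std::string& text) : s_(text), pos_(0) {}

    // start = *definition; returns false on a malformed definition.
    bool parse(std::vector<Param>& out) {
        for (skipSpace(); pos_ < s_.size(); skipSpace()) {
            Param p;
            p.hasValue = false;
            p.name = name();
            if (p.name.empty()) return false;  // e.g. a stray '='
            const std::size_t afterName = pos_;
            skipSpace();
            if (peek() == '=') {
                ++pos_;
                skipSpace();
                const std::size_t begin = pos_;
                if (!expression()) return false;
                p.hasValue = true;
                // Like raw[]: keep the matched source text, not a parse tree.
                p.value = s_.substr(begin, pos_ - begin);
            } else {
                pos_ = afterName;  // no '=' follows: the parameter has no value
            }
            out.push_back(p);
        }
        return true;
    }

private:
    char peek() const { return pos_ < s_.size() ? s_[pos_] : '\0'; }
    static bool isGraph(char c) { return std::isgraph(static_cast<unsigned char>(c)) != 0; }
    static bool isOp(char c) { return c == '+' || c == '-' || c == '*' || c == '/'; }
    void skipSpace() {
        while (pos_ < s_.size() && std::isspace(static_cast<unsigned char>(s_[pos_]))) ++pos_;
    }

    // name = +(graph - '=')
    std::string name() {
        const std::size_t begin = pos_;
        while (isGraph(peek()) && peek() != '=') ++pos_;
        return s_.substr(begin, pos_ - begin);
    }

    // operand = +(graph - op - '(' - ')' - '=')   (assumption of this sketch)
    bool operand() {
        const std::size_t begin = pos_;
        while (isGraph(peek()) && !isOp(peek()) &&
               peek() != '(' && peek() != ')' && peek() != '=') ++pos_;
        return pos_ > begin;
    }

    // expression = ('(' expression ')' | operand) *(op expression)
    bool expression() {
        if (peek() == '(') {
            ++pos_;
            skipSpace();
            if (!expression()) return false;
            skipSpace();
            if (peek() != ')') return false;
            ++pos_;
        } else if (!operand()) {
            return false;
        }
        const std::size_t save = pos_;
        skipSpace();
        if (isOp(peek())) {
            ++pos_;
            skipSpace();
            return expression();
        }
        pos_ = save;  // no operator follows: give the skipped whitespace back
        return true;
    }

    std::string s_;
    std::size_t pos_;
};
```

Constructing a `ParamScanner` from the example string and calling `parse` with an empty `std::vector<Param>` should yield the same five definitions as listed above. Unlike the Spirit version, this sketch reports a malformed expression only as a `false` return value, without any error position.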

Full Code for reference

#include <boost/fusion/adapted/struct.hpp>
#include <boost/optional.hpp>
#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>

namespace qi = boost::spirit::qi;

struct Definition {
    std::string name;
    boost::optional<std::string> value;
};

BOOST_FUSION_ADAPT_STRUCT(Definition, (std::string, name)(boost::optional<std::string>, value))

using Definitions = std::vector<Definition>;

template <typename Iterator, typename Skipper = qi::space_type>
struct grammar : qi::grammar<Iterator, Definitions(), Skipper>
{
    grammar() : grammar::base_type(start) {
        using namespace qi;

        name       = +(graph - '=');

        simple     = name;

        expression = raw [
                '(' >> expression >> ')'
              | simple >> *(char_("+-/*") >> expression)
              ];

        value      = expression;

        definition = name >> - ('=' > value);
        start      = *definition;
    }
  private:
    qi::rule<Iterator> simple;
    qi::rule<Iterator, std::string(), Skipper> expression, value;
    qi::rule<Iterator, std::string()/*no skipper*/> name;
    qi::rule<Iterator, Definition(),  Skipper> definition;
    qi::rule<Iterator, Definitions(), Skipper> start;
};

int main()
{
    using It = std::string::const_iterator;
    std::string const input = "p1 p2 =   (v21 +  v22)   p3=v31-v32    p4  p5=v5";
    It f(input.begin()), l(input.end());

    Definitions defs;
    if (qi::phrase_parse(f, l, grammar<It>(), qi::space, defs))
    {
        std::cout << "Parsed " << defs.size() << " definitions\n";
        for (auto const& def : defs)
        {
            std::cout << def.name;
            if (def.value)
                std::cout << " with value expression '" << *def.value << "'\n";
            else
                std::cout << " with no value expression\n";
        }
    } else
    {
        std::cout << "Parse failed\n";
    }

    if (f != l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
sehe

I think I found the solution to the problem, working together with a colleague.

The main idea is contained in the following example: http://regexr.com/38tjv

Regex:

(?:^|\s)(\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s\+\(\)]+?\b)(?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b)

And here is the explanation:

    static const std::string kParameterRegEx = std::string("(?:^|\\s)")                                     // start of string or preceding space, not captured
        + "("                                                                                               // group of the parameter or parameter-value
            + "\\b" + kSuportedNamesCharsRegEx + "\\b"                                                      //      simple names
            + "|"                                                                                           //      or
            + "\\b" + kSuportedNamesCharsRegEx + "\\b\\s*=\\s*\\b" + kSuportedValuesCharsRegEx + "?\\b"     //      name-value (the trailing "?" makes the value match lazy)
        + ")"                                                                                               // end group
        + "(?="                                                                                             // followed by group of
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b\\s*="                                             //      new parameter with value
            + "|"                                                                                           //      or
            + "\\s*$"                                                                                       //      end of string
            + "|"                                                                                           //      or
            + "\\s+\\b" + kSuportedNamesCharsRegEx + "\\b"                                                  //      new parameter without value
        + ")";                                                                                              // end of following group
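
To try the idea without Boost, here is a minimal sketch that runs the simplified regex from regexr through `std::regex`. It assumes that the ECMAScript grammar of `std::regex` treats this particular pattern the same way as Boost's default syntax; `splitParameters` is a hypothetical helper name, and the input in the usage note is deliberately simpler than the string in the question (the simplified character class has no '-', for instance).

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns capture group 1 (the parameter, with its value
// if it has one) for every match of the simplified pattern.
std::vector<std::string> splitParameters(const std::string& input) {
    static const std::regex re(
        R"re((?:^|\s)(\b[a-zA-Z0-9]+\b|\b[a-zA-Z0-9]+\b\s*=\s*\b[a-zA-Z0-9\s\+\(\)]+?\b)(?=\s+\b[a-zA-Z0-9]+\b\s*=|\s*$|\s+\b[a-zA-Z0-9]+\b))re");

    std::vector<std::string> result;
    // The lookahead consumes nothing, so the whitespace before the next
    // parameter is still available to (?:^|\s) on the following iteration.
    for (std::sregex_iterator it(input.begin(), input.end(), re), end; it != end; ++it)
        result.push_back((*it)[1].str());
    return result;
}
```

Calling `splitParameters("p1 p2=v2 p3")` gives the three entries "p1", "p2=v2" and "p3".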

I hope this helps other people who need to parse Cadence Spectre circuits.