I'm trying to parse a URL query string with special rules. So far it works, with one exception described below. The URL is parsed as a set of key-value pairs using the following:

const qi::rule<std::string::const_iterator, std::string()> key = qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> value = *(qi::char_ - '=' - '&');
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(('&') >> pair);

So far, so good. One of the special cases is that the ampersand can appear in the form of an XML entity, &amp;, so the query rule was upgraded to

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);

and it worked as expected. Then an additional special case appeared: a quoted value, which can contain unescaped equal signs and ampersands, in the form a=b&d=e&f=$$g=h&i=j$$&x=y&z=def, which should parse into

  • a => b
  • d => e
  • f => g=h&i=j
  • x => y
  • z => def

So I've added an additional rule for "quoted" values

const qi::rule<std::string::const_iterator, std::string()> key   =  qi::char_("a-zA-Z_") >> *qi::char_("a-zA-Z_0-9/%\\-_~\\.");
const qi::rule<std::string::const_iterator, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - '$') >> qi::omit["$$"];
const qi::rule<std::string::const_iterator, std::string()> value = *(escapedValue | (qi::char_ - '=' - '&'));
const qi::rule<std::string::const_iterator, std::pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *((qi::lit("&amp;")|'&') >> pair);

which, once again, worked as expected until the next case: a=b&d=e&f=$$g=h&i=j$$x=y&z=def. Note that there is no ampersand between the closing "$$" and the next key name. It looks like it could be solved easily by making the separator optional with a Kleene star, like

const qi::rule<std::string::const_iterator, std::unordered_map<std::string, std::string>()> query =  pair >> *(*(qi::lit("&amp;")|'&') >> pair);

but for some reason it does not do the trick. Any suggestions will be appreciated!

EDIT: Sample code

#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/std_pair.hpp>
#include <iostream>       // std::cout, std::endl
#include <string>
#include <unordered_map>

namespace rulez
{
    using namespace boost::spirit::qi;
    using It = std::string::const_iterator;

    const rule<It, std::string()> key                                    = boost::spirit::qi::char_("a-zA-Z_") >> *boost::spirit::qi::char_("a-zA-Z_0-9/%\\-_~\\.");
    const rule<It, std::string()> escapedValue                           = boost::spirit::qi::omit["$$"] >> *(boost::spirit::qi::char_ - '$') >> boost::spirit::qi::omit["$$"];
    const rule<It, std::string()> value                                  = *(escapedValue | (boost::spirit::qi::char_ - '=' - '&'));
    const rule<It, std::pair<std::string, std::string>()> pair           = key >> -('=' >> value);
    const rule<It, std::unordered_map<std::string, std::string>()> query = pair >> *(*(boost::spirit::qi::lit("&amp;")|'&') >> pair);
}

int main()
{
    using namespace std;
    unordered_map<string, string> keyVal;
  //string const paramString = "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def";
    string const paramString = "a=b&d=e&f=$$g=h&i=j$$x=y&z=def";

    boost::spirit::qi::parse(paramString.begin(), paramString.end(), rulez::query, keyVal);

    for (const auto& pair : keyVal)
        cout << "(\"" << pair.first << "\",\"" << pair.second << "\")" << endl;
}

Output for "a=b&d=e&f=$$g=h&i=j$$x=y&z=def" (erroneous, should be the same as for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def")

("a", "b"),("d", "e"),("f", "g=h&i=jx")

Output for "a=b&d=e&f=$$g=h&i=j$$&x=y&z=def" (as expected)

("a", "b"),("d", "e"),("f", "g=h&i=j"),("x", "y"),("z", "def")

EDIT: Somewhat simpler parsing rules, just to make stuff easier to understand

namespace rulez
{
    const rule<std::string::const_iterator, std::string()> key =  +(char_ - '&' - '=');
    const rule<std::string::const_iterator, std::string()> escapedValue = omit["$$"] >> *(char_ - '$') >> omit["$$"];
    const rule<std::string::const_iterator, std::string()> value = *(escapedValue | (char_ - '&' - '='));
    const rule<std::string::const_iterator, pair<std::string, std::string>()> pair  =  key >> -('=' >> value);
    const rule<std::string::const_iterator, unordered_map<std::string, std::string>()> query =  pair >> *(*(lit('&')) >> pair);
}
sehe
kreuzerkrieg
  • It's easier to get help when you provide a compilable example, something like [this](http://coliru.stacked-crooked.com/a/9702b2ebd5e6535f). I'm not sure I understand your intention, is your expected map `map={{a,b}, {d,e}, {f,g=h&i=j}, {x,y}, {z,def}};`? – llonesmiz Jan 23 '14 at 08:28
  • Your `value` rule should be `escapedValue|"unescapedValue"`, right? Because what you have right now is different. – llonesmiz Jan 23 '14 at 08:56
  • nope, the situation is somewhat more complicated, there are four rules, lets call em "key", "value", "pair" and "query". Actually "value" should look like ("query"|"unquotedvalue"), it creates a kind of nested "query". I use "unquoted" instead of "unescaped" to avoid url escaping confusion – kreuzerkrieg Jan 23 '14 at 09:04
  • I'm sorry, I'm having real trouble understanding what you want to ask. I suggest it could be an SSCCE with the intended output. Preferably simpler. – sehe Jan 23 '14 at 14:28
  • sehe, yes, I know. Re-reading my question, I realize that I have difficulty defining the problem clearly and that the question is overloaded with technical details which are (possibly) not too important. I will try to define the question once again – kreuzerkrieg Jan 23 '14 at 17:24
  • sehe, consider the following: you have to parse key/value pairs separated by a character. The value may be quoted, and the quoted value may be a string similar in format to the key/value pairs above, with the same separator, but it shouldn't be parsed since it is quoted. There may or may not be a separator between the closing quote and the next key/value pair. input -> key1=value1&key2=value2&key3="key4=value4&key5=value5"key6=value6&key7=value7 which is parsed into {{key1,value1}, {key2,value2}, {key3,key4=value4&key5=value5}, {key6,value6}, {key7,value7}}. Hope it clarifies things – kreuzerkrieg Jan 23 '14 at 17:35

2 Answers

I would guess your problem is the value rule

value = *(escapedValue | (char_ - '&' - '='));

when parsing …$$g=h&i=j$$x=…

$$g=h&i=j$$x=
^---------^

It parses the marked string $$g=h&i=j$$ as escapedValue. The Kleene operator (*) then starts another iteration, and the second alternative of the value rule, (char_ - '&' - '='), consumes the x

$$g=h&i=j$$x=
           ^

and the rule stops only at the =, so x ends up appended to the value instead of starting the next key.
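To make the over-consumption concrete, here is a hand-written emulation of that value rule in plain C++ (no Boost). It is only a sketch: greedy_value is a hypothetical helper name, the function mirrors the rule's matching order but not Spirit's attribute handling on backtracking, and it assumes pos <= in.size() on entry.

```cpp
#include <cstddef>
#include <string>

// Emulates: value = *(escapedValue | (char_ - '&' - '='))
// Returns the collected text and advances pos past the consumed input.
// Hypothetical helper for illustration, not part of Boost.Spirit.
std::string greedy_value(const std::string& in, std::size_t& pos) {
    std::string out;
    for (;;) {
        // Alternative 1: escapedValue = "$$" >> *(char_ - '$') >> "$$"
        if (in.compare(pos, 2, "$$") == 0) {
            const std::size_t close = in.find('$', pos + 2);
            if (close != std::string::npos && in.compare(close, 2, "$$") == 0) {
                out.append(in, pos + 2, close - (pos + 2));
                pos = close + 2;
                continue;  // the Kleene star tries another iteration
            }
        }
        // Alternative 2: a single character that is neither '&' nor '='
        if (pos < in.size() && in[pos] != '&' && in[pos] != '=') {
            out += in[pos++];
            continue;
        }
        return out;  // neither alternative matched: the loop ends
    }
}
```

Called at the start of $$g=h&i=j$$x=y&z=def, this returns g=h&i=jx and stops on the = after x, which matches the erroneous ("f", "g=h&i=jx") entry reported in the question.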

Maybe something like this will help:

value = escapedValue | *(char_ - '&' - '=');
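As a rough check of that suggestion, the same kind of plain C++ emulation (no Boost) can be written for the rewritten rule. This is a sketch under the same caveats: either_value is a hypothetical name, Spirit's attribute handling is not modelled, and pos <= in.size() is assumed on entry.

```cpp
#include <cstddef>
#include <string>

// Emulates: value = escapedValue | *(char_ - '&' - '=')
// Hypothetical helper for illustration, not part of Boost.Spirit.
std::string either_value(const std::string& in, std::size_t& pos) {
    // Alternative 1: one whole "$$...$$" block, and nothing after it.
    if (in.compare(pos, 2, "$$") == 0) {
        const std::size_t close = in.find('$', pos + 2);
        if (close != std::string::npos && in.compare(close, 2, "$$") == 0) {
            std::string out = in.substr(pos + 2, close - (pos + 2));
            pos = close + 2;
            return out;  // no further iteration, so the next key is left alone
        }
    }
    // Alternative 2: a run of characters that are neither '&' nor '='
    std::string out;
    while (pos < in.size() && in[pos] != '&' && in[pos] != '=') {
        out += in[pos++];
    }
    return out;
}
```

On $$g=h&i=j$$x=y&z=def this returns g=h&i=j and leaves the position on x, so a query rule with an optional separator would be free to parse x=y as the next pair.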
mr_georg
  • I don't have access to the code right now, but I have a feeling that I have to upgrade the escaped rule to something like qi::omit["$$"] >> *(qi::char_ - '$') >> qi::omit["$$"] >> *(pair); – kreuzerkrieg Jan 24 '14 at 15:22
  • So, this answer didn't help? Still searching? – mr_georg Feb 04 '14 at 08:15
  • With the help of a friend of mine who is proficient enough in Boost.Spirit, I've found the solution; however, I prefer not to use it – kreuzerkrieg Feb 05 '14 at 07:37
0

This solves the problem. However, I've decided to abandon the idea of using Spirit for parsing the query string: each special case makes the grammar more and more cumbersome, and after a while no one will remember why it is written the way it is :)

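qi::rule<std::string::const_iterator, std::string()> key =  +(qi::char_ - '=' - '&');
// A quoted value: everything between "$$" and the next "$$".
qi::rule<std::string::const_iterator, std::string()> escapedValue = qi::omit["$$"] >> *(qi::char_ - "$$") >> qi::omit["$$"];
// An unquoted value must not start with "$$".
qi::rule<std::string::const_iterator, std::string()> nonEscapedValue = !qi::lit("$$") >> *(qi::char_ - '=' - '&');

// sep and query are declared as rules instead of `auto`: an `auto` variable would
// store an expression template that refers to temporaries destroyed at the end
// of the statement.
qi::rule<std::string::const_iterator> sep = qi::lit("&amp;") | '&';
// An unquoted pair must be followed by a separator or the end of input.
qi::rule<std::string::const_iterator, std::pair<std::string, boost::optional<std::string>>()> keyValue = 
        key >> -('=' >> nonEscapedValue) >> (sep | qi::eoi);
// After a quoted value the separator is optional.
qi::rule<std::string::const_iterator, std::pair<std::string, boost::optional<std::string>>()> escapedKeyValue =  
        key >> '=' >> escapedValue >> -(sep);
// The vector attribute is an assumption; any compatible container of pairs should work.
qi::rule<std::string::const_iterator, std::vector<std::pair<std::string, boost::optional<std::string>>>()> query =
        *(qi::hold[keyValue] | escapedKeyValue);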
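For comparison, a hand-written parser that covers the same special cases could look roughly like the sketch below. It is not the code I ended up using: parse_query, skip_separator and QueryMap are hypothetical names, and the handling of edge cases is an assumption (an unquoted value runs to the next '&', an unterminated "$$" is treated as plain text, empty keys are skipped, and the first occurrence of a duplicate key wins).

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>

using QueryMap = std::unordered_map<std::string, std::string>;

// Skips one separator ("&amp;" or "&") at pos, if there is one.
static bool skip_separator(const std::string& in, std::size_t& pos) {
    if (in.compare(pos, 5, "&amp;") == 0) { pos += 5; return true; }
    if (pos < in.size() && in[pos] == '&') { ++pos; return true; }
    return false;
}

QueryMap parse_query(const std::string& in) {
    QueryMap result;
    std::size_t pos = 0;
    while (pos < in.size()) {
        if (skip_separator(in, pos)) continue;  // also tolerates repeated separators

        // Key: everything up to '=' or '&'.
        const std::size_t keyBegin = pos;
        while (pos < in.size() && in[pos] != '=' && in[pos] != '&') ++pos;
        std::string key = in.substr(keyBegin, pos - keyBegin);

        std::string value;
        if (pos < in.size() && in[pos] == '=') {
            ++pos;
            // Quoted value: "$$" ... "$$", may contain '=' and '&'.
            const std::size_t close = in.compare(pos, 2, "$$") == 0
                                          ? in.find("$$", pos + 2)
                                          : std::string::npos;
            if (close != std::string::npos) {
                value = in.substr(pos + 2, close - (pos + 2));
                pos = close + 2;  // a separator after the quote is optional
            } else {
                // Unquoted value: everything up to the next '&'.
                const std::size_t valueBegin = pos;
                while (pos < in.size() && in[pos] != '&') ++pos;
                value = in.substr(valueBegin, pos - valueBegin);
            }
        }
        if (!key.empty()) result.emplace(std::move(key), std::move(value));
    }
    return result;
}
```

With this sketch, a=b&d=e&f=$$g=h&i=j$$x=y&z=def and a=b&d=e&f=$$g=h&i=j$$&x=y&z=def both yield the same five pairs, and each special case is a separate branch that can carry its own comment.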
kreuzerkrieg