
Following the link provided by @sehe in the post "Boost_option to parse a configuration file", I need to parse configuration files that may contain comments.

https://www.boost.org/doc/libs/1_76_0/doc/html/property_tree/parsers.html#property_tree.parsers.info_parser

Since the files contain comments (with a leading #), should a Spirit grammar be used in addition to read_info() to strip the comments? I am referring to info_grammar_spirit.cpp in the /property_tree/examples folder.

sehe
visitor99999
  • What's the code? Do you have a sample of the input? What error is encountered? – sehe May 28 '21 at 00:33
  • I modified the example code in the Boost installed path. I also noticed something weird just now: if I use g++ to compile your code, the output is different from Visual Studio's. For example, your original get_child("Resnet50") works just fine with g++, with no need to modify it. The difference between these two compilers is a bit confusing – visitor99999 May 28 '21 at 01:00
  • I think there might be a legitimate source of UB in the ranged-for with a temporary. This tends to happen when one uses temporaries (e.g. `for (char ch : return_a_temporary_foo().member_returns_a_ref()) {}` is UB). When unsure, extract a variable. UBSAN/ASAN didn't flag it for GCC, so it might be an MSVC bug – sehe May 28 '21 at 01:35

1 Answer


You would do well to avoid depending on implementation details, so instead I'd suggest pre-processing your config file just to strip the comments.

A simple replace of "//" with ";" (the INFO comment character) may be enough.

Building on the previous answer:

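#include <boost/algorithm/string/replace.hpp>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>

// Read the whole file into memory
std::string tmp;
{
    std::ifstream ifs(file_name.c_str());
    tmp.assign(std::istreambuf_iterator<char>(ifs), {});
} // closes file

// Turn "//" comments into INFO-style ";" comments, then parse as usual
boost::algorithm::replace_all(tmp, "//", ";");
std::istringstream preprocessed(tmp);
read_info(preprocessed, pt);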

Now if you change the input to include comments:

Resnet50 {
    Layer CONV1 {
        Type: CONV // this is a comment
        Stride { X: 2, Y: 2 }       ; this too
        Dimensions { K: 64, C: 3, R: 7, S: 7, Y:224, X:224 }
    }

    // don't forget the CONV2_1_1 layer
    Layer CONV2_1_1 {
        Type: CONV
        Stride { X: 1, Y: 1 }       
        Dimensions { K: 64, C: 64, R: 1, S: 1, Y: 56, X: 56 }
    }
}

It still parses as expected, if we also extend the debug output to verify:

ptree const& resnet50 = pt.get_child("Resnet50");
for (auto& entry : resnet50) {
    std::cout << entry.first << " " << entry.second.get_value("") << "\n";

    std::cout << " --- Echoing the complete subtree:\n";
    write_info(std::cout, entry.second);
}

Prints

Layer CONV1
 --- Echoing the complete subtree:
Type: CONV
Stride
{
    X: 2,
    Y: 2
}
Dimensions
{
    K: 64,
    C: 3,
    R: 7,
    S: 7,
    Y:224, X:224
}
Layer CONV2_1_1
 --- Echoing the complete subtree:
Type: CONV
Stride
{
    X: 1,
    Y: 1
}
Dimensions
{
    K: 64,
    C: 64,
    R: 1,
    S: 1,
    Y: 56,
    X: 56
}

See it Live On Coliru

Yes, But...?

What if '//' occurs in a string literal? Won't it also get replaced? Yes.

This is not a library-quality solution. You should not expect one, because you didn't have to put in any effort to parse your bespoke configuration file format.

You are the only party who can judge whether the shortcomings of this approach are a problem for you.

However, short of just copying and modifying Boost's parser or implementing your own from scratch, there's not a lot one can do.

For The Masochists

If you don't want to reimplement the entire parser, but still want the "smarts" to skip string literals, here's a pre_process function that does all that. This time, it's truly employing Boost Spirit:

#include <boost/spirit/home/x3.hpp>
#include <stdexcept>
#include <string>
std::string pre_process(std::string const& input) {
    std::string result;
    using namespace boost::spirit::x3;
    auto static string_literal
        = raw[ '"' >> *('\\'>> char_ | ~char_('"')) >> '"' ];

    auto static comment
        = char_(';') >> *~char_("\r\n")
        | "//" >> attr(';') >> *~char_("\r\n")
        | omit["/*" >> *(char_ - "*/") >> "*/"];

    auto static other
        = +(~char_(";\"") - "//" - "/*");

    auto static content
        = *(string_literal | comment | other) >> eoi;

    if (!parse(begin(input), end(input), content, result)) {
        throw std::invalid_argument("pre_process");
    }
    return result;
}

As you can see, it recognizes string literals (with escapes), and it treats "//" and ';' style linewise comments as equivalent. To "show off" I threw in /* block comments */, which cannot be represented in proper INFO syntax, so we just omit[] them.

Now let's test with a funky example (extended from the "Complicated example demonstrating all INFO features" from the documentation):

#include <boost/property_tree/info_parser.hpp>
#include <iostream>
#include <sstream>
using boost::property_tree::ptree;

int main() {
    boost::property_tree::ptree pt;
    std::istringstream iss(
            pre_process(R"~~( ; A comment
key1 value1   // Another comment
key2 "value with /* no problem */ special // characters in it {};#\n\t\"\0"
{
   subkey "value split "\
          "over three"\
          "lines"
   {
      a_key_without_value ""
      "a key with special characters in it {};#\n\t\"\0" ""
      "" value    /* Empty key with a value */
      "" /*also empty value: */ ""       ; Empty key with empty value!
   }
})~~"));

    read_info(iss, pt);

    std::cout << " --- Echoing the parsed tree:\n";
    write_info(std::cout, pt);
}

Prints (Live On Coliru)

 --- Echoing the parsed tree:
key1 value1
key2 "value with /* no problem */ special // characters in it {};#\n    \"\0"
{
    subkey "value split over threelines"
    {
        a_key_without_value ""
        "a key with special characters in it {};#\n     \"\0" ""
        "" value
        "" ""
    }
}
sehe
  • Oh. I misread the question and saw `//` as the comment character. I assume you would be able to make the change to support `#`-comments instead ([like so](http://coliru.stacked-crooked.com/a/fc59fb44af08f61e) for the only complicated case) – sehe May 28 '21 at 01:42
  • Cool, I didn't know how to code the "funky example" in the document. Now I know :-). I don't quite understand how the boost::spirit part works though. – visitor99999 May 28 '21 at 02:19
  • Nobody really knows :) Spirit has one of the higher learning curves. But once you're proficient with it, it's hard to resist using it over writing tedious manual parsers again :) – sehe May 28 '21 at 02:25
  • Yeah, still fuzzy about Spirit. VS doesn't recognize pt.get_value() for some reason, which is very annoying because g++ has no such problem. I will open a separate post just for VS. – visitor99999 May 28 '21 at 15:35