
Situation:

I have a YAML file containing a list of heterogeneous objects, identified by name, like so:

object: Foo
  name: Joe Bloggs
  age: 26
object: Bar
  location: UK

The objects do not inherit from any base class and have no relationship with each other, apart from the fact that they appear to "live" together.

The file can contain any number of objects. The list of available types can exist as a typelist in the codebase if required.

In my C++ land I have the objects:

struct Foo {
  Foo(std::string n, int a) : name(n), age(a) {}

  std::string name;
  int age;
};

struct Bar {
  Bar(std::string l) : location(l) {}

  std::string location;
};

At compile time, I want to turn that YAML file into a boost::fusion::vector:

boost::fusion::vector<Foo, Bar>(Foo("Joe Bloggs", 26), Bar("UK"));

Or:

boost::fusion::vector<Foo, Bar>(make_obj<Foo>("Joe Bloggs", 26), make_obj<Bar>("UK"));

It can also be a std::tuple if that makes life easier.

Specializations for make_obj can exist for all supported objects if needed.
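For illustration only, here is a minimal sketch of what such a `make_obj` helper and the desired result could look like, using `std::tuple`. The generic forwarding `make_obj` and the `load_objects` function are assumptions, not existing code; `load_objects` stands for whatever a generator would have to emit for the sample file above.

```cpp
#include <string>
#include <tuple>
#include <utility>

struct Foo {
  Foo(std::string n, int a) : name(std::move(n)), age(a) {}
  std::string name;
  int age;
};

struct Bar {
  explicit Bar(std::string l) : location(std::move(l)) {}
  std::string location;
};

// Hypothetical generic factory: forwards its arguments to T's constructor.
// Per-type overloads could replace it for objects that need special handling.
template <typename T, typename... Args>
T make_obj(Args&&... args) {
  return T(std::forward<Args>(args)...);
}

// Hypothetical: the expression that would have to be generated for the
// two-object sample file (one Foo, one Bar).
inline std::tuple<Foo, Bar> load_objects() {
  return std::make_tuple(make_obj<Foo>("Joe Bloggs", 26), make_obj<Bar>("UK"));
}
```

With this, `std::get<0>(load_objects()).name` yields the Foo's name, and the element types are fixed at compile time.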

Is this possible?

I'm willing to get my hands dirty with the MPL or other advanced metaprogramming if need be. Or can I do all this with constexpr?

The C++ version is not a concern; I can use trunk Clang with C++14 if need be.

JasonMArcher
Sam Kellett

1 Answer

I see two main approaches:

With Compiletime "Reflection"

You can use BOOST_FUSION_ADAPT_STRUCT and have your cake and eat it. If you adapt your structs, you can iterate over their members statically; in effect you are writing the code generator that @πάνταῥεῖ mentioned, but inline with the C++ code and at compile time.

You can constrain the set of allowed types statically by using a variant.
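To give a feel for what "statically iterating" an adapted struct means, here is a standard-library-only sketch in C++14. It is a stand-in for the idea, not how Boost.Fusion is implemented: the `members` overloads play the role of the adaptation macro, and both `members` and `describe` are hypothetical names introduced for this example.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

struct Foo { std::string name; int age; };
struct Bar { std::string location; };

// Hypothetical hand-written "adaptation": one overload per struct, exposing
// its members as a tuple of references in declaration order.
inline auto members(Foo const& f) { return std::tie(f.name, f.age); }
inline auto members(Bar const& b) { return std::tie(b.location); }

namespace detail {
// Stream every tuple element, separated by single spaces. The index pack is
// expanded at compile time, so there is no runtime loop over the members.
template <typename Tuple, std::size_t... I>
void write_all(std::ostream& os, Tuple const& t, std::index_sequence<I...>) {
  using expand = int[];
  (void)expand{0, ((os << (I ? " " : "") << std::get<I>(t)), 0)...};
}
}  // namespace detail

// Works for any struct that has a members() overload.
template <typename T>
std::string describe(T const& obj) {
  auto t = members(obj);
  std::ostringstream os;
  detail::write_all(
      os, t, std::make_index_sequence<std::tuple_size<decltype(t)>::value>{});
  return os.str();
}
```

The same pattern works in the other direction: a generic function can assign parsed values to each member in turn, which is roughly the job the adapted structs do for the parser in the second approach.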

With Manual Grammar

Using Boost Spirit you can just create a grammar for the same structs:

    start   = *(label_(+"object") >> object_);
    object_ = foo_ | bar_;

    foo_    = "Foo" >> eol >> (
                (string_prop_(+"name") >> eol) ^
                (int_prop_(+"age") >> eol)
            );

    bar_    = "Bar" >> eol >> (
                (string_prop_(+"location") >> eol)
            );

    label_  = lit(_r1) >> ':';

    string_prop_ = label_(_r1) >> lexeme [ *(char_ - eol) ];
    int_prop_    = label_(_r1) >> int_;

This parses into a vector of boost::variant<Foo, Bar> without any further coding. It even allows name and age to appear in either order, or one of them to be omitted, in which case that member keeps its default value. Of course, if you don't want this flexibility, replace ^ with >> in the grammar.

Here's a sample input:

object: Foo
  name: Joe Bloggs
  age: 26
object: Foo
  age: 42
  name: Douglas Adams
object: Foo
  name: Lego Man
object: Bar
  location: UK

And here's the tail of sample (debug) output:

<success></success>
<attributes>[[[[J, o, e,  , B, l, o, g, g, s], 26], [[D, o, u, g, l, a, s,  , A, d, a, m, s], 42], [[L, e, g, o,  , M, a, n], 0], [[U, K]]]]</attributes>
</start>
Parse success: 4 objects
N4data3FooE (Joe Bloggs 26)
N4data3FooE (Douglas Adams 42)
N4data3FooE (Lego Man 0)
N4data3BarE (UK)
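To make the either-order behaviour of the three Foo entries above concrete, here is a hand-written sketch that loosely mimics what the `^` (permutation) operator does for `foo_`. It uses only the standard library and is not how Spirit works internally; `parse_foo` is a hypothetical helper that takes just the indented property lines of one object.

```cpp
#include <sstream>
#include <string>

struct Foo {
  std::string name;  // stays "" when no name line is present
  int age = 0;       // stays 0 when no age line is present
};

// Parse the "key: value" lines of one Foo block. Each property may appear at
// most once, in either order; an absent property keeps its default value.
// Returns false for an unknown or repeated key, a line without ':', a
// non-numeric age, or a block that contains no property at all.
inline bool parse_foo(std::string const& block, Foo& out) {
  std::istringstream in(block);
  std::string line;
  bool seen_name = false, seen_age = false;

  while (std::getline(in, line)) {
    auto colon = line.find(':');
    if (colon == std::string::npos) return false;

    // Skip leading blanks before the key and after the colon.
    auto first = line.find_first_not_of(" \t");
    std::string key = line.substr(first, colon - first);
    auto vstart = line.find_first_not_of(" \t", colon + 1);
    std::string value =
        vstart == std::string::npos ? std::string() : line.substr(vstart);

    if (key == "name" && !seen_name) {
      out.name = value;
      seen_name = true;
    } else if (key == "age" && !seen_age) {
      std::istringstream v(value);
      if (!(v >> out.age)) return false;
      seen_age = true;
    } else {
      return false;
    }
  }
  return seen_name || seen_age;  // at least one property must match
}
```

Replacing `^` with `>>` in the grammar corresponds to requiring `name` first and `age` second here, with neither optional.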

Live On Coliru

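#define BOOST_SPIRIT_DEBUG
#define BOOST_SPIRIT_USE_PHOENIX_V3
#include <boost/spirit/include/qi.hpp>
#include <boost/fusion/adapted/struct.hpp>
#include <boost/fusion/include/io.hpp> // operator<< for fusion sequences
#include <boost/variant.hpp>
#include <boost/bind.hpp>
#include <fstream>
#include <iostream>
#include <string>
#include <typeinfo>
#include <vector>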

namespace qi  = boost::spirit::qi;
namespace phx = boost::phoenix;

namespace demo {
    struct visitor : boost::static_visitor<> {
        template<typename Seq>
            void operator()(std::ostream& os, Seq const& seq) const {
                os << typeid(Seq).name() << "\t" << boost::fusion::as_vector(seq);
            }
    };
}

namespace data {
    struct Foo {
        Foo(std::string n="", int a=0) : name(n), age(a) {}

        std::string name;
        int age;
    };

    struct Bar {
        Bar(std::string l="") : location(l) {}

        std::string location;
    };

    using object  = boost::variant<Foo, Bar>;
    using objects = std::vector<object>;

    std::ostream& operator<< (std::ostream& os, object const& o) {
        boost::apply_visitor(boost::bind(demo::visitor(), boost::ref(os), _1), o);
        return os;
    }
}

BOOST_FUSION_ADAPT_STRUCT(data::Foo,(std::string,name)(int,age))
BOOST_FUSION_ADAPT_STRUCT(data::Bar,(std::string,location))

template <typename It>
struct grammar : qi::grammar<It, data::objects(), qi::blank_type> {
    grammar() : grammar::base_type(start) {
        using namespace qi;

        start   = *(label_(+"object") >> object_);
        object_ = foo_ | bar_;

        foo_    = "Foo" >> eol >> (
                    (string_prop_(+"name") >> eol) ^
                    (int_prop_(+"age") >> eol)
                );

        bar_    = "Bar" >> eol >> (
                    (string_prop_(+"location") >> eol)
                );

        label_  = lit(_r1) >> ':';

        string_prop_ = label_(_r1) >> lexeme [ *(char_ - eol) ];
        int_prop_    = label_(_r1) >> int_;

        BOOST_SPIRIT_DEBUG_NODES((start)(object_)(foo_)(bar_)(label_)(string_prop_)(int_prop_));
    }
  private:
    qi::rule<It, data::objects(), qi::blank_type> start;
    qi::rule<It, data::object(),  qi::blank_type> object_;
    qi::rule<It, data::Foo(),     qi::blank_type> foo_;
    qi::rule<It, data::Bar(),     qi::blank_type> bar_;

    qi::rule<It, std::string(std::string), qi::blank_type> string_prop_;
    qi::rule<It, int(std::string), qi::blank_type>         int_prop_;
    qi::rule<It, void(std::string), qi::blank_type>        label_;
};

int main()
{
    using It = boost::spirit::istream_iterator;
    std::ifstream ifs("input.txt");
    It f(ifs >> std::noskipws), l;

    grammar<It> p;
    data::objects parsed;
    bool ok = qi::phrase_parse(f,l,p,qi::blank,parsed);
    if (ok)
    {
        std::cout << "Parse success: " << parsed.size() << " objects\n";
        for(auto& object : parsed)
            std::cout << object << "\n";
    } else
    {
        std::cout << "Parse failed\n";
    }

    if (f!=l)
        std::cout << "Remaining unparsed input: '" << std::string(f,l) << "'\n";
}
sehe
  • This was my first thought too, and I do like it, but I hesitate because ignoring spacing seems very against YAML. – Mooing Duck Dec 18 '14 at 23:10
  • @MooingDuck I have no clue about what YAML is. It's largely irrelevant of course, because it's easy to make the grammar non-whitespace ignoring. I was showcasing the concept, not the grammar (that's for the OP I guess) – sehe Dec 18 '14 at 23:10
  • His Input is [YetAnotherMarkupLanguage](http://www.yaml.org/), which uses indentation to tell which entries are "children" of each. Realistically, this probably lexes his particular input just fine. It would just mishandle malformed inputs. – Mooing Duck Dec 18 '14 at 23:14
  • I find that link mildly interesting. I'm pretty sure he wasn't asking about "how to parse YAML" though. So I'll leave it as an exercise to the reader. – sehe Dec 18 '14 at 23:15
  • Yeah, that's fair enough – Mooing Duck Dec 18 '14 at 23:21
  • I've just added a demonstration of how to use the variant though (that might not be obvious). Also, let the record show that whitespace is ignored, but that doesn't include line-ends or whitespace inside string properties. :) – sehe Dec 18 '14 at 23:25