I have been refactoring (while learning) some existing hand-written date parsing routines with Boost Spirit. My main aim is a single, unified interface for parsing dates represented in multiple different ways. Previously I had several functions for transforming the various date formats (such as parseDateYYYYMMDD, parseDateDDMMYYYY, or parseDateStrangeFormatXYZ) into domain-specific date objects. With Spirit it seems I can build a master grammar that combines all these formats, so that there would be just a single parseDate.

Unfortunately, when I compare the performance of the Boost Spirit implementation with the hand-written code, Spirit is around 5 times slower (64-bit build in Release mode), even with the most common date format, YYYY-MM-dd hh:mm:ss.zzz, which should not require any backtracking. I would hope to get at least on par with the old code. I would like feedback on whether there is major room to improve the grammar (and possibly the AST) for performance. Currently I collect the date parts into a vector, and once the parser succeeds, a semantic action constructs the Date object from the components gathered in the vector. Maybe the vector is one weak spot here? Is there possibly a lot of allocation/deallocation within the grammar?
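For reference, a hand-written baseline for the single format YYYY-MM-dd hh:mm:ss.zzz can be sketched as below. This is only an illustration of the kind of allocation-free code the Spirit grammar is being compared against, not the original routines: `DateParts`, `fixed_digits`, `expect_char` and `parse_iso_like` are hypothetical names, and `DateParts` is a plain stand-in for the Date class. The parts live in a fixed-size struct, so nothing touches the heap; the range checks mirror the validate_* actions in the grammar.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for the parsed components (not the Date class).
struct DateParts {
    int year = 0, month = 0, day = 0;
    int hours = 0, minutes = 0, seconds = 0, millis = 0;
};

// Reads exactly `width` decimal digits at `pos`; advances `pos` on success.
inline bool fixed_digits(std::string_view s, std::size_t& pos,
                         std::size_t width, int& out) {
    assert(width > 0);
    if (pos + width > s.size()) return false;
    int value = 0;
    for (std::size_t i = 0; i < width; ++i) {
        const char c = s[pos + i];
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    out = value;
    pos += width;
    return true;
}

// Consumes the single character `c` at `pos` if present.
inline bool expect_char(std::string_view s, std::size_t& pos, char c) {
    if (pos < s.size() && s[pos] == c) { ++pos; return true; }
    return false;
}

// Parses "YYYY-MM-dd hh:mm:ss.zzz" only; no other layout is accepted.
inline std::optional<DateParts> parse_iso_like(std::string_view s) {
    DateParts p;
    std::size_t pos = 0;
    const bool ok =
        fixed_digits(s, pos, 4, p.year)    && expect_char(s, pos, '-') &&
        fixed_digits(s, pos, 2, p.month)   && expect_char(s, pos, '-') &&
        fixed_digits(s, pos, 2, p.day)     && expect_char(s, pos, ' ') &&
        fixed_digits(s, pos, 2, p.hours)   && expect_char(s, pos, ':') &&
        fixed_digits(s, pos, 2, p.minutes) && expect_char(s, pos, ':') &&
        fixed_digits(s, pos, 2, p.seconds) && expect_char(s, pos, '.') &&
        fixed_digits(s, pos, 3, p.millis)  && pos == s.size();
    if (!ok) return std::nullopt;

    // Same ranges as the validate_* semantic actions in the grammar.
    if (p.year <= 1900 || p.year >= 2200) return std::nullopt;
    if (p.month < 1 || p.month > 12)      return std::nullopt;
    if (p.day < 1 || p.day > 31)          return std::nullopt;
    if (p.hours >= 24 || p.minutes >= 60 || p.seconds >= 60) return std::nullopt;
    return p;
}
```

A call such as `parse_iso_like("2020-02-03 20:53:01.123")` yields the seven components without any dynamic allocation, which is the property the vector-based attribute may be lacking.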

My Date objects have two constructors:

// Date without time part
Date(Day d, Month m, Year y)
// Date with time part
Date(Day d, Month m, Year y, Size hours, Size minutes, Size seconds)

Here is the snippet that does the parsing (this is in a .cpp file):

namespace {

namespace x3 = boost::spirit::x3;

namespace parsers {

    template<typename T>
    auto as = [](auto p) { return x3::rule<struct _, T>{} %= x3::as_parser(p); };

    auto kwd = [](auto p, auto q) { return p > q; };

    static const struct months : x3::symbols<int> {
        months() {
            add
                ("Jan"          , 1)
                ("Feb"          , 2)
                ("Mar"          , 3)
                ("Apr"          , 4)
                ("May"          , 5)
                ("Jun"          , 6)
                ("Jul"          , 7)
                ("Aug"          , 8)
                ("Sep"          , 9)
                ("Oct"          , 10)
                ("Nov"          , 11)
                ("Dec"          , 12)
                ("January"      , 1)
                ("February"     , 2)
                ("March"        , 3)
                ("April"        , 4)
                ("May"          , 5)
                ("June"         , 6)
                ("July"         , 7)
                ("August"       , 8)
                ("September"    , 9)
                ("October"      , 10)
                ("November"     , 11)
                ("December"     , 12)
                ;
        }
    } months;

    static const x3::uint_parser<int, 10, 4, 4> yyyy;
    static const x3::uint_parser<int, 10, 1, 2> MM, dd;
    static const x3::uint_parser<int, 10, 2, 2> H_mm_ss;
    static const x3::uint_parser<int, 10, 3, 3> zzz;

    // ADL markers

    struct ql_date_class                          {};
    struct ql_date_time_class                     {};

    static const auto ql_date       = x3::rule<ql_date_class     , Date>{"ql-date"};
    static const auto ql_date_time  = x3::rule<ql_date_time_class, Date>{"ql-date-time"};

    auto validate_H = [](auto& ctx) { 
        const auto& x = x3::_attr(ctx);
        x3::_pass(ctx) = (x >= 0 && x < 24);
    };

    auto validate_mm_ss = [](auto& ctx) { 
        const auto& x = x3::_attr(ctx);
        x3::_pass(ctx) = (x >= 0 && x < 60);
    };

    auto validate_yyyy = [](auto& ctx) { 
        const auto& x = x3::_attr(ctx);
        x3::_pass(ctx) = (x > 1900 && x < 2200);
    };

    auto validate_MM = [](auto& ctx) { 
        const auto& x = x3::_attr(ctx);
        x3::_pass(ctx) = (x >= 1 && x <= 12);
    };

    auto validate_dd = [](auto& ctx) { 
        const auto& x = x3::_attr(ctx);
        x3::_pass(ctx) = (x >= 1 && x <= 31);
    };

    static const auto year_ = yyyy[validate_yyyy];
    static const auto month_ = as<int>(months | MM[validate_MM]);
    static const auto day_ = dd[validate_dd];

    static const auto hours_ = H_mm_ss[validate_H];
    static const auto minutes_ = H_mm_ss[validate_mm_ss];
    static const auto seconds_ = H_mm_ss[validate_mm_ss];
    static const auto milliseconds_ = zzz;

    auto date_parser = as<std::vector<int>>(
        year_ >
        as<std::vector<int>>(
            kwd('-',  as<std::vector<int>>(month_ >  '-' > day_))
            |
            kwd('.',  as<std::vector<int>>(month_ >  '.' > day_))
            |
            kwd('/',  as<std::vector<int>>(month_ >  '/' > day_))
        )
        |
        day_ >
        as<std::vector<int>>(
            kwd('-',  as<std::vector<int>>(month_ >  '-' > year_))
            |
            kwd('.',  as<std::vector<int>>(month_ >  '.' > year_))
            |
            kwd('/',  as<std::vector<int>>(month_ >  '/' > year_))
        )[([](auto& ctx) { std::swap(x3::_attr(ctx)[0], x3::_attr(ctx)[2]); })]
    )
    ;

    static const auto time_parser = as<std::vector<int>>(
        hours_ >
        as<std::vector<int>>(
            (
                ':' >  minutes_ > 
                as<std::vector<int>>(
                    (
                        ':' > seconds_ >
                        as<int>(
                            '.' > milliseconds_ 
                            |
                            x3::attr(int(0))
                        )
                    )
                    |
                    (
                        x3::repeat(2)[x3::attr(int(0))]
                    )
                )
            )
            |
            (
                minutes_ > 
                as<std::vector<int>>(
                    (
                        seconds_ >
                        as<int>(
                            milliseconds_ 
                            |
                            x3::attr(int(0))
                        )
                    )
                    |
                    (
                        x3::repeat(2)[x3::attr(int(0))]
                    )
                )
            )
        )
    )
    ;

    auto make_ql_date = [](auto& ctx) {
        x3::_val(ctx) = Date(
            x3::_attr(ctx)[2],
            static_cast<Month>(x3::_attr(ctx)[1]),
            x3::_attr(ctx)[0]
        ); 
    };

    auto make_ql_date_time = [](auto& ctx) {
        using boost::fusion::at_c;
        x3::_val(ctx) = Date(
            at_c<0>(x3::_attr(ctx))[2],
            static_cast<Month>(at_c<0>(x3::_attr(ctx))[1]),
            at_c<0>(x3::_attr(ctx))[0],
            at_c<1>(x3::_attr(ctx))[0],
            at_c<1>(x3::_attr(ctx))[1],
            at_c<1>(x3::_attr(ctx))[2],
            at_c<1>(x3::_attr(ctx))[3]
        ); 
    };

    static const auto ql_date_def = 
        date_parser[make_ql_date]
        ;

    static const auto ql_date_time_def = 
        (
            date_parser > 
            as<std::vector<int>>(    
                (
                    x3::no_skip[x3::omit[x3::char_('T') | ' ' ]] >
                    time_parser
                )
                |
                (
                    x3::repeat(4)[x3::attr(int(0))]               
                )
            )
        )[make_ql_date_time] 
        ;

    BOOST_SPIRIT_DEFINE(
        ql_date,
        ql_date_time
    )

    auto try_parse = [](const std::string& date, const auto& p) 
        -> boost::optional<Date> 
    {
        auto ast = Date();
        auto first = date.begin();
        const auto last = date.end();
        boost::spirit::x3::ascii::space_type space;
        bool r = phrase_parse(first, last, p, space, ast);
        if (!r || first != last) {
            return boost::none;
        }
        else {
            return ast;
        }
    };

    auto parse = [](const std::string& date, const auto& p) {
        auto ast = Date();
        auto first = date.begin();
        const auto last = date.end();
        boost::spirit::x3::ascii::space_type space;
        // Use the parser passed in; hard-coding ql_date here would make parseDateTime ignore ql_date_time.
        bool r = phrase_parse(first, last, p, space, ast);
        QL_REQUIRE(r && first == last,
            "Parsing of " << date << " failed at " << std::string(first, last));
        return ast;
    };
} // namespace parsers

} // namespace anonymous

Here are the free functions that use the parsers to form Date objects.


boost::optional<Date> DateTimeParser::maybeParseDate(const std::string& date) {
    return parsers::try_parse(date, parsers::ql_date);
}

Date DateTimeParser::parseDate(const std::string& date) {
    return parsers::parse(date, parsers::ql_date);
}

boost::optional<Date> 
DateTimeParser::maybeParseDateTime(const std::string& date) {
    return parsers::try_parse(date, parsers::ql_date_time);
}

Date DateTimeParser::parseDateTime(const std::string& date) {
    return parsers::parse(date, parsers::ql_date_time);
}

Lauri
  • Spirit will not beat a hand-written, optimized parser, but in your case there are 1) `std::vector` fiddling (conversion, appending) and 2) `x3::symbols`, which is not the fastest thing. Also, as a bonus, be aware that `x3::char_("abcde")` will produce a slower parser than a hand-written switch. – Nikita Kniazev Feb 03 '20 at 20:53
  • Lauri, could you please post minimal code that can be compiled? The code will not compile because QL_REQUIRE is undefined, there is no main program, and DateTimeParser is not defined. – user1681377 Feb 07 '20 at 21:50
  • The code at http://coliru.stacked-crooked.com/a/322dd5c5a6683117 compiles and runs. I've heard semantic actions are slower than automatic attribute propagation. Would use of BOOST_FUSION_ADAPT_STRUCT avoid the copying from std::vector to Date? – user1681377 Feb 08 '20 at 02:08
  • The code uses as<std::vector<int>>(...) in several places where it's unclear, at least to me, why it's needed. In addition, the code uses BOOST_SPIRIT_DEFINE when, AFAICT, there's no need for it, since there's no recursion in the grammar. Could you please clarify, @Lauri, why there are so many as<...> wrappers and any BOOST_SPIRIT_DEFINE at all? – user1681377 Feb 11 '20 at 12:32
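
To illustrate the point in the comments about a hand-written switch versus a symbol table, here is a minimal sketch of a switch-based month lookup in plain C++. It is an assumption-laden illustration, not a drop-in replacement for `x3::symbols`: `month_from_name` is a hypothetical helper, it expects an already-delimited token rather than doing longest-prefix matching on an input stream, and it is case-sensitive like the table in the question.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Returns 1..12 for an English month name or its three-letter abbreviation,
// and 0 for anything else. Hypothetical helper for illustration only.
inline int month_from_name(std::string_view s) {
    if (s.size() < 3) return 0;

    // Accept either the full name or its first three letters.
    auto match = [&](std::string_view full, int value) {
        return (s == full || s == full.substr(0, 3)) ? value : 0;
    };

    // Dispatch on the first character, then disambiguate with the
    // second or third character where several months share a prefix.
    switch (s[0]) {
    case 'J':
        if (s[1] == 'a') return match("January", 1);
        if (s[2] == 'n') return match("June", 6);
        return match("July", 7);
    case 'F': return match("February", 2);
    case 'M': return s[2] == 'r' ? match("March", 3) : match("May", 5);
    case 'A': return s[1] == 'p' ? match("April", 4) : match("August", 8);
    case 'S': return match("September", 9);
    case 'O': return match("October", 10);
    case 'N': return match("November", 11);
    case 'D': return match("December", 12);
    default:  return 0;
    }
}
```

Each lookup costs one or two character comparisons plus a single string comparison. Whether that is measurably faster than the symbol table for this workload would have to be confirmed by profiling.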

1 Answer

Lauri, please see if this date_parser gist runs fast enough. It avoids any unnecessary copying until the very end, when the complicated attr2date code copies from the parser attribute to the Date struct.

Some might think that using BOOST_FUSION_ADAPT_STRUCT would avoid the need for attr2date; however, that was attempted and resulted in a compile-time error. This can be seen by switching the #define of USE_ALTERNATIVES here.

HTH.

-Larry

user1681377
  • Although the code I provided works, it's way too complicated. Surely a more experienced Spirit programmer (@sehe?) could provide a simpler solution? – user1681377 Jun 26 '20 at 13:28