I've got stuck trying to create a Boost.Spirit parser for the callgrind tool's output which is part of valgrind. Callgrind outputs a domain specific embedded programming language (DSEL) which lets you do all sorts of cool stuff like custom expressions for synthetic counters, but it's not easy to parse.
I've placed some sample callgrind output at https://gist.github.com/ned14/5452719#file-sample-callgrind-output. I've placed my current best attempt at a Boost.Spirit lexer and parser at https://gist.github.com/ned14/5452719#file-callgrindparser-hpp and https://gist.github.com/ned14/5452719#file-callgrindparser-cxx. The Lexer part is straightforward: it tokenises tag-values, non-whitespace text, comments, end of lines, integers, hexadecimals, floats and operators (ignore the punctuators in the sample code, they're unused). White space is skipped.
So far so good. The problem is parsing the tokenised input stream. I haven't even attempted the main stanzas yet, I'm still trying to parse the tag-values which can occur at any point in the file. Tag values look like this:
tagtext: unknown series of tokens<eol>
It could be freeform text e.g.
desc: I1 cache: 32768 B, 64 B, 8-way associative, 157 picosec hit latency
In this situation you'd want to convert the set of tokens to a string i.e. to an iterator_range and extract.
It could however be an expression e.g.
event: EPpsec = 316 Ir + 1120 I1mr + 1120 D1mr + 1120 D1mw + 1362 ILmr + 1362 DLmr + 1362 DLmw
This says that from now on, event EPpsec is to be synthesised as Ir multiplied by 316 added to I1mr multiplied by 1120 added to ... etc.
The point I want to make here is that tag-value pairs need to be accumulated as arbitrary sets of tokens, and post-processed into whatever we turn them into later.
To that end, Boost.Spirit's utree() class looked exactly what I wanted, and that's what the sample code uses. But on VS2012 using the November CTP compiler with variadic templates I'm currently seeing this compile error:
1>C:\Users\ndouglas.RIMNET\documents\visual studio 2012\Projects\CallgrindParser\boost\boost/range/iterator_range_core.hpp(56): error C2440: 'static_cast' : cannot convert from 'boost::spirit::detail::list::node_iterator<const boost::spirit::utree>' to 'base_iterator_type'
1> No constructor could take the source type, or constructor overload resolution was ambiguous
1> C:\Users\ndouglas.RIMNET\documents\visual studio 2012\Projects\CallgrindParser\boost\boost/range/iterator_range_core.hpp(186) : see reference to function template instantiation 'IteratorT boost::iterator_range_detail::iterator_range_impl<IteratorT>::adl_begin<const Range>(ForwardRange &)' being compiled
1> with
1> [
1> IteratorT=base_iterator_type
1> , Range=boost::spirit::utree
1> , ForwardRange=boost::spirit::utree
1> ]
... which suggests that my base_iterator_type, which is a Boost.Spirit multi_pass<> wrap of an istreambuf_iterator for forward iterator nature, is somehow not understood by Boost.Spirit's utree() implementation. Thing is, I'm not sure if this is my bad code or bad Boost.Spirit code seeing as line_pos_iterator<> was failing to correctly specify its forward_iterator concept tag.
Thanks to past Stackoverflow help I could write a pure non-tokenised grammar, but it would be brittle. The right solution is to tokenise and use a freeform grammar capable of fairly arbitrary input. The number of examples of getting Boost.Spirit's Lex and Grammar working together in real world examples to achieve this rather than toy examples is sadly very few. Therefore any help would be greatly appreciated.
Niall