First off, you have an extra space character in the regex.
But the real problem is you're treating the whole input as a single line. If you set that flag: 
you will find that regex101 shows the same results.
In regex, all open quantifiers are greedy by default. As such, you must be a lot more specific. At the very start you have
#include.+
This is already the end of it, since .+
simply matches all of the content (up to and including the last line). Your only reprieve is that backtracking will occur so that at least 1 "tail" of the regex matches, but all the rest is "souped up" in between. Because .+
literally asks for 1 or as many as possible
of any character
!
Attempted Fixes...
- make
.+
be \s+
or so. In fact, it needs to be \s*
because #include<iostream>
is perfectly valid C++
next, you cannot match like you did because you'd happily match #include <iostream"
or #include "iostream>
. And again, .*
needs to be limited. In this case, you can make the closing delimiter completely deterministic (because the opening delimiter completely predicts it), so you can use non-greedy Kleene-star:
#include\s*("(.*?)"|<(.*?)>)
HOWEVER
The real problem is that you're trying to parse a full on grammar with ... regexen¹.
All I can say is
Could you not?!
Here's a suggestion using Boost Spirit:
auto comment_ = space
| "//" >> *(char_ - eol)
| "/*" >> *(char_ - "*/")
;
Woah. That's a breath of fresh air. It's almost like programming, instead of wizardry and crossing your thumbs!
Now for the real meat:
auto include_ = "#include" >> (
'<' >> *~char_('>') >> '>'
| '"' >> *~char_('"') >> '"'
);
And of course you want have the proof of the pudding too:
std::string header;
bool ok = phrase_parse(content.begin(), content.end(), seek[include_], comment_, header);
std::cout << "matched: " << std::boolalpha << ok << ": " << header << "\n";
This parses a single header and prints: Live On Coliru
matched: true: iostream
Would it be a piece of cake to scale up to all the non-commented includes?
std::vector<std::string> headers;
bool ok = phrase_parse(content.begin(), content.end(), *seek[include_], comment_, headers);
Oops. Two bugs. Firstly, we should not be matching our grammar. The best way would be to ensure we are at start of line, but that complicates the grammar. For now, let's disallow names spanning multiple lines:
auto name_ = rule<struct _, std::string> {} = lexeme[
'<' >> *(char_ - '>' - eol) >> '>'
| '"' >> *(char_ - '"' - eol) >> '"'
];
auto include_ = "#include" >> name_;
That helps a bit. The other bug is actually tougher, and I think it's a library bug. The problem is that it sees all the includes as active? It turns out that seek
does not correctly use the skipper after the first match.² For now, let's work around it:
bool ok = phrase_parse(content.begin(), content.end(), *(omit[*(char_ - include_)] >> include_) , comment_, headers);
It does take away from the elegance a bit, but it does work:
The Full Monty
The full demo Live On Coliru
// #include <boost/graph/adjacency_list.hpp>
#include "iostream"
#include<fstream> /*
#include <boost/filesystem.hpp>
#include <boost/regex.hpp> */ //
#include <boost/spirit/home/x3.hpp>
void filename(std::string const& fname) //function takes directory path
{
using namespace boost::spirit::x3;
auto comment_ = space
| "//" >> *(char_ - eol)
| "/*" >> *(char_ - "*/")
;
auto name_ = rule<struct _, std::string> {} = lexeme[
'<' >> *(char_ - '>' - eol) >> '>'
| '"' >> *(char_ - '"' - eol) >> '"'
];
auto include_ = "#include" >> name_;
auto const content = [&]() -> std::string {
std::ifstream file(fname);
return { std::istreambuf_iterator<char>{file}, {} };//string to be parsed
}();
std::vector<std::string> headers;
/*bool ok = */phrase_parse(content.begin(), content.end(), *(omit[*(char_ - include_)] >> include_) , comment_, headers);
std::cout << "matched: " << headers.size() << " active includes:\n";
for (auto& header : headers)
std::cout << " - " << header << "\n";
}
int main() {
filename("main.cpp");
}
Printing
matched: 3 active includes:
- iostream
- fstream
- boost/spirit/home/x3.hpp
¹ And it's not in Perl6, in which case you could be forgiven.
² I'll try to fix/report this tomorrow