
With the upcoming C++ standard, is there a better way to ignore files that do not have one of the wanted extensions than the approach shown in the code snippet below?

I am learning the experimental C++ <filesystem> library (http://en.cppreference.com/w/cpp/experimental/fs) while writing a simple program that transforms text files in one directory into text files in another directory. The program takes the input and output directories as command-line arguments. Only files with certain extensions (like .csv, .txt, ...) should be processed. The output files should have the .xxx extension.

#include <filesystem>
namespace fs = std::tr2::sys; // the implementation from Visual Studio 2015

    ...
    fs::path srcpath{ argv[1] };
    fs::path destpath{ argv[2] };
    ... 
    for (auto name : fs::directory_iterator(srcpath))
    {
        if (!fs::is_regular_file(name))
            continue;                  // ignore the non-files

        fs::path fnameIn{ name };      // input file name

        // Ignore unwanted extensions. The extension is lowercased first because
        // Windows file names are case-insensitive (lower() is a helper not shown).
        string ext{ lower(fnameIn.extension().string()) };
        if (ext != ".txt" && ext != ".csv")
            continue;

        // Build the output filename path.
        fs::path fnameOut{ destpath / fnameIn.filename().replace_extension(".xxx") };

        ... processing ...
    }
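The snippet calls a `lower()` function that the question never shows. A minimal sketch of such a helper, assuming it only needs to lowercase ASCII letters (enough for extensions like `.TXT` or `.Csv`); the name comes from the snippet, but this body is hypothetical:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper matching the lower() call in the question:
// returns a copy of s with ASCII letters converted to lowercase.
std::string lower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        // Go through unsigned char: passing a negative char value to
        // std::tolower is undefined behaviour.
        return static_cast<char>(std::tolower(c));
    });
    return s;
}
```

Taking the argument by value lets the function modify its own copy and return it without an extra temporary.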
pepr
  • is not in C++17. It's in the Filesystem TS. – Kerrek SB Jan 29 '16 at 14:26
  • Looks pretty good to me. – Lightness Races in Orbit Jan 29 '16 at 14:26
  • This seems like the way one would do it. You could do it (arguably) a bit nicer with something like [boost::filter_iterator](http://www.boost.org/doc/libs/release/libs/iterator/doc/filter_iterator.html). I would like to see someone implement [glob](http://man7.org/linux/man-pages/man3/glob.3.html) using the standard library. – eerorika Jan 29 '16 at 14:28
  • There is no "experimental C++17". C++17 is a placeholder name for what is expected to be a revised standard, to be published in 2017. Names in the namespace `experimental` are used in Technical Specifications, which are independent of the actual standard. – Pete Becker Jan 29 '16 at 14:34
  • @PeteBecker and @KerrekSB: I forgot to put the header name in backticks, so it did not appear in the question. It is also clear that C++17 does not exist yet and that everything around it is rather _experimental_. I hope it is more understandable now; no need to be extremely formal. :) – pepr Jan 29 '16 at 14:46
  • Say I have 5 extensions. How would you test against the set of extensions? Brevity and readability count. – pepr Jan 29 '16 at 14:51
  • @pepr - no, there is no need to be extremely formal. There is, however, a need to be correct. Engineering isn't like horseshoes: close doesn't count. – Pete Becker Jan 29 '16 at 15:08
  • @PeteBecker: You are right. Fixed in the question. (Anyway, having a formal CS education, I hope I know something about engineering. But still, you are right. :) – pepr Jan 29 '16 at 15:24
  • @pepr: Don't edit answers into your questions. – Nicol Bolas Jan 29 '16 at 16:37
  • @NicolBolas: Should I make an answer for that? – pepr Jan 29 '16 at 20:42
  • @pepr: What you added in was essentially my answer, which you already accepted. You've already done what you needed to do. If you feel that your restatement of that answer is better than mine, go ahead and post it as an answer. – Nicol Bolas Jan 29 '16 at 21:10
  • @NicolBolas: No. I put together everything for those who may want to solve the same problem. I have used partly your solution and some things from Praetorian's solution. Now placed here http://stackoverflow.com/a/35093810/1346705. Possibly, the question should have gone to Code Review (http://codereview.stackexchange.com/). – pepr Jan 30 '16 at 00:29

3 Answers


Basically, your question boils down to, "given a string, how do I determine if it matches one of a number of possibilities?" That's pretty trivial: put the possibilities in a std::set:

#include <set>
#include <string>

//Before loop
std::set<std::string> wanted_exts = {".txt", ".csv"};

//In loop
std::string ext{ lower(fnameIn.extension().string()) };
if (wanted_exts.find(ext) == wanted_exts.end())
    continue;   // extension not in the wanted set: skip this file

You can of course keep wanted_exts around for as long as you like, since it probably won't change. Also, if you have Boost.Container, I would suggest making wanted_exts a boost::container::flat_set. That will help minimize allocations.
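If Boost is not an option, a sorted `std::vector` searched with `std::binary_search` gives much of what `flat_set` offers (contiguous storage, logarithmic lookup) with only the standard library. A sketch, assuming the extension has already been lowercased; `is_wanted_ext` is a hypothetical name:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: a sorted vector used as a poor man's flat_set.
// std::binary_search is only valid on a sorted range, so the list is
// sorted once when the function is first called.
bool is_wanted_ext(const std::string& ext)
{
    static const std::vector<std::string> wanted = [] {
        std::vector<std::string> v{".txt", ".csv"};
        std::sort(v.begin(), v.end());
        return v;
    }();
    return std::binary_search(wanted.begin(), wanted.end(), ext);
}
```

For a handful of extensions, a linear `std::find` over an unsorted array would likely perform about the same; measuring is the only way to know whether the choice matters.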

Nicol Bolas
  • The set of strings, indeed! – pepr Jan 29 '16 at 15:39
  • I have accepted this one as this was the core of the question. The optimization question would be whether `set` is the most suitable container for a handful of extensions. _"When in doubt, measure."_ Anyway, as a game developer who probably has a feel for the memory consumption of a set of strings versus another container, would you still choose the set of strings? Thanks ;) – pepr Jan 29 '16 at 15:57
  • For the `flat_set`, is it in the (proposed) standard? While I like Boost, I would like to avoid it for small things, because one has to install/compile a huge library. – pepr Jan 29 '16 at 20:47

std::tr2::sys was the namespace MSVC used in VS2013 to ship the Filesystem TS, but the TS actually places it in the std::experimental::filesystem::v1 namespace; the old namespace has been retained for backward compatibility. v1 is an inline namespace, so you can drop it from the name and say

namespace fs = std::experimental::filesystem;
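On a compiler that ships both the TS and C++17's `std::filesystem`, one common pattern is to pick whichever header exists with `__has_include`. A sketch, assuming the compiler supports `__has_include` and is in C++17 mode, and that the TS header is `<experimental/filesystem>` (as in libstdc++ and VS2017); VS2015, which lacks `__has_include`, is not covered:

```cpp
// Prefer the standard <filesystem>; otherwise fall back to the
// Filesystem TS header (assumed here to be <experimental/filesystem>).
#if __has_include(<filesystem>)
#include <filesystem>
namespace fs = std::filesystem;
#else
#include <experimental/filesystem>
namespace fs = std::experimental::filesystem;
#endif
```

The rest of the code can then use `fs::path`, `fs::directory_iterator`, and so on, regardless of which branch was taken.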

Assuming Boost is an option, you can filter the directory entries with Boost.Range adaptors, and test for any one of several extensions with boost::algorithm::any_of_equal.

#include <boost/algorithm/cxx11/any_of.hpp>
#include <boost/range/adaptors.hpp>

for(auto const& p : 
      boost::make_iterator_range(fs::directory_iterator(srcpath), {})
      | boost::adaptors::transformed([](auto const& d) { 
          return fs::path(d); })
      | boost::adaptors::filtered([](auto const& p) { 
          return fs::is_regular_file(p); })
      | boost::adaptors::filtered([](auto const& p) { 
          auto const& exts = { ".txt", ".csv" };
          return boost::algorithm::any_of_equal(exts, p.extension().string()); })
   ) {
    // all filenames here will have one of the extensions you tested for
}
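Without Boost, the test done by the last `filtered` adaptor can be written with `std::any_of` from `<algorithm>`. A sketch assuming C++17 `std::filesystem` (the TS version is analogous); `has_extension` is a hypothetical name, and like the lambda above it compares case-sensitively:

```cpp
#include <algorithm>
#include <filesystem>
#include <initializer_list>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: true if p's extension equals one of exts exactly
// (case-sensitive, like the any_of_equal call in the answer).
bool has_extension(const fs::path& p, std::initializer_list<const char*> exts)
{
    const std::string ext = p.extension().string();
    return std::any_of(exts.begin(), exts.end(),
                       [&](const char* e) { return ext == e; });
}
```

In a plain range-for loop it would be used as `if (!fs::is_regular_file(entry) || !has_extension(entry.path(), {".txt", ".csv"})) continue;`.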
Praetorian
  • Thanks for the Boost example (+1), and for the nicer namespace name that I did not know about. The code is interesting; I have to wrap my head around it. The question is whether it is as readable as code without lambdas (at least for me). – pepr Jan 29 '16 at 15:38

The loop that I finally chose:

#include <filesystem>
namespace fs = std::experimental::filesystem;

...

set<string> extensions{ ".txt", ".csv" };

for (auto const& name : fs::directory_iterator(srcpath))
{
    if (!fs::is_regular_file(name))
        continue;

    fs::path fnameIn{ name };
    string ext{ lower(fnameIn.extension().string()) };
    if (extensions.find(ext) != extensions.end())
    {
        fs::path fnameOut{ destpath / fnameIn.filename().replace_extension(".xxx") };
        processing(fnameIn, fnameOut);
    }
}
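The same loop can be sketched as two small C++17 functions, so the extension check can be tested without touching the disk. This assumes ASCII-only case folding and collects (input, output) pairs instead of calling `processing()`; `output_for` and `plan_outputs` are hypothetical names:

```cpp
#include <algorithm>
#include <cctype>
#include <filesystem>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

namespace fs = std::filesystem;

// Maps one input path to its output path, or returns std::nullopt when the
// extension (lowercased, ASCII only) is not in `extensions`.
std::optional<fs::path> output_for(const fs::path& fnameIn, const fs::path& destpath,
                                   const std::set<std::string>& extensions)
{
    std::string ext = fnameIn.extension().string();
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    if (extensions.count(ext) == 0)
        return std::nullopt;                       // unwanted extension
    return destpath / fnameIn.filename().replace_extension(".xxx");
}

// Walks srcpath and collects (input, output) pairs for the wanted files.
std::vector<std::pair<fs::path, fs::path>>
plan_outputs(const fs::path& srcpath, const fs::path& destpath,
             const std::set<std::string>& extensions)
{
    std::vector<std::pair<fs::path, fs::path>> jobs;
    for (const auto& entry : fs::directory_iterator(srcpath))
    {
        if (!entry.is_regular_file())
            continue;                              // skip anything that is not a regular file
        if (auto out = output_for(entry.path(), destpath, extensions))
            jobs.emplace_back(entry.path(), *out);
    }
    return jobs;
}
```

Usage: `for (auto const& [in, out] : plan_outputs(srcpath, destpath, {".txt", ".csv"})) processing(in, out);`.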
pepr
  • `extensions.find(ext) != extensions.end()` is only true if the extension were *found*. Which means it's in the set. And so you'll process it. Which is the *opposite* of ignoring it. – Nicol Bolas Jan 30 '16 at 01:03
  • Yes. This is the wanted behaviour. The earlier approach used the reversed condition plus skipping via `continue`. Actually, the title is misleading. I am going to fix it. – pepr Jan 31 '16 at 07:18