In the C++17 filesystem library, we got std::filesystem::remove(path), which — as I understand it — is a direct port of boost::filesystem::remove(path) from Boost.Filesystem.

But C++ inherited from C89 a very similar function called std::remove(path), which is also documented as a way to remove a file from the filesystem. I'm vaguely aware of some pitfalls with this function, e.g. I believe I have heard that on Windows std::remove cannot be used to remove a file that is still being held open by the current process.

Does std::filesystem::remove fix these issues with std::remove? Should I prefer std::filesystem::remove over std::remove? Or is the former just a namespaced synonym for the latter, with the same warts and pitfalls?

The title of my question asks for the difference between boost::filesystem::remove(path) and std::remove(path) because I figure that std::filesystem::remove(path) may not have been implemented by a lot of library vendors yet, but my understanding is that it's supposed to be basically a direct copy of the Boost version. So if you know about Boost.Filesystem on Windows, you probably know enough to answer this question too.

Quuxplusone
  • It takes a minute of inspecting the source of `boost::filesystem::remove` to know that it [simply calls `DeleteFileW`](https://github.com/boostorg/filesystem/blob/07619fb37007f45b54bc71877e724c8f4b014c9f/src/operations.cpp#L240) on Windows. – T.C. Sep 06 '17 at 03:30
  • The `filesystem` functions can handle general Unicode paths, when you exercise some care. In particular, with current implementations, don't rely on default conversion from UTF-8, but do that explicitly. The old `std::remove` is limited to the narrow execution character set, and I sincerely doubt that any Windows implementation detects that that execution character set is UTF-8 (since even `filesystem` implementations fail to detect that). So in Windows it can only handle paths with characters from Windows ANSI, which is a system-specific encoding. – Cheers and hth. - Alf Sep 06 '17 at 03:40
  • @Cheersandhth.-Alf: How would a Windows C++ implementation detect that "the execution character set" is UTF-8? Either that implementation defines its own execution sets (no detection needed) or it defers to Windows (which doesn't allow UTF-8 as the default character set). – MSalters Sep 06 '17 at 08:51
  • @MSalters: UTF-8 produces very distinctive data. Checking the bytes of `"pøh"` should be enough. So it's trivial for the case of inline code. – Cheers and hth. - Alf Sep 06 '17 at 11:27
  • @MSalters: "Windows (which doesn't allow UTF-8 as the default character set)" is wrong. Windows doesn't support a UTF-8 locale. But both main compilers for Windows, g++ and Visual C++, support UTF-8 as the execution character set, and with g++ it's the default. I guess that the plug-in replacements for those compilers, namely clang and Intel, behave the same. – Cheers and hth. - Alf Sep 06 '17 at 11:33
  • @Cheersandhth.-Alf: g++ is what I meant by "no need to detect it", as it's the default. As for Visual Studio, [MSDN](https://msdn.microsoft.com/en-us/library/09k5ez9h.aspx) enumerates the extra characters above the 96 in the source character set, and that's just a handful - not UTF-8. – MSalters Sep 06 '17 at 12:37
  • @MSalters: It may be that you're not familiar with terminology (in addition to getting everything else wrong). Execution character set != basic source character set. g++ can use any execution character set, not just the default UTF-8. Hence there's need for detection for g++. Visual C++ supports UTF-8 as execution character set. E.g. its `/utf-8` option, and its more specific source encoding and execution character set options. – Cheers and hth. - Alf Sep 06 '17 at 13:33
  • @Cheersandhth.-Alf: The MSDN page I linked appears to use the term _execution character set_ as intended in chapter 2 of the Standard, even highlighting that Microsoft's definition is not portable to other compilers. That indeed matches the Standard which describes it as "implementation defined", and MSDN is Microsoft's "implementation definition". – MSalters Sep 06 '17 at 13:42
  • Which is the VS2015 page BTW - I now see that VS-2017 auto-detects its [source charset](https://learn.microsoft.com/en-us/cpp/build/reference/source-charset-set-source-character-set) and indeed supports utf-8. – MSalters Sep 06 '17 at 13:49
  • @MSalters: I'm sorry that I didn't look at it. For resolving our different viewpoints the facts I've referred to are more than enough, but I should have considered the broader issue of resolving general misunderstandings etc. Mea culpa, I apologize. So, the page you looked at lists a few characters that the MSVC execution character set is guaranteed to have, regardless of what it is. That page is not exactly informative. Look at the docs of [the MSVC execution character set option](https://docs.microsoft.com/en-us/cpp/build/reference/execution-charset-set-execution-character-set) instead. – Cheers and hth. - Alf Sep 06 '17 at 13:53
  • @MSalters: MSVC has always detected source character set. The new thing with MSVC 2015 (I believe it was) was that you could override that detection by *specifying* the source encoding with an option. That's about the same model as g++, and with g++ it fails completely in certain situations with headers of different encodings, so IMHO it's really ungood. I do not know (but will probably have to find out, sooner or later) whether MSVC's option is also transitive, i.e. applying to included headers; I'd hope not, but I don't currently know. – Cheers and hth. - Alf Sep 06 '17 at 13:57
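
To make the byte check suggested in the comments above concrete, here is a rough sketch. It is only an illustration of the idea, not something any standard library is known to do. The function name `execution_charset_looks_like_utf8` is made up for this example, and the probe uses the universal character name `\u00f8` (ø) so that the result does not depend on how the source file itself is encoded.

```cpp
// Sketch only: guesses at compile time whether narrow string literals
// are encoded as UTF-8, by looking at how the compiler encoded U+00F8.
// The function name is hypothetical, not part of any library.
constexpr bool execution_charset_looks_like_utf8() {
    // In UTF-8, U+00F8 is the two bytes 0xC3 0xB8, plus the terminating
    // null. A single-byte code page would produce one byte plus the null.
    constexpr char probe[] = "\u00f8";
    return sizeof(probe) == 3
        && static_cast<unsigned char>(probe[0]) == 0xC3
        && static_cast<unsigned char>(probe[1]) == 0xB8;
}
```

This only tells you how the compiler encoded literals in this translation unit. It says nothing about the encoding of a `const char*` that arrives at run time, which is the harder problem for `std::remove`.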

1 Answer

Checking the standard library sources installed with my MSVC, std::experimental::filesystem::remove calls its internal _Unlink helper, which simply calls _wremove, which simply calls Windows DeleteFileW. Similarly, boost::filesystem::remove also just calls DeleteFileW on Windows.
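
For reference, a minimal sketch of the two calls side by side, assuming a C++17 compiler with `<filesystem>` available. The helper names (`touch`, `remove_with_c_api`, `remove_with_filesystem`) are invented for this example. What differs here is the interface rather than the deletion itself: `std::remove` takes a `const char*` and returns `0` on success, while `std::filesystem::remove` takes a `path` and returns `false` (without an error) when the file did not exist.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: create (or truncate) an empty file at p.
void touch(const fs::path& p) {
    std::ofstream out{p};
}

// Remove via the C library. std::remove returns 0 on success and a
// nonzero value on any failure, including "file does not exist".
// Note: p.string() converts to the narrow encoding, which on Windows
// cannot represent every possible file name.
bool remove_with_c_api(const fs::path& p) {
    return std::remove(p.string().c_str()) == 0;
}

// Remove via std::filesystem. Returns true if something was removed,
// false if p did not exist (ec stays clear) or if an error occurred
// (ec is set).
bool remove_with_filesystem(const fs::path& p, std::error_code& ec) {
    return fs::remove(p, ec);
}
```

On Windows, both are expected to end up in the same OS call, so a file that cannot be deleted through one will most likely not be deletable through the other either.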


std::filesystem::remove is specified by reference to POSIX remove, but the global wording in [fs.conform.9945] makes clear that implementations are not required to provide the exact POSIX behavior:

Implementations should provide such behavior as it is defined by POSIX. Implementations shall document any behavior that differs from the behavior defined by POSIX. Implementations that do not support exact POSIX behavior should provide behavior as close to POSIX behavior as is reasonable given the limitations of actual operating systems and file systems. If an implementation cannot provide any reasonable behavior, the implementation shall report an error as specified in [fs.err.report]. [ Note: [...] ]

Implementations are not required to provide behavior that is not supported by a particular file system. [ Example: [...] ]

Any quirks in ::remove that concern the actual act of removal (as opposed to how the file to be removed is identified) are likely due to limitations of the underlying OS API. I see no reason to think that an implementation of std::filesystem::remove on the same operating system will magically do better.
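
What std::filesystem::remove does give you is a way to see why the OS refused. A small sketch, assuming C++17; `RemoveOutcome` and `try_remove` are names made up for this example, and the exact error message text depends on the platform:

```cpp
#include <filesystem>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical result type for this example.
enum class RemoveOutcome { removed, not_found, failed };

// Uses the non-throwing overload of fs::remove. When the OS refuses
// (for example, a non-empty directory), `reason` receives the message
// from the error_code; otherwise it is cleared.
RemoveOutcome try_remove(const fs::path& p, std::string& reason) {
    std::error_code ec;
    const bool removed = fs::remove(p, ec);
    if (ec) {
        reason = ec.message();
        return RemoveOutcome::failed;
    }
    reason.clear();
    return removed ? RemoveOutcome::removed : RemoveOutcome::not_found;
}
```

The failure itself still comes from the operating system; the `error_code` overload only reports it in a more structured way than the nonzero `int` returned by ::remove.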

T.C.
  • Re "I see no reason to think that an implementation of std::filesystem::remove on the same operating system will magically do better.", well, as I've already mentioned in a comment on the question, `std::filesystem::remove` can handle general Unicode paths. That's a pretty big difference in functionality. Which means that `std::filesystem::remove` does better, far better. – Cheers and hth. - Alf Sep 06 '17 at 11:38
  • @Cheersandhth.-Alf: Checking my understanding: is your difference due to the fact that `std::remove` takes a `const char*` but `fs::remove` takes a `fs::path`, which is [basically a `basic_string<wchar_t>`](http://en.cppreference.com/w/cpp/filesystem/path) on Windows? – Quuxplusone Sep 06 '17 at 17:40
  • @Quuxplusone: Yes. – Cheers and hth. - Alf Sep 06 '17 at 19:40
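
To illustrate the point about paths in the comments above, here is a sketch that builds the `fs::path` from a UTF-16 literal instead of a narrow string, so the conversion to the native encoding is explicit. It assumes C++17; `create_and_remove` is a name made up for this example, and the file name is arbitrary.

```cpp
#include <filesystem>
#include <fstream>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates an empty file at p, then removes it.
// Returns true only if both steps succeeded.
bool create_and_remove(const fs::path& p) {
    {
        std::ofstream out{p};   // C++17 stream constructor taking a path
        if (!out) return false;
    }                           // file is closed here, before removal
    std::error_code ec;
    return fs::remove(p, ec) && !ec;
}

// A char16_t source is defined to be UTF-16, so this name does not
// depend on the narrow execution character set or the ANSI code page.
inline fs::path non_ascii_demo_path() {
    return fs::temp_directory_path() / fs::path(u"p\u00f8h_gr_demo.txt");
}
```

With `std::remove` the same name would first have to be squeezed into a `const char*`, which on Windows only works if the active ANSI code page happens to contain every character in it.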