2

DO NOT TRY THIS AT HOME

I am having a weird issue with std::filesystem::remove_all. I have written a program that writes N files to disk in a single directory and then deletes all the files afterward (there is a good reason for this). However, when I use std::filesystem::remove_all I get errors like this:

filesystem error: cannot remove all: Structure needs cleaning [./tmp_storage] [./tmp_storage/2197772]

and the folder is not deleted (the call failed, after all). Running ls afterwards shows that the file system is "damaged":

$ ls tmp_storage/
ls: cannot access 'tmp_storage/2197772': Structure needs cleaning
ls: cannot access 'tmp_storage/5493417': Structure needs cleaning
...

and I have to repair the file system. The full program looks like this:

#include <fmt/core.h>
#include <CLI/CLI.hpp>

#include <filesystem>
#include <fstream>
#include <string>
#include <exception>

int main(int argc, char** argv)
{
  size_t num_files{64000000};

  CLI::App app("Writes N number of files to dir in file system to check the maximum number of files in a directory");
  app.add_option("-c,--count", num_files, fmt::format("How many files to generate [Default: {}]", num_files));
  CLI11_PARSE(app, argc, argv);

  std::string base_path = "./tmp_storage";

  if (!std::filesystem::exists(base_path))
  {
    std::filesystem::create_directory(base_path); 
  }

  size_t i;

  for (i = 1; i <= num_files; ++i)
  {
    std::string file_path = fmt::format("{}/{}", base_path, std::to_string(i));
    std::ofstream out(file_path, std::ios::binary);

    if (out.fail())
    {
      break; 
    }

    try
    {
      out << std::to_string(i); 
    }
    catch(const std::exception& e)
    {
      fmt::print("{}\n", e.what());
    }
  }

  // i is one past the last file that was written successfully
  fmt::print("Wrote {} out of {} files\n", i - 1, num_files);

  try
  {
    std::filesystem::remove_all(base_path);
  }
  catch(const std::exception& e)
  {
    fmt::print("{}\n", e.what());
  }
  
  fmt::print("Done\n");
  
  return 0; 
}

Compiled with the following Makefile:

CC = clang++
CXX_FLAGS = -std=c++17
LINK_FLAGS = -lfmt

all:
    $(CC) $(CXX_FLAGS) main.cpp -o main $(LINK_FLAGS)

I have been able to replicate this behavior on Fedora Server 33/34 (using XFS) and on Ubuntu (using EXT4 and XFS). Is this a bug in std::filesystem::remove_all, or am I doing something wrong?

On Fedora the kernel version is Linux 5.12.12-300.fc34.x86_64 x86_64, with this clang version:

clang version 12.0.0 (Fedora 12.0.0-2.fc34)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Lars Nielsen
  • 2
    Were both of your attempts on the same physical machine/hard drive (so either dual booting or VMs on the same host)? – Joseph Sible-Reinstate Monica Jul 01 '21 at 05:58
  • Unrelated suggestion: Do `std::filesystem::path base_path = "./tmp_storage";` and then `auto file_path = base_path / std::to_string(i);` – Ted Lyngmo Jul 01 '21 at 06:06
  • @JosephSible-ReinstateMonica 3 different physical machines – Lars Nielsen Jul 01 '21 at 06:15
  • 1
    I read this [Cannot remove file: “Structure needs cleaning”](https://unix.stackexchange.com/a/330767/391809) and aborted my run of your program. I do _not_ want to replicate this error. I'm running Fedora Server 34 too. – Ted Lyngmo Jul 01 '21 at 06:18
  • 8
    Even if the C++ library has a bug, generally *userspace code shouldn't be able to corrupt filesystems!* Sounds more like an issue in the code of the OS/filesystem itself... perhaps predictable given that the purpose of this program seems to be stress testing precisely that. – HTNW Jul 01 '21 at 06:19
  • @TedLyngmo yeah I know that syntax but I don't like it XD and thanks... and yes I did not expect anyone to like destroy their file system to test XD – Lars Nielsen Jul 01 '21 at 06:20
  • 1
    @LarsNielsen I sometimes leap before I look :-) – Ted Lyngmo Jul 01 '21 at 06:20
  • @HTNW kind of. It is supposed to reveal how many files can be in a directory on a given file system, since, for instance, the limit for EXT4 varies depending on some OS settings – Lars Nielsen Jul 01 '21 at 06:20
  • @TedLyngmo interesting with the link you shared, the weird thing though is that `rm` seems to work – Lars Nielsen Jul 01 '21 at 06:23
  • 1
    @LarsNielsen That's good to hear - my `rm -rf tmp_storage` is still running :-) – Ted Lyngmo Jul 01 '21 at 06:23
  • 1
    Maybe, edit a warning in: "Dear children. Please, don't try this at home." ;-) – Scheff's Cat Jul 01 '21 at 06:25
  • I can imagine :D @TedLyngmo – Lars Nielsen Jul 01 '21 at 06:25
  • 2
    @Scheff'sCat done :) – Lars Nielsen Jul 01 '21 at 06:26
  • 2
    Definitely an OS bug. That bug might be triggered by a bug or unusual behaviour in std::filesystem, but user code shouldn't be able to corrupt the filesystem – Alan Birtles Jul 01 '21 at 06:32
  • 1
    @AlanBirtles do you have a guess as to whether the bug would be in the VFS or lower? My thinking is that it would be VFS since the error is across both EXT4 and XFS – Lars Nielsen Jul 01 '21 at 06:40
  • No idea I'm afraid, I'm not familiar with the internals of linux file systems – Alan Birtles Jul 01 '21 at 07:02
  • @AlanBirtles fair enough :) – Lars Nielsen Jul 01 '21 at 07:02
  • If it's an OS bug then `std::filesystem::remove_all()` can hardly be blamed for this. @HTNW _generally userspace code shouldn't be able to corrupt filesystems!_ Hmm... Does it mean the execution of `std::system("rm -rf /");` should be prevented as well? (Though, bad example - this might be done intentionally, although I cannot imagine why.) – Scheff's Cat Jul 01 '21 at 07:19
  • Please kindly add kernel versions and glibc versions and clang versions to OS specifications. Do you see errors in dmesg? – KamilCuk Jul 01 '21 at 07:19
  • @Scheff'sCat I really don't hope it is OS either as `rm -r` works – Lars Nielsen Jul 01 '21 at 07:21
  • @KamilCuk I will add the kernel version and clang version yes :) – Lars Nielsen Jul 01 '21 at 07:21
  • 1
    Yes I clang all day every day @KamilCuk :) – Lars Nielsen Jul 01 '21 at 07:22
  • 1
    My quick search results indicate that this is a common problem of ext4 and xfs, and not a rare one. It has nothing to do with std::filesystem or c++ specifically. – n. m. could be an AI Jul 01 '21 at 07:33
  • @n.1.8e9-where's-my-sharem. I know it is a problem that can happen. But! it is weird that it happens only with `std::filesystem::remove_all` and not `rm -r`, which is what led me to believe it was an issue with the former. – Lars Nielsen Jul 01 '21 at 07:42
  • It is an unstable condition that in your case is triggered with remove_all, but other people manage to trigger it with other things, including shell scripts. – n. m. could be an AI Jul 01 '21 at 07:54
  • Hopefully `rm -r` is a rather usual command and implementors have managed for it not to trigger a file system bug! On the other hand, `std::filesystem` is only a C++17 addition and has not yet been as extensively tested. As it seems to be reproducible, it deserves IMHO a ticket for the C++ library. Whether it will end in identifying (and later fixing) a bug in the ext4 implementation is a different question (still IMHO). – Serge Ballesta Jul 01 '21 at 07:57
  • 1
    No, it is a problem with the specific Linux filesystems. This condition should be impossible to trigger from user-level code. It is like you discover that you can edit a particular file you should not have access to on your system with a particular editor, and open a ticket against the editor. – n. m. could be an AI Jul 01 '21 at 08:22
  • @n.1.8e9-where's-my-sharem. damn :( okay thanks :) – Lars Nielsen Jul 01 '21 at 08:38
  • 1
    It's probably that `remove_all` deletes the files in a different order to `rm -r`, which triggers the bug – Alan Birtles Jul 01 '21 at 08:47
  • ... great or something like that @AlanBirtles I am guessing something like `for(const auto& file : std::filesystem::directory_iterator()) {std::filesystem::remove(file);}` would be safer then ? – Lars Nielsen Jul 01 '21 at 08:59
  • 2
    With some error checking and recursing into sub directories that's pretty much what `remove_all` does anyway: https://github.com/gcc-mirror/gcc/blob/16e2427f50c208dfe07d07f18009969502c25dc8/libstdc%2B%2B-v3/src/filesystem/ops.cc#L1095 (I'm assuming you are using libstdc++) – Alan Birtles Jul 01 '21 at 09:07

2 Answers

1

I tried to reproduce this on Fedora 34 using this modified program (removing the fmt and cli11 dependencies):

#include <cerrno>
#include <cstdio>
#include <exception>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

int main(int argc, char** argv)
{
  size_t num_files{64000000};

  if (argc > 1)
    num_files = std::stol(argv[1]);

  std::string base_path = "./tmp_storage";

  try
  {
    if (!std::filesystem::exists(base_path))
    {
      std::filesystem::create_directory(base_path); 
    }

    size_t i;

    for (i = 1; i <= num_files; ++i)
    {
      auto si = std::to_string(i);
      std::string file_path = base_path + '/' + si;
      std::ofstream out(file_path, std::ios::binary);

      if (out.fail())
        throw std::system_error(errno, std::generic_category(), "ofstream failed: " + file_path);

      try
      {
        out << si;
      }
      catch(const std::exception& e)
      {
        std::puts(e.what());
      }
    }

    std::printf("Wrote %zu out of %zu files\n", i - 1, num_files);

    std::filesystem::remove_all(base_path);
  }
  catch(const std::exception& e)
  {
    std::puts(e.what());
  }
  
  std::puts("Done");
  
  return 0; 
}

I can't reproduce the errors in F34, using ext4 or xfs or with the default installation choice of btrfs. I also can't reproduce it on another server using xfs, with clang 13.0.0 and libstdc++-11.2.1 and kernel 5.14.0. This means I'm unable to debug where my std::filesystem implementation corrupts the filesystem, and unable to report it to the kernel team.

I'm not sure whether the code is encountering a kernel bug or if you have faulty hardware. Did you check what the system journal said around the time of the filesystem corruption? Were there any errors from the kernel?

Edit: Also, are you using LVM for your disks? I think all my tests were without LVM.

Jonathan Wakely
0

NOTE: This does not fix the underlying operating system problem; it is only a way to avoid triggering it from C++.

The change needed in the original code is minimal and confined to the try block

 try
  {
    std::filesystem::remove_all(base_path);
  }
  catch(const std::exception& e)
  {
    fmt::print("{}\n", e.what());
  }

In it, replace std::filesystem::remove_all(base_path); with sequential deletes:

for (auto& path : std::filesystem::directory_iterator(base_path))
{
    std::filesystem::remove(path);
}

The full program then becomes:

#include <fmt/core.h>
#include <CLI/CLI.hpp>

#include <filesystem>
#include <fstream>
#include <string>
#include <exception>

int main(int argc, char** argv)
{
    size_t num_files{64000000};
    
    CLI::App app("Writes N number of files to dir in file system to check the maximum number of files in a directory");
    app.add_option("-c,--count", num_files, fmt::format("How many files to generate [Default: {}]", num_files));
    CLI11_PARSE(app, argc, argv);

    std::string base_path = "./tmp_storage";

    if (!std::filesystem::exists(base_path))
    {
        std::filesystem::create_directory(base_path); 
    }

    size_t i;

    for (i = 1; i <= num_files; ++i)
    {
        std::string file_path = fmt::format("{}/{}", base_path, std::to_string(i));
        std::ofstream out(file_path, std::ios::binary);

        if (out.fail())
        {
            break; 
        }

        try
        {
            out << std::to_string(i); 
        }
        catch(const std::exception& e)
        {
            fmt::print("{}\n", e.what());
        }
    }

    // i is one past the last file that was written successfully
    fmt::print("Wrote {} out of {} files\n", i - 1, num_files);

    try
    {
        for (auto& path : std::filesystem::directory_iterator(base_path))
        {
            std::filesystem::remove(path); 
        }
    }
    catch(const std::exception& e)
    {
        fmt::print("{}\n", e.what());
    }
  
    fmt::print("Done\n");
  
    return 0; 
}
Lars Nielsen
  • This is not a correct answer, most probably you have a faulty RAM/SSD. – NoSenseEtAl Feb 22 '22 at 22:03
  • 1
    @NoSenseEtAl I can reproduce it across multiple machines and operating systems, so unless 10+ machines have faulty RAM, SSDs and HDDs, I doubt that is the case. – Lars Nielsen Feb 24 '22 at 13:50
  • You could offer remote access to author of another answer, or if he is not interested at least provide details of those machines. Still this seems very fishy to me. – NoSenseEtAl Feb 25 '22 at 11:00
  • 2
    @NoSenseEtAl I am currently in the process of investigating this in more detail. To identify what is actually breaking. – Lars Nielsen Feb 27 '22 at 09:28