Here is the code I used to test it. It works on ordinary directories, but not on those mounted with sshfs. My goal is to use these methods in https://github.com/jlettvin/Greased-Grep, which is designed to allow global fuzzy searches for keywords that must be present and keywords that must be absent.

#include <iostream>
#include <string>
#include <functional>
#include <dirent.h>

using std::cout;
using std::endl;
using std::string;
using std::function;

bool neither (const char* path)
{
    bool ret = (path != nullptr);
    if (ret)
    {
        if (path[0] == '.')
        {
            if (path[1] == '\0') ret = false;
            if (path[1] == '.' && path[2] == '\0') ret = false;
        }
    }
    return ret;
}

void walk (const string &path, function<void (const string &)> talk)
{
    if (auto dir = opendir (path.c_str ())) {
        while (auto f = readdir (dir)) {
            auto name = f->d_name;
            auto type = f->d_type;
            if (neither (name))
            {
                switch (type)
                {
                    case DT_DIR: walk (path + name + "/", talk); break;
                    case DT_REG: talk (path + name            ); break;
                }
            }
        }
        closedir (dir);
    }
}

int main (int argc, char** argv)
{
    walk ("./", [](const string &path) { cout << path << endl; });
    return 0;
}
jlettvin
  • What does "does not work" mean? – Sam Varshavchik Jan 02 '18 at 03:52
  • No files are reported from an sshfs directory but all files are reported from non-sshfs directories. – jlettvin Jan 02 '18 at 03:55
  • Possible duplicate of [Checking if a dir. entry returned by readdir is a directory, link or file. dent->d_type isn't showing the type](https://stackoverflow.com/q/23958040), or at least highly related. Your super-simple answer could be optimized by treating an entry as a possible directory only when `d_type` is `DT_DIR`, `DT_UNKNOWN`, or `DT_LNK` (a symlink). That way you'd avoid wasting time trying to opendir names for which you already got a positive ID from the kernel of `DT_REG`, a device, or a FIFO. – Peter Cordes Aug 03 '19 at 04:48

2 Answers

You need to review the following documentation in the Linux readdir(3) manual page:

    unsigned char d_type;   /* Type of file; not supported
                               by all filesystem types */

Specifically, your attention is directed to the "not supported by all filesystem types" part.

Your code expects d_type to be set. However, readdir(3) does not guarantee that it will be.

One of the possible values for d_type is:

DT_UNKNOWN

    The file type could not be determined.
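
To check whether a particular mount fills in d_type at all, a small diagnostic along these lines may help. This is only a sketch: `count_unknown_types` is a hypothetical helper name, and it assumes a Linux/glibc system where `struct dirent` exposes `d_type`.

```cpp
#include <dirent.h>

#include <string>

// Hypothetical diagnostic helper (name is illustrative, not from the question).
// Returns how many entries in dir_path are reported with d_type == DT_UNKNOWN,
// or -1 if the directory cannot be opened.
long count_unknown_types (const std::string &dir_path)
{
    DIR *dir = opendir (dir_path.c_str ());
    if (!dir) return -1;

    long unknown = 0;
    while (dirent *f = readdir (dir))
    {
        if (f->d_type == DT_UNKNOWN) ++unknown;
    }
    closedir (dir);
    return unknown;
}
```

A non-zero result for a directory on the mount in question would suggest that the filesystem is not supplying the type and that a stat()-based fallback is needed there.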

Code that must handle all possibilities has to check explicitly for DT_UNKNOWN and, when it appears, append d_name to the directory name, stat() the resulting path, and take the file type from the st_mode field of the result.

d_type is a shortcut. If it's set, wonderful. You have it right off the bat. If not, you'll have to work to get it.
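
A minimal sketch of that fallback, applied to a walker shaped like the one in the question, might look as follows. The helper name `resolve_type` is hypothetical. The sketch uses lstat() rather than stat() so that symlinks are reported as symlinks, which matches what d_type would report; swap in stat() if links should be followed instead.

```cpp
#include <dirent.h>
#include <sys/stat.h>

#include <functional>
#include <string>

// Hypothetical helper: trust d_type when the filesystem set it, otherwise
// ask lstat() and translate st_mode back into a DT_* value.
unsigned char resolve_type (const std::string &full_path, unsigned char d_type)
{
    if (d_type != DT_UNKNOWN) return d_type;   // shortcut: type already known

    struct stat st;
    if (lstat (full_path.c_str (), &st) != 0) return DT_UNKNOWN;
    if (S_ISDIR (st.st_mode)) return DT_DIR;
    if (S_ISREG (st.st_mode)) return DT_REG;
    if (S_ISLNK (st.st_mode)) return DT_LNK;
    return DT_UNKNOWN;   // other special files are left unclassified here
}

// Same shape as the walker in the question; path is expected to end in '/'.
void walk (const std::string &path,
           const std::function<void (const std::string &)> &talk)
{
    DIR *dir = opendir (path.c_str ());
    if (!dir) return;

    while (dirent *f = readdir (dir))
    {
        const std::string name = f->d_name;
        if (name == "." || name == "..") continue;

        const std::string full = path + name;
        switch (resolve_type (full, f->d_type))
        {
            case DT_DIR: walk (full + "/", talk); break;
            case DT_REG: talk (full);             break;
            default:                              break;   // ignore other types
        }
    }
    closedir (dir);
}
```

The extra lstat() call only happens for entries whose type the filesystem did not supply, so ordinary local directories keep the fast path.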

sshfs evidently does not return d_type from readdir(3). Hopefully, sshfs implements stat().

P.S. Note that besides directories and regular files, there are also several other special file types that you may or may not have to handle (assuming that sshfs can even provide them to you). That's something you will need to investigate on your own.
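
As a starting point for that investigation, the standard `S_IS*` macros from `<sys/stat.h>` can distinguish the other file types. The following is a sketch only; `describe` is a hypothetical helper name, and whether sshfs reports each of these types faithfully is not something this answer has verified.

```cpp
#include <sys/stat.h>

#include <string>

// Hypothetical helper: name the type of the file at path, using lstat() so
// that a symlink is reported as a symlink rather than as its target.
std::string describe (const std::string &path)
{
    struct stat st;
    if (lstat (path.c_str (), &st) != 0) return "inaccessible";

    if (S_ISREG  (st.st_mode)) return "regular file";
    if (S_ISDIR  (st.st_mode)) return "directory";
    if (S_ISLNK  (st.st_mode)) return "symbolic link";
    if (S_ISFIFO (st.st_mode)) return "FIFO";
    if (S_ISSOCK (st.st_mode)) return "socket";
    if (S_ISCHR  (st.st_mode)) return "character device";
    if (S_ISBLK  (st.st_mode)) return "block device";
    return "unknown";
}
```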

Sam Varshavchik
  • sshfs now supports returning `d_type` correctly again: https://github.com/libfuse/libfuse/pull/591 Before, it always returned `d_type = DT_UNKNOWN`, which made a `stat()` call necessary and many directory operations much slower. Of course, user code should nevertheless always implement the fallback to `stat()` et al. when `DT_UNKNOWN` is encountered. – nh2 Mar 30 '21 at 21:55

I have the code I wanted operational.

After extensive experimentation with all sorts of flags, conditions, and situations, I decided to go with utter simplicity.

I treat EVERYTHING found as BOTH a directory and a file. If it is not a directory, opening it fails, and I ignore the error and continue. If it is a directory, I open it and search its contents for what I want. This is an advantage in Greased-Grep, where the goal is to find things matching patterns. Filenames are things, just like their contents.

So, my answer is, I do not care about failures. I only care about successes, so I dismiss failures without any tests.

This works just fine when descending sshfs-mounted directories.

Anyone interested in how this code looks can follow my github: https://github.com/jlettvin/Greased-Grep/blob/master/gg.cpp

Here is the salient code:

void walk (const string& a_path)
{
    // Don't attempt to assess validity of filenames... just fail.
    // Treat directories like files and search filenames in directories.
    // This enables gg to work on sshfs mounted filesystems.
    auto d{a_path};
    auto s{d.size ()};
    if (s && d[s - 1] == '/') d.resize (s-1);
    if (auto dir = opendir (d.c_str ()))
    {
        // readdir returns nullptr at the end of the directory or on error;
        // either way, stop reading.
        while (auto f = readdir (dir))
        {
            const char* q = f->d_name;
            // Skip the "." and ".." entries.
            if (!(*q++ == '.' && (!*q || (*q++ == '.' && !*q))))
            {
                auto e = d + "/" + f->d_name;
                walk (e);                    // does nothing if e is not a directory
                mapped_search (e.c_str ());  // not shown here
            }
        }
        closedir (dir);                      // release the directory handle
    }
}
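
The optimization suggested in the comments on the question (only attempt opendir when the entry could plausibly be a directory) could be sketched as a small predicate. `might_be_directory` is a hypothetical name, not part of gg.cpp.

```cpp
#include <dirent.h>

// Hypothetical helper: true when the entry could be a directory and an
// opendir() attempt is worthwhile. DT_UNKNOWN is included because the
// filesystem may not report types, and DT_LNK because a symlink may point
// at a directory.
bool might_be_directory (unsigned char d_type)
{
    switch (d_type)
    {
        case DT_DIR:
        case DT_UNKNOWN:
        case DT_LNK:
            return true;
        default:
            return false;   // DT_REG, devices, FIFOs, sockets
    }
}
```

In the walker above, the recursive call would then become `if (might_be_directory (f->d_type)) walk (e);`, while the search of the entry itself stays unconditional. On a filesystem that reports DT_UNKNOWN for everything, this behaves exactly like the unconditional version.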
jlettvin