
My question centers around the for-loop in the listDirs function, where I am launching async tasks. I am passing path by reference to std::async which then invokes the listDir function in a separate thread.

I am aware that once the for-loop moves to the next iteration, the path variable, which is a const reference to a std::filesystem::path instance in the paths vector, goes out of scope. However, the listDir function's parameter is a reference, which should be bound to the same object as path.

My understanding is that even though path goes out of scope in the listDirs function, the actual std::filesystem::path instances in the paths vector persist for the entire duration of the listDirs function, as we're passing by std::ref. But I'm not certain if this understanding is correct.

Can someone please clarify how this works? Specifically:

Does std::ref in std::async ensure that listDir gets a valid reference even when path goes out of scope in the listDirs function? Is there any risk of a dangling reference in this scenario?

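#include <algorithm>
#include <filesystem>
#include <functional>
#include <future>
#include <iostream>
#include <iterator>
#include <string>
#include <vector>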
using Iterator = std::filesystem::directory_iterator;
// The caller of this function is the thread runtime
std::vector<std::string> listDir(const std::filesystem::path& directory)
{
    
    
    std::vector<std::string> files;
    for (Iterator it(directory); it != Iterator(); ++it)
    {
        
        if (it->is_regular_file())
        {
            files.emplace_back(it->path().filename().string());
            
        }
        
    }
    // Returning the local vector by name lets the compiler apply named return value optimization (NRVO);
    // if it does not, the vector is moved rather than copied
    return files;

}

std::vector<std::string> listDirs(const std::vector<std::filesystem::path>& paths)
{
    using Iterator = std::filesystem::directory_iterator;
    std::vector<std::future<std::vector<std::string>>> futures; // listDir returns std::vector<std::string> type
    // iterate over all the directory paths
    for (const std::filesystem::path& path : paths)
    {
        // launch one task per directory; with the default launch policy, std::async may run it on a new thread or defer it
        futures.emplace_back(std::async(listDir, std::ref(path)));
    }
    std::vector<std::string> allFiles;
    for (std::future<std::vector<std::string>>& fut : futures)
    {

        std::vector<std::string> files = fut.get(); // RVO
        std::move(files.begin(), files.end(), std::back_inserter(allFiles));

    }
    // Returning the local vector by name lets the compiler apply named return value optimization (NRVO);
    // if it does not, the vector is moved rather than copied
    return allFiles;
}
int main()
{
    std::filesystem::path currentPath("G:\\lesson4");
    std::vector<std::filesystem::path> paths;

    for (Iterator it(currentPath); it!= Iterator(); ++it)
    {
        if (it->is_directory())
        {
            std::cout << it->path() << '\n';
            paths.emplace_back(it->path());
        }
        
    }

    for (const auto& fileName : listDirs(paths))
    {
        std::cout << fileName << std::endl;
    }

}
Aamir
Sami
  • 1
    `path` doesn't really go "out of scope", since it's a reference to an element of `paths`. And _that_ is in scope for the entire `listDirs` function. – paddy Jul 25 '23 at 23:49
  • Doesn't it go out of scope at the end of each for-each iteration, since it is a temporary variable created on each iteration? – Sami Jul 25 '23 at 23:54
  • 1
    Concur with @paddy there, `path` is a ref to something else that continues to exist. I'm not even sure you need `std::ref(path)` here when adding to `futures`, though I could be wrong. Access to the `path` "symbol" itself *may* disappear at times but the thing it refers is "alive" at all times you're using it. – paxdiablo Jul 25 '23 at 23:54
  • It's not a temporary variable. It's a _reference_. It refers literally to the thing that's inside your vector. If you pass that reference into something else, it's the _same_ reference. – paddy Jul 25 '23 at 23:58
  • 1
    By the way, @paddy, that should really be an *answer* rather than a comment. – paxdiablo Jul 25 '23 at 23:59
  • @paddy Thanks for the explanations. Another question: does 'path' go out of scope at the end of each iteration and get re-created with the same name? – Sami Jul 26 '23 at 00:02
  • Uhh, I don't know how many times I need to say _it's a reference_. It refers to a single element of your vector at any single loop iteration, since this is a range-based loop. – paddy Jul 26 '23 at 00:05

1 Answer

In your loop, the variable path is a reference. You can think of it a little like a pointer, although it is not one: it is an alias for an object that already exists.

for (const std::filesystem::path& path : paths)
{
    // start each thread using std::async
    futures.emplace_back(std::async(listDir, std::ref(path)));
}

At the first iteration of your loop, path refers to the first element of the vector paths. At the second iteration, it refers to the second element of the vector. And so on...
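To make that concrete, here is a minimal sketch that compares the address of the loop variable with the address of the corresponding vector element on every iteration. The helper name countAliasedElements is hypothetical and exists only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <vector>

// Hypothetical helper: walks the vector with the same kind of loop as in the
// question and checks that the loop variable is the vector element itself.
std::size_t countAliasedElements(const std::vector<std::filesystem::path>& paths)
{
    std::size_t index = 0;
    for (const std::filesystem::path& path : paths)
    {
        // Taking the address of a reference yields the address of the object
        // it refers to, so this compares against the element, not a copy.
        assert(&path == &paths[index]);
        ++index;
    }
    return index; // number of elements visited
}
```

Each iteration binds a fresh reference named path, but the object behind it lives in the vector and is unaffected when that name leaves scope.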

Because paths is not modified while any reference to its elements is in use (including the references held by the tasks behind futures), this is safe. When you pass std::ref(path) to std::async, the resulting std::reference_wrapper refers to the vector element that path is currently bound to, not to the loop variable itself.

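In fact, reference wrappers are typically implemented using a pointer under the hood. Unlike a plain reference, a std::reference_wrapper is an ordinary object that can be copied and stored, which is what std::async needs in order to hold on to the argument until the task runs.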
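As a rough illustration of how a reference wrapper behaves, the sketch below copies and rebinds a std::reference_wrapper. It uses std::string as a stand-in for std::filesystem::path, and the function name is hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical demonstration: a std::reference_wrapper can be copied and
// rebound like a pointer, while get() always yields the referenced object.
bool wrapperBehavesLikeRebindableAlias()
{
    std::string first = "first";
    std::string second = "second";

    std::reference_wrapper<std::string> wrapper = std::ref(first);
    std::reference_wrapper<std::string> copy = wrapper; // copies the wrapper, not the string
    assert(&copy.get() == &first);

    copy.get() += "!";          // modifies first through the wrapper
    wrapper = std::ref(second); // rebinds wrapper; first itself is untouched by this

    return first == "first!"
        && &wrapper.get() == &second
        && &copy.get() == &first;
}
```

A plain reference cannot be reseated like this, which is one reason the wrapper exists as a separate type.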

Even if the loop moves to the second iteration before your first asynchronous task is invoked, the reference binding remains intact and still refers to the first element of paths.
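The following sketch checks this ordering claim under stated assumptions. It passes std::launch::deferred, which the question's code does not, purely to force every task to run after the loop has finished. The names reportAddress and tasksSeeOriginalElements are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <functional>
#include <future>
#include <vector>

// Hypothetical task: reports the address of the object its parameter is bound to.
const std::filesystem::path* reportAddress(const std::filesystem::path& p)
{
    return &p;
}

// Returns true if every task was handed the original vector element.
bool tasksSeeOriginalElements(const std::vector<std::filesystem::path>& paths)
{
    std::vector<std::future<const std::filesystem::path*>> futures;
    for (const std::filesystem::path& path : paths)
    {
        // Assumption for this sketch: std::launch::deferred postpones each call
        // until get(), so no task runs before the loop has completed.
        futures.emplace_back(
            std::async(std::launch::deferred, reportAddress, std::ref(path)));
    }

    for (std::size_t i = 0; i < futures.size(); ++i)
    {
        if (futures[i].get() != &paths[i])
        {
            return false;
        }
    }
    return true;
}
```

Without std::ref, std::async stores its own copy of the argument, so the task would receive a reference to that copy instead of the vector element. That is also safe, but it costs a copy of each path.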

paddy
  • Thanks for the explanations. It makes sense. Why did you say 'You can think of it a little like a pointer, except it's not.'? Isn't a reference a const pointer? – Sami Jul 26 '23 at 00:26
  • 1
    No, a reference is a reference. Compilers can use pointers to implement references in some cases, but a reference is _not_ a pointer. To quote from the documentation linked to in my answer: a reference variable is _"an alias to an already-existing object or function"_. Also: _"References are not objects; they do not necessarily occupy storage"_. It's a subtle distinction, which is why I say you can think of them _like_ pointers, but it's important to avoid the trap of believing that they actually _are_ pointers... Because they are not. They are _references_. I hope I've now made this clear. – paddy Jul 26 '23 at 01:58
  • Yes. It is very clear. Thanks for your time and explanations. Appreciate it. – Sami Jul 26 '23 at 12:19