
I was looking for a way to loop through all the files and folders in a given path and I stumbled upon this: get tree structure of a directory with its subfolders and files using C#.net in windows application

I was fascinated by Xiaoy312's response, so I took their code and modified it to serve my purpose, which is to return a list of the paths of all files under a given path:

using System;
using System.Collections.Generic;
using System.IO;

class Whatever
{
    static List<string> filePaths = new List<string>();
    static void Main()
    {
        string path = "some folder path";
        DirectoryInfo directoryInfo = new DirectoryInfo(path);
        IEnumerable<HierarchicalItem> items = SearchDirectory(directoryInfo, 0);
        foreach (var item in items) { } // my query is about this line.
        PrintList(filePaths); 
        Console.Read();
    }
    static void PrintList(List<string> list)
    {
        foreach(string path in list)
        {
            Console.WriteLine(path);
        }
    }
    public static IEnumerable<HierarchicalItem> SearchDirectory(DirectoryInfo directory, int deep = 0)
    {
        yield return new HierarchicalItem(directory.Name, deep);
        foreach (DirectoryInfo subdirectory in directory.GetDirectories())
        {
            foreach (HierarchicalItem item in SearchDirectory(subdirectory, deep + 1))
            {
                yield return item;
            }
        }
        foreach (var file in directory.GetFiles())
        {
            filePaths.Add(file.FullName);
            // FileInfo.Name already includes the extension, so appending Extension would duplicate it.
            yield return new HierarchicalItem(file.Name, deep + 1);
        }
    }
}
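
The snippet above does not compile on its own, because `HierarchicalItem` comes from the linked answer and is not shown here. A minimal stand-in, assuming it only carries a name and a depth (the property names and `ToString` below are hypothetical, not taken from the linked answer), might look like this:

```csharp
// Hypothetical stand-in for the HierarchicalItem type used above.
// The real definition lives in the linked answer and may differ.
public class HierarchicalItem
{
    public string Name { get; }
    public int Deep { get; }

    public HierarchicalItem(string name, int deep)
    {
        Name = name;
        Deep = deep;
    }

    // Indent by depth so the tree structure is visible when printed.
    public override string ToString() => new string(' ', Deep * 2) + Name;
}
```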

I understand the general idea of recursion and how a function calls itself. But while testing the code by trial and error, I noticed that it doesn't matter whether the body of that last foreach in the "Main" method is empty or not. However, when that foreach is removed entirely, filePaths is no longer filled.

My Questions:

  1. Why does that last foreach in the "Main" method fill the list even though its body is empty? And why does filling the list fail when it is removed?
  2. Can someone describe the steps of the recursion cycle, such as:
     1. SearchDirectory is called,
     2. the empty foreach iterates the first item,
     3. SearchDirectory returns a new HierarchicalItem for the path folder,
     4. SearchDirectory loops inside each directory, etc.

I would be grateful for an explanation, especially for Question 2. Thank you very much.

1 Answer

IEnumerables are generally lazy – they are only evaluated/produced when they are enumerated/iterated. Without the foreach loop, it is never iterated, therefore never executed.

It is somewhat odd for your IEnumerable generator function to have side-effects that will only be executed when the enumerable is consumed.

Behind the scenes, functions with yield return statements are transformed into state machines which will produce the output on-demand.
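
To make the state machine idea concrete, here is a hand-written approximation of what the compiler might produce for a method containing two `yield return` statements. This is only a sketch: the class name `GenerateStateMachine` and the logging are made up for illustration, and the real generated code has more bookkeeping.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hand-written approximation of a generator equivalent to:
//     IEnumerable<string> Generate() { yield return "x"; yield return "y"; }
// The compiler-generated class has a different name and more details.
class GenerateStateMachine : IEnumerable<string>, IEnumerator<string>
{
    private int state = 0;          // remembers which yield to resume after
    public string Current { get; private set; }
    object IEnumerator.Current => Current;

    // Nothing runs until a consumer calls MoveNext (foreach does this).
    public bool MoveNext()
    {
        switch (state)
        {
            case 0:
                Console.WriteLine("running up to first yield");
                Current = "x";
                state = 1;
                return true;
            case 1:
                Console.WriteLine("running up to second yield");
                Current = "y";
                state = 2;
                return true;
            default:
                return false;       // method body has finished
        }
    }

    // Each enumeration starts a fresh state machine, so the body reruns.
    public IEnumerator<string> GetEnumerator() => new GenerateStateMachine();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    public void Reset() => throw new NotSupportedException();
    public void Dispose() { }
}

class StateMachineDemo
{
    static void Main()
    {
        IEnumerable<string> items = new GenerateStateMachine();
        Console.WriteLine("created, nothing has run yet");
        foreach (string item in items)
        {
            Console.WriteLine("got: " + item);
        }
    }
}
```

Creating the object prints nothing from the "method body"; the `running up to ...` lines only appear once `foreach` starts calling `MoveNext`, which is the same reason a generator with side effects does nothing until it is iterated.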

Here's a simpler example showcasing the lazy behavior:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        Console.Out.WriteLine("0");
        IEnumerable<string> items = Generate("a", "b", "c");
        Console.Out.WriteLine("1");
        foreach (string item in items) {
            Console.Out.WriteLine("for: " + item);
        }
        Console.Out.WriteLine("2");
        foreach (string item in items) { } // enumerate again, ignoring the items
        Console.Out.WriteLine("3");
    }

    public static IEnumerable<string> Generate(params string[] args)
    {
        foreach (string arg in args) {
            Console.Out.WriteLine("Generate: " + arg);
            yield return arg;
        }
    }
}

Output of the above program:

0
1
Generate: a
for: a
Generate: b
for: b
Generate: c
for: c
2
Generate: a
Generate: b
Generate: c
3

Furthermore, yield return doesn't have to occur inside a loop, it can be used standalone and multiple times in a single function:

using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        Console.Out.WriteLine("0");
        IEnumerable<string> items = Generate();
        Console.Out.WriteLine("1");
        foreach (string item in items) {
            Console.Out.WriteLine(item);
        }
        Console.Out.WriteLine("2");
    }

    public static IEnumerable<string> Generate()
    {
        yield return "x";
        yield return "y";
        yield return "z";
    }
}

Output:

0
1
x
y
z
2

And for bonus points, consider the following program:

using System;
using System.Collections.Generic;
using System.Linq; // needed for ToList()

class Program
{
    static void Main()
    {
        foreach (string item in Generate("a", "b", "c")) {
            Console.Out.WriteLine("for: " + item);
        }
        Generate("42").ToList();
    }
 
    public static IEnumerable<string> Generate(params string[] args)
    {
        foreach (string arg in args) {
            Console.Out.WriteLine("Generating: " + arg);
            yield return arg;
            yield return arg;
            Console.Out.WriteLine("Generated: " + arg);
        }
    }
}

Its output is:

Generating: a
for: a
for: a
Generated: a
Generating: b
for: b
for: b
Generated: b
Generating: c
for: c
for: c
Generated: c
Generating: 42
Generated: 42

Now that we have covered the basics, your code should probably get rid of the side effect and instead:

  1. Yield all directories
  2. Iterate those directories and yield their files

Something along the lines of:

static void Main()
{
    string path = "some folder path";
    DirectoryInfo directoryInfo = new DirectoryInfo(path);
    IEnumerable<DirectoryInfo> dirs = SearchDirectory(directoryInfo);
    IEnumerable<string> filePaths = GetFiles(dirs);
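    // PrintList takes a List<string>; ToList() (requires using System.Linq) also forces the enumeration.
    PrintList(filePaths.ToList());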
    Console.Read();
}

public static IEnumerable<DirectoryInfo> SearchDirectory(DirectoryInfo directory, int deep = 0)
{
    yield return directory;
    foreach (DirectoryInfo subdirectory in directory.GetDirectories())
    {
        foreach (DirectoryInfo item in SearchDirectory(subdirectory, deep + 1))
        {
            yield return item;
        }
    }
}

public static IEnumerable<string> GetFiles(IEnumerable<DirectoryInfo> dirs)
{
    foreach (var dir in dirs)
    {
        foreach (var file in dir.GetFiles())
        {
            yield return file.FullName;
        }
    }
}
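
To see the order in which a recursive generator such as `SearchDirectory` visits folders, the following sketch traces the same pattern on a small in-memory tree. The `Folder` class, the `Walk` method, and the sample tree are hypothetical and exist only so the example runs without touching the disk.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory folder, used instead of DirectoryInfo.
class Folder
{
    public string Name;
    public List<Folder> Children = new List<Folder>();

    public Folder(string name, params Folder[] children)
    {
        Name = name;
        Children.AddRange(children);
    }
}

class TraceDemo
{
    // Same shape as SearchDirectory: yield self, then flatten each child.
    public static IEnumerable<string> Walk(Folder folder, int deep = 0)
    {
        string indent = new string(' ', deep * 2);
        Console.WriteLine(indent + "enter " + folder.Name);
        yield return folder.Name;
        foreach (Folder child in folder.Children)
        {
            // The recursive call only creates a generator; the inner
            // foreach is what actually drives it, one item at a time.
            foreach (string name in Walk(child, deep + 1))
            {
                yield return name;
            }
        }
        Console.WriteLine(indent + "leave " + folder.Name);
    }

    static void Main()
    {
        var root = new Folder("root",
            new Folder("a", new Folder("a1")),
            new Folder("b"));
        foreach (string name in Walk(root))
        {
            Console.WriteLine("main got: " + name);
        }
    }
}
```

The `enter` and `main got` lines alternate (`enter root`, `main got: root`, `enter a`, `main got: a`, `enter a1`, `main got: a1`, and so on), which shows that each level runs only as far as its next `yield return` whenever the consumer asks for another item. The `leave` lines appear once a folder has no more children to hand out.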
knittl
  • Oh, thanks for that; now I understand why the empty foreach affects the result. There is still the other question: I didn't get how that code is able to loop through all files no matter how deep the folder tree is. – Marwan Ahmed Aug 25 '22 at 19:47
  • @MarwanAhmed because your generator function `SearchDirectory` calls itself with each subdirectory it encounters. Doing `foreach (var x in …) yield return x;` is basically flat-mapping that nested list. The method returns a stream of items until there are no more items left to yield. The problem with your code is that your state machine/generator function has a side effect, meaning that iterating it changes something outside of the function, which should generally be avoided … – knittl Aug 25 '22 at 19:53
  • … because it makes code hard to understand and debug, as you have discovered :) – knittl Aug 25 '22 at 19:55
  • So the first yield return is for the folder of the passed path itself. Then it loops through each subfolder in that path and loops through each HierarchicalItem in these subfolders (that HierarchicalItem doesn't exist yet, as I see it), and then it calls itself, increasing the depth by 1 and returning an item (which I assume is an IEnumerable). Leaving the files foreach aside, in this description of what is going on I don't see any cycle that moves into the next folder and the next depth. – Marwan Ahmed Aug 25 '22 at 20:17
  • @MarwanAhmed the subfolders (and subsubfolders (and subsubsubfolders)) are enumerated/recursed here: `foreach (DirectoryInfo subdirectory in directory.GetDirectories()) foreach (HierarchicalItem item in SearchDirectory(subdirectory, deep + 1))` – it goes over all subdirectories of the current directory (`directory.GetDirectories()`) and then recurses (`SearchDirectory(subdirectory, deep + 1)`) into each of them. Each recursive call again returns a new generator/state machine that is lazily evaluated (foreach evaluates/consumes it). – knittl Aug 25 '22 at 20:34
  • Now I'm starting to get it, thanks for your time. – Marwan Ahmed Aug 25 '22 at 20:39