
(The title and question have been significantly changed, as none of the important parts ended up being relevant to the problem.)

I have a generated file tree of a hard drive, and I'm writing a function to highlight every instance of an extension in the file tree. For some reason, iterating over the file tree created during the scan can take at least twice as long as iterating over a duplicate of that tree. Please note that I am not trying to iterate over the file tree during the scan.

What exactly is causing the slowdown? List<FileNode> seems to be the culprit, but I'm not sure what internal mechanism is at fault.
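The highlight function itself isn't shown in the question. As a rough, hypothetical sketch of the kind of pass being timed, the code below walks a tree through the same `Count`/indexer access pattern that `FileNode` exposes and collects every file with a given extension. `Node`, `ExtensionSearch`, and `FindByExtension` are illustrative names, not code from the gist, and `Node` is a stripped-down stand-in for `FileNode`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for FileNode: only the members the traversal needs.
public sealed class Node
{
    private readonly List<Node> children = new List<Node>();

    public Node(string name, string extension = "")
    {
        Name = name;
        Extension = extension;
    }

    public string Name { get; }
    public string Extension { get; }
    public int Count => children.Count;
    public Node this[int index] => children[index];

    // Appends a child and returns this node so small trees can be built fluently.
    public Node Add(Node child)
    {
        children.Add(child);
        return this;
    }
}

public static class ExtensionSearch
{
    // Visits every node once (iterative depth-first walk) and collects nodes
    // whose non-empty extension matches, ignoring case like Windows file names.
    public static List<Node> FindByExtension(Node root, string extension)
    {
        var matches = new List<Node>();
        var pending = new Stack<Node>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            Node node = pending.Pop();
            if (node.Extension.Length > 0 &&
                string.Equals(node.Extension, extension, StringComparison.OrdinalIgnoreCase))
                matches.Add(node);

            // Same Count + indexer access pattern that FileNode exposes.
            for (int i = 0; i < node.Count; i++)
                pending.Push(node[i]);
        }
        return matches;
    }
}
```

An explicit `Stack<Node>` replaces recursion, so a deep directory hierarchy cannot overflow the call stack; the visiting order differs from a recursive walk, but every node is still visited exactly once.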

I've created a gist with 4 files to enumerate a file tree and show the inconsistencies in iteration times: FileTreeIterationSpeedTest

Performance for a drive with roughly 2M files and 200k directories:


Output from gist:

Scanning...
RAM Used:   300.6 MB
   Bytes:   443.7 GB
   Files:  1,925,131
 Folders:    156,311
Progress:     100.0%
Duration:   00:00:17

Scan complete!
Duplicating file tree...
Duplication complete!
RAM Used: 311.4 MB

Iterating: 1000
Scanned Tree: 00:03.857
  Duped Tree: 00:01.409
Duped Tree is 173.6% faster

Press any key to continue...
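The gist's timing code isn't reproduced here. Assuming the "faster" figure is computed as (scanned time / duped time - 1) x 100, the displayed times give 3.857 / 1.409 ≈ 2.737, i.e. roughly 174% faster, which is consistent with the reported 173.6% once rounding of the displayed times is allowed for. A minimal Stopwatch-based harness along those lines might look like this; `IterationBenchmark`, `Time`, and `PercentFaster` are hypothetical names, not the gist's.

```csharp
using System;
using System.Diagnostics;

public static class IterationBenchmark
{
    // Runs `pass` the given number of times and returns the total elapsed time.
    // One untimed warm-up call keeps JIT compilation out of the measurement.
    public static TimeSpan Time(Action pass, int iterations)
    {
        pass(); // warm-up, not timed
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            pass();
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    // "X% faster": how many times longer the slow run took than the fast run,
    // minus one, expressed as a percentage.
    public static double PercentFaster(TimeSpan slow, TimeSpan fast)
        => (slow.TotalMilliseconds / fast.TotalMilliseconds - 1.0) * 100.0;
}
```

For example, `IterationBenchmark.Time(() => HighlightExtension(scannedRoot, ".txt"), 1000)` would time 1000 passes over the scanned tree, where `HighlightExtension` and `scannedRoot` are placeholders for whatever the gist's highlight routine and root node are called.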

Relevant Code from FileNode.cs:

public class FileNode {
    public enum FileNodeType {
        Root,
        Directory,
        FileCollection,
        File,
    }

    private readonly List<FileNode> children = new List<FileNode>();
    private FileNode fileCollection;

    public FileNode Parent { get; private set; }
    public FileNodeType Type { get; }
    public long Size { get; private set; }
    public string Extension { get; } = string.Empty;
    public string Name { get; }

    // File Collection
    private FileNode() {
        Type = FileNodeType.FileCollection;
        Name = "<Files>";
    }

    // Root Node
    public FileNode(string drivePath) {
        Type = FileNodeType.Root;
        Name = drivePath;
    }

    // File or Directory Node
    public FileNode(Win32FindData find) {
        if (!find.IsDirectory) {
            Type = FileNodeType.File;
            Extension = Path.GetExtension(find.cFileName);
        }
        else {
            Type = FileNodeType.Directory;
        }
        Name = find.cFileName;
        Size = find.Size;
    }

    // Duplicate the tree starting at root (the copy has no parent)
    public FileNode(FileNode root) : this(root, null) {
    }

    // Duplicate a subtree, attaching the copy to parent
    private FileNode(FileNode file, FileNode parent) {
        Parent = parent;
        Type = file.Type;
        Size = file.Size;
        Extension = file.Extension;
        Name = file.Name;

        int count = file.children.Count;
        children = new List<FileNode>(count);
        for (int i = 0; i < count; i++)
            children.Add(new FileNode(file[i], this));
    }

    public void AddChild(FileNode item) {
        if (item.Type == FileNodeType.File && Type != FileNodeType.FileCollection) {
            if (fileCollection == null)
                fileCollection = new FileNode();
            fileCollection.AddChild(item);
        }
        else {
            children.Add(item);
            item.Parent = this;
        }
    }

    public bool IsLeaf => children.Count == 0;

    public int Count => children.Count;

    public FileNode this[int index] => children[index];
}
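One visible difference between the two construction paths in this code: during the scan each `children` list is filled by repeated `children.Add(item)` calls, letting `List<T>` grow its backing array as needed, whereas the duplicating constructor sizes each list up front with `new List<FileNode>(count)`. The sketch below (hypothetical `CapacityDemo` helpers) only demonstrates that difference in isolation; whether it contributes to the timing gap is not established here. On .NET, a list grown from empty by `Add` starts with room for 4 elements and doubles each time it fills, so after five adds its capacity is 8, while the pre-sized list's capacity stays at 5.

```csharp
using System.Collections.Generic;

public static class CapacityDemo
{
    // Built the way the scan builds children: start empty and call Add,
    // so List<T> reallocates its backing array whenever it fills up.
    public static List<object> GrowByAdd(int count)
    {
        var list = new List<object>();
        for (int i = 0; i < count; i++)
            list.Add(new object());
        return list;
    }

    // Built the way the duplicating constructor does it: capacity is
    // reserved up front, so the backing array is allocated exactly once.
    public static List<object> Presized(int count)
    {
        var list = new List<object>(count);
        for (int i = 0; i < count; i++)
            list.Add(new object());
        return list;
    }
}
```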
trigger_segfault
    When you were benchmarking this, was the data exactly the same, so it would be highlighting the same number of entries? Have you tried removing the UI interaction part? – Jon Skeet Aug 13 '18 at 16:23
    Could you post a [.Net Fiddle](https://dotnetfiddle.net/) or similar with some test data so that there's a [MCVE]? That'd help a lot! –  Aug 13 '18 at 16:24
  • @DaisyShipton yes, as seen in the constructor for `TreemapItem` it's constructed from the `RootNode` and then recursively builds a copy of the tree with minimal data. I will try the No-UI approach with the .Net Fiddle write-up that @AndyJ suggested. @AndyJ let me try and write something up. The classes are a mess right now but I should be able to cobble something together. (A few-day old implementation is present here https://git.io/fNFzo but a lot has been optimized and changed since then) – trigger_segfault Aug 13 '18 at 16:29
  • Update: I first tried manually cloning the `FileNode` tree using the same method as with `TreemapItem`, and it was just as fast as (and sometimes slightly faster than) the one loaded into the UI. I need to go more in depth because the VisualInfo nested class contains a few references that will need to be populated. – trigger_segfault Aug 13 '18 at 16:59
  • I've included an update in my question. As I said, I think this problem may be very specific to the method for loading the file tree, although it's strange because nothing obvious stands out that would cause the issue. – trigger_segfault Aug 13 '18 at 18:10
  • @AndyJ I have created a small-ish gist project to reproduce the results. – trigger_segfault Aug 13 '18 at 22:29
