How do I inject a dependency representing a collection of folders?

Question

What is the best way to dependency inject folders?

I have a class that needs three folders. The goal is to gather files from a subfolder structure (a folder containing multiple folders into which the files are sorted) and write them to two other subfolder structures. Whether it goes through an abstraction or not, it needs the folders.

Specifically, I want to split data for machine learning algorithms into training and test data, where the subfolders represent the different categories of the images that are going to be classified.

So, what is the best way to inject these folders while still keeping the code easily testable? Should I just pass a string? Should I pass a FileInfo object? Should I build an interface that represents a wrapper for the folder structure? What is the best way to handle this?

A C# approach would be best, but is not necessary.

Let me know if information is missing.

The question in its current state is unclear as it is incomplete. Read How to Ask and then provide a Minimal, Complete, and Verifiable example that can be used to reproduce your problem, allowing a better understanding of what is being asked. — Nkosi, Jan 18 '18 at 21:37
This covers a range of topics. It all depends on how you want to work with the dependencies. — Nkosi, Jan 18 '18 at 21:40
What exactly do you mean? Actually, I only want to do the explained task and be able to test it properly. What information is missing? — JFFIGK, Jan 18 '18 at 21:41
Does the class actually need folders, or does it need something from the folders? If a class depends on folders then it depends on the file system, not on an abstraction. Find a way to abstract what the class actually needs, and then you'll know what to inject. — Scott Hannen, Jan 18 '18 at 21:44
Since the goal is to gather files from a subfolder structure (a folder containing multiple folders into which the files are sorted) and write them to two other subfolder structures, it needs folders, whether via an abstraction or not. — JFFIGK, Jan 18 '18 at 21:49
You mentioned that if that is the case, it depends on the file system. Does this make the code not testable? I am really eager to supply all the information needed. If I could solve the problem by myself I would not ask here. — JFFIGK, Jan 18 '18 at 21:58
Whether or not the question precisely fits the stackoverflow format, it's a good question. Wanting something to be testable is good. Trying to create the right abstraction for something is also good. — Scott Hannen, Jan 18 '18 at 22:24

score 2 · Answer 1 · answered May 13 '19 at 15:53

Representing file system operations without actually depending on the file system is easier now using System.IO.Abstractions. The pattern is similar to how we can write code that depends on HttpContextBase (implemented at runtime by HttpContextWrapper) instead of directly on HttpContext, which allows us to mock the HTTP context.

Using these classes you could inject IEnumerable<System.IO.Abstractions.IDirectoryInfo> (DirectoryInfoBase in older versions of the library), and at runtime each directory is a DirectoryInfoWrapper around a "real" DirectoryInfo, created like this:

// Verbatim string: in a regular string literal, "\f" would be a form-feed escape.
var directory = new DirectoryInfo(@"c:\folder");
IDirectoryInfo wrapper = new DirectoryInfoWrapper(new FileSystem(), directory);

The DirectoryInfoWrapper behaves just like DirectoryInfo except that it also returns abstractions. For example, wrapper.GetFiles() returns IFileInfo[] instead of FileInfo[]. So all of our code would be written to depend on the abstractions. That's fine, because the abstractions have the same properties and methods as the concrete classes.
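To make the wrapper idea concrete without pulling in the NuGet package, here is a hand-rolled sketch of the same pattern. IFolder and DirectoryInfoFolder are hypothetical names, not part of System.IO.Abstractions: an interface exposes only what the consumer needs, and a thin production class forwards every call to a real DirectoryInfo, wrapping child directories as well so callers never touch the concrete type.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical abstraction: only the members a consumer of the folder tree needs.
public interface IFolder
{
    string Name { get; }
    IEnumerable<IFolder> GetSubfolders();
    IEnumerable<string> GetFileNames();
}

// Production implementation: forwards to a real DirectoryInfo,
// in the same spirit as DirectoryInfoWrapper or HttpContextWrapper.
public sealed class DirectoryInfoFolder : IFolder
{
    private readonly DirectoryInfo _directory;

    public DirectoryInfoFolder(DirectoryInfo directory)
    {
        _directory = directory;
    }

    public string Name => _directory.Name;

    // Child directories are wrapped too, so callers only ever see IFolder.
    public IEnumerable<IFolder> GetSubfolders() =>
        _directory.GetDirectories().Select(d => new DirectoryInfoFolder(d));

    // Only file names are exposed; the consumer decides what to do with them.
    public IEnumerable<string> GetFileNames() =>
        _directory.GetFiles().Select(f => f.Name);
}
```

Code written against IFolder can then receive this wrapper in production and an in-memory fake in unit tests.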

Or, rather than injecting actual directories, you might want something like this:

public interface IDirectoryProvider
{
    // Return the abstraction, not the concrete wrapper, so tests can supply fakes.
    IEnumerable<IDirectoryInfo> GetDirectories(string someInput);
}

In either case, this allows you to unit test using mocked directories which, if necessary, contain more mocked directories and even mocked files (although I generally don't like mocks that return mocks). You could even have your mock directories return real files contained in your test project if that's easier than creating mock files. At the very least, it provides some options that weren't available before these abstractions existed.
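As a sketch of testing against fake directories that contain more fake directories and files, here is a BCL-only version that does not depend on the library's own mocking types. IFolder, FakeFolder and CategoryScanner are hypothetical names chosen for illustration; the consumer treats each subfolder name as an image category, matching the layout described in the question.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical abstraction, repeated here so this block stands alone.
public interface IFolder
{
    string Name { get; }
    IEnumerable<IFolder> GetSubfolders();
    IEnumerable<string> GetFileNames();
}

// Test double: a folder that exists only in memory and can contain other fake folders.
public sealed class FakeFolder : IFolder
{
    private readonly List<IFolder> _subfolders = new List<IFolder>();
    private readonly List<string> _files = new List<string>();

    public FakeFolder(string name) { Name = name; }

    public string Name { get; }

    public FakeFolder WithFiles(params string[] fileNames)
    {
        _files.AddRange(fileNames);
        return this;
    }

    public FakeFolder WithSubfolder(FakeFolder child)
    {
        _subfolders.Add(child);
        return this;
    }

    public IEnumerable<IFolder> GetSubfolders() => _subfolders;
    public IEnumerable<string> GetFileNames() => _files;
}

// Hypothetical consumer: pairs every file with its category,
// where the category is the name of the subfolder that contains it.
public static class CategoryScanner
{
    public static List<(string Category, string File)> Scan(IFolder root) =>
        root.GetSubfolders()
            .SelectMany(category => category.GetFileNames()
                .Select(file => (Category: category.Name, File: file)))
            .ToList();
}
```

A unit test can build a small tree such as root/cats/c1.jpg and root/dogs/d1.jpg entirely in memory and pass it to CategoryScanner.Scan, without creating a single real folder.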

Hair-splitting detail: One could argue that these aren't really "abstractions" because, by design, they are exact representations of concrete classes. You could use them to represent something completely different, like database storage, but you probably wouldn't, and they wouldn't be very good abstractions for that, because you would be forced to map fake paths to records.

That being said, I tried to imagine what I would call the namespace instead of System.IO.Abstractions and I can't think of anything better. You could call them "mocks" but then it would be confusing to see them in production code.

Scott Hannen · Answer 2 · 2018-01-18T22:43:03.617

Regardless of how you write it, you can't test a class that moves files between folders without using actual files and folders. But as far as representing it, maybe something like this:

public interface ISomethingRepository
{
    IEnumerable<ThingWithDataInIt> GetThings();
    void SaveAsTraining(ThingWithDataInIt thing);
    void SaveAsTest(ThingWithDataInIt thing);
}

The purpose is that whatever depends on this really wants the things in the files, and it wants to know that, after inspecting an item, it can save it with either the "training" data or the "test" data.

The implementation can be file-system based. I'm just making up details for illustration: I don't know what's in these files, whether they even need to be deserialized, etc. Perhaps for each file you have to parse the lines and return a collection of things.

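using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

public class FileSystemSomethingRepository : ISomethingRepository
{
    private readonly string _sourceDirectoryPath;
    private readonly string _trainingDirectoryPath;
    private readonly string _testDirectoryPath;

    public FileSystemSomethingRepository(string sourceDirectoryPath, 
        string trainingDirectoryPath, 
        string testDirectoryPath)
    {
        _sourceDirectoryPath = sourceDirectoryPath;
        _trainingDirectoryPath = trainingDirectoryPath;
        _testDirectoryPath = testDirectoryPath;
    }

    public IEnumerable<ThingWithDataInIt> GetThings()
    {
        var filePaths = Directory.GetFiles(_sourceDirectoryPath);
        foreach (var filePath in filePaths)
        {
            var fileContent = File.ReadAllText(filePath);
            var deserialized = JsonConvert.DeserializeObject<ThingWithDataInIt>(fileContent);
            yield return deserialized;
        }
    }

    public void SaveAsTraining(ThingWithDataInIt thing)
    {
        Save(thing, _trainingDirectoryPath);
    }

    public void SaveAsTest(ThingWithDataInIt thing)
    {
        Save(thing, _testDirectoryPath);
    }

    // Serialize the item and write it to the given folder.
    // The GUID file name is only a placeholder; a real implementation would
    // probably keep the original file name or derive one from the item.
    private static void Save(ThingWithDataInIt thing, string directoryPath)
    {
        Directory.CreateDirectory(directoryPath); // no-op if the folder already exists
        var json = JsonConvert.SerializeObject(thing);
        var filePath = Path.Combine(directoryPath, $"{Guid.NewGuid()}.json");
        File.WriteAllText(filePath, json);
    }
}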
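The repository above reads one flat folder, while the question's data lives in one subfolder per category. A hedged sketch of how a file-system implementation might walk a source/<category>/<file> layout and mirror it under a destination folder follows; CategoryFileCopier, EnumerateByCategory and CopyToCategory are hypothetical names. It uses only System.IO, so, as this answer points out, testing it requires real folders (temporary ones are fine).

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for a layout like source/<category>/<file>.
public static class CategoryFileCopier
{
    // Lists every file in the source tree as (category, full path),
    // where the category is the name of the subfolder containing the file.
    public static IEnumerable<(string Category, string FilePath)> EnumerateByCategory(string sourceRoot)
    {
        foreach (var categoryDir in Directory.GetDirectories(sourceRoot))
        {
            var category = Path.GetFileName(categoryDir);
            foreach (var filePath in Directory.GetFiles(categoryDir))
                yield return (category, filePath);
        }
    }

    // Copies a file into destinationRoot/<category>/, creating that folder if needed,
    // and returns the path of the copy.
    public static string CopyToCategory(string filePath, string category, string destinationRoot)
    {
        var targetDir = Path.Combine(destinationRoot, category);
        Directory.CreateDirectory(targetDir); // no-op if it already exists
        var targetPath = Path.Combine(targetDir, Path.GetFileName(filePath));
        File.Copy(filePath, targetPath, overwrite: true);
        return targetPath;
    }
}
```

GetThings and the two Save methods could be built on helpers like these, while the decision of which item goes to training and which to test stays with the consumer of the interface.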

The interface is easy to mock, and will keep whatever class that depends on this from knowing about whether the data comes from a file system, how it's serialized/deserialized, etc. Hiding those details from the consumer is what makes it an abstraction and enables you to gain the benefits of dependency injection.
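To show what "easy to mock" buys, here is a sketch of a consumer plus a hand-written in-memory fake, with no folders involved. The properties on ThingWithDataInIt and the names TrainingTestSplitter and InMemorySomethingRepository are hypothetical; the every-Nth-item split rule is also an assumption, chosen because it is deterministic and therefore easy to assert on (a real splitter might shuffle first).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for whatever one data item actually is.
public class ThingWithDataInIt
{
    public string Category { get; set; }
    public string Name { get; set; }
}

// The interface from this answer.
public interface ISomethingRepository
{
    IEnumerable<ThingWithDataInIt> GetThings();
    void SaveAsTraining(ThingWithDataInIt thing);
    void SaveAsTest(ThingWithDataInIt thing);
}

// Hypothetical consumer: sends every Nth item to the test set and the rest to training.
public class TrainingTestSplitter
{
    private readonly ISomethingRepository _repository;
    private readonly int _testEvery;

    public TrainingTestSplitter(ISomethingRepository repository, int testEvery)
    {
        if (testEvery <= 0) throw new ArgumentOutOfRangeException(nameof(testEvery));
        _repository = repository;
        _testEvery = testEvery;
    }

    public void Split()
    {
        var index = 0;
        foreach (var thing in _repository.GetThings())
        {
            index++;
            if (index % _testEvery == 0)
                _repository.SaveAsTest(thing);
            else
                _repository.SaveAsTraining(thing);
        }
    }
}

// In-memory fake for unit tests: records where each item was saved.
public class InMemorySomethingRepository : ISomethingRepository
{
    public List<ThingWithDataInIt> Source { get; } = new List<ThingWithDataInIt>();
    public List<ThingWithDataInIt> Training { get; } = new List<ThingWithDataInIt>();
    public List<ThingWithDataInIt> Test { get; } = new List<ThingWithDataInIt>();

    public IEnumerable<ThingWithDataInIt> GetThings() => Source;
    public void SaveAsTraining(ThingWithDataInIt thing) => Training.Add(thing);
    public void SaveAsTest(ThingWithDataInIt thing) => Test.Add(thing);
}
```

With ten items and testEvery set to 5, the fake ends up with eight items in Training and the fifth and tenth items in Test; the splitter never learns whether it was talking to a file system.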

Something else that will help you design the right abstraction is to write your interface describing exactly what you want the class that depends on it to do with it. In other words, write the interface from the perspective of the consumer. That way you're not trying to imagine a solution while simultaneously trying to figure out whether it will do what you want. You might need to make some adjustments, but first you figure out what your class needs by writing the interface. Then you figure out how to implement it.

That also enables you to focus on the most important task first. You want to write a machine learning algorithm, not something that reads from files. You can just write the interface that represents what your class needs and move on as if the implementation already existed. You get to focus on what you care about more, and you can even test it. Then you can come back to writing implementation details like this. Or, if you're working on a team, you can give someone else the interface and ask them to implement it.

