Regardless of how you write it, you can't test a class that moves files between folders without using actual files and folders. But as for representing it as an abstraction, maybe something like this:
    public interface ISomethingRepository
    {
        IEnumerable<ThingWithDataInIt> GetThings();
        void SaveAsTraining(ThingWithDataInIt thing);
        void SaveAsTest(ThingWithDataInIt thing);
    }
The purpose is that whatever depends on this really wants the things in the files, and it wants to know that having inspected an item, it can save it with either the "training" data or the "test" data.
The implementation can be file-system based. I'm just making up details for illustration. I don't know what's in these files, whether it even needs to be deserialized, etc. Perhaps for each file you have to parse the lines and return a collection of things.
    using System.Collections.Generic;
    using System.IO;
    using Newtonsoft.Json; // JsonConvert comes from the Newtonsoft.Json package

    public class FileSystemSomethingRepository : ISomethingRepository
    {
        private readonly string _sourceDirectoryPath;
        private readonly string _trainingDirectoryPath;
        private readonly string _testDirectoryPath;

        public FileSystemSomethingRepository(string sourceDirectoryPath,
            string trainingDirectoryPath,
            string testDirectoryPath)
        {
            _sourceDirectoryPath = sourceDirectoryPath;
            _trainingDirectoryPath = trainingDirectoryPath;
            _testDirectoryPath = testDirectoryPath;
        }

        // Files are read and deserialized one at a time as the caller enumerates.
        public IEnumerable<ThingWithDataInIt> GetThings()
        {
            var filePaths = Directory.GetFiles(_sourceDirectoryPath);
            foreach (var filePath in filePaths)
            {
                var fileContent = File.ReadAllText(filePath);
                var deserialized = JsonConvert.DeserializeObject<ThingWithDataInIt>(fileContent);
                yield return deserialized;
            }
        }

        public void SaveAsTraining(ThingWithDataInIt thing)
        {
            // serialize it, write it to the training folder
        }

        public void SaveAsTest(ThingWithDataInIt thing)
        {
            // serialize it, write it to the test folder
        }
    }
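The two save methods above are left as comments. As a rough sketch of what "serialize it, write it to the folder" might look like, here is a small helper they could both call with their respective directory path. Everything here is made up for illustration: the `Id` property used for the file name, the `ThingFileWriter` class, and the `.json` extension are all assumptions, and I've used the built-in `System.Text.Json` serializer instead of `JsonConvert` only to keep the snippet free of package references.

```csharp
using System.IO;
using System.Text.Json;

// Hypothetical stand-in for the real type; the Id property is an assumption
// made so that there is something to build a file name from.
public class ThingWithDataInIt
{
    public string Id { get; set; } = "";
}

// Hypothetical helper, not part of the original repository class.
public static class ThingFileWriter
{
    public static void Save(ThingWithDataInIt thing, string directoryPath)
    {
        // Does nothing if the directory already exists.
        Directory.CreateDirectory(directoryPath);

        // Assumed naming scheme: one JSON file per thing, named after its Id.
        var filePath = Path.Combine(directoryPath, thing.Id + ".json");
        File.WriteAllText(filePath, JsonSerializer.Serialize(thing));
    }
}
```

With something like that in place, `SaveAsTraining` would pass `_trainingDirectoryPath` and `SaveAsTest` would pass `_testDirectoryPath`. If the real requirement is to move the original file rather than rewrite it, the body would look different, but the interface wouldn't have to change.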
The interface is easy to mock, and it keeps whatever class depends on it from knowing whether the data comes from a file system, how it's serialized/deserialized, etc. Hiding those details from the consumer is what makes it an abstraction and enables you to gain the benefits of dependency injection.
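To show what "easy to mock" can look like without a mocking library, here is a sketch of a hand-written in-memory fake plus a consumer that depends only on the interface. The interface is repeated so the snippet stands alone. `InMemorySomethingRepository`, `DataSplitter`, the `Label` property, and the predicate passed to the constructor are all hypothetical names I'm inventing for the example; your real consumer will decide training vs. test however it needs to.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the real type; Label is an assumption.
public class ThingWithDataInIt
{
    public string Label { get; set; } = "";
}

public interface ISomethingRepository
{
    IEnumerable<ThingWithDataInIt> GetThings();
    void SaveAsTraining(ThingWithDataInIt thing);
    void SaveAsTest(ThingWithDataInIt thing);
}

// In-memory fake: no files, no folders, no serialization.
public class InMemorySomethingRepository : ISomethingRepository
{
    private readonly List<ThingWithDataInIt> _source;

    // Exposed so a test can inspect what the consumer saved where.
    public List<ThingWithDataInIt> Training { get; } = new List<ThingWithDataInIt>();
    public List<ThingWithDataInIt> Test { get; } = new List<ThingWithDataInIt>();

    public InMemorySomethingRepository(IEnumerable<ThingWithDataInIt> source)
    {
        _source = new List<ThingWithDataInIt>(source);
    }

    public IEnumerable<ThingWithDataInIt> GetThings() => _source;
    public void SaveAsTraining(ThingWithDataInIt thing) => Training.Add(thing);
    public void SaveAsTest(ThingWithDataInIt thing) => Test.Add(thing);
}

// Hypothetical consumer: it inspects each thing and saves it as training or
// test data, without knowing where the data comes from or where it goes.
public class DataSplitter
{
    private readonly ISomethingRepository _repository;
    private readonly Func<ThingWithDataInIt, bool> _isTraining;

    public DataSplitter(ISomethingRepository repository,
        Func<ThingWithDataInIt, bool> isTraining)
    {
        _repository = repository;
        _isTraining = isTraining;
    }

    public void Split()
    {
        foreach (var thing in _repository.GetThings())
        {
            if (_isTraining(thing))
                _repository.SaveAsTraining(thing);
            else
                _repository.SaveAsTest(thing);
        }
    }
}
```

A test can construct the fake with a few things, run `Split()`, and assert on the `Training` and `Test` lists. The file-system implementation still needs its own test against real folders, but the consumer's logic doesn't.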
Something else that will help you design the right abstraction is to write your interface so that it describes exactly what you want the class that depends on it to do with it. In other words, write the interface from the perspective of the consumer. That way you're not trying to imagine a solution while simultaneously trying to figure out if it will do what you want. You might need to make some adjustments, but first you're figuring out what your class needs by writing the interface. Then you figure out how to implement it.
That also enables you to focus on the most important task first. You want to write a machine learning algorithm, not something that reads from files. You can just write the interface that represents what your class needs and move on as if the implementation already existed. You get to focus on what you care more about, and you can even test it. Then you can come back to writing implementation details like this. Or if you're working on a team you can give someone else the interface and ask them to implement it.