I want to index all my music files and store them in a database. I have this function that I call recursively, starting from the root of my music drive.

i.e.

start > ReadFiles(C:\music\);

ReadFiles(path){
   foreach(file)
      save to index;

   foreach(directory)
      ReadFiles(directory);
}

This works fine, but while the program runs its memory usage keeps growing until my system finally runs out of memory.

Does anyone have a better approach that doesn't need 4 GB of RAM to complete this task?

Best Regards, Tys

Tys
  • Please post the actual code. There's nothing fundamentally wrong with your approach. – John Kugelman Nov 21 '10 at 21:03
  • Recursion is not limited by available memory, but by the size of the stack, so if you run out of memory, it sounds like you're holding on to data for too long. – Brian Rasmussen Nov 21 '10 at 21:04
  • I can't _imagine_ that you have enough music to require that much space. Are you sure you're not getting a stack overflow or have entered an endless loop at some point? – Michael Todd Nov 21 '10 at 21:05
  • Explain "save to index". Is it actually storing all the bytes from the file on disk inside the running program? – Brian Nov 21 '10 at 21:07

5 Answers

Alxandr's queue-based solution should work fine.

If you're using .NET 4.0, you could also take advantage of the new Directory.EnumerateFiles method, which enumerates files lazily, without loading them all into memory:

void ReadFiles(string path)
{
    IEnumerable<string> files =
        Directory.EnumerateFiles(
            path,
            "*",
            SearchOption.AllDirectories); // search recursively

    foreach(string file in files)
        SaveToIndex(file);
}
Thomas Levesque

Did you check for the . and .. entries that show up in every directory except the root?

If you don't skip those, you'll have an infinite loop.

Ben Voigt
  • They don't show up in either `Directory.GetFiles` or `Directory.GetDirectories` though. _Normally_, you shouldn't run into this when working in .NET. – configurator Nov 21 '10 at 21:44

You can implement this as a queue. I think (but I'm not sure) that this will save memory. At least it will free up your stack. Whenever you find a folder you add it to the queue, and whenever you find a file you just read it. This prevents recursion.

Something like this:

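Queue<string> dirs = new Queue<string>();
dirs.Enqueue("basedir");
while (dirs.Count > 0)
{
    // Take the next folder off the queue; without this the loop never ends.
    string current = dirs.Dequeue();

    // Queue the subfolders instead of recursing into them.
    foreach (string directory in Directory.GetDirectories(current))
        dirs.Enqueue(directory);

    // Handle the files of the current folder (your own indexing step goes here).
    foreach (string file in Directory.GetFiles(current))
        SaveToIndex(file);
}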
Alxandr
  • This won't save memory. By default, the stack is only 1 megabyte before you StackOverflow. If he's getting OutOfMemory, there is a different problem. – Brian Nov 21 '10 at 21:17
  • Thanks for all the answers. Everything helped a bit. I've implemented a queue mechanism and added some extra checks for directories that shouldn't be indexed, and while doing so I found that my NHibernate setup needed some fine-tuning as well. Now indexing over 1 TB with ease. – Tys Nov 22 '10 at 19:02

Beware, though, that EnumerateFiles() will stop running if you don't have access to a file or if a path is too long or if some other exception occurs. This is what I use for the moment to solve those problems:

public static List<string> getFiles(string path, List<string> files)
{
    IEnumerable<string> fileInfo = null;
    IEnumerable<string> folderInfo = null;
    try
    {
        fileInfo = Directory.EnumerateFiles(path);
        folderInfo = Directory.EnumerateDirectories(path);
    }
    catch
    {
        // Skip folders that cannot be read (access denied, path too long, ...).
    }
    if (fileInfo != null && folderInfo != null)
    {
        files.AddRange(fileInfo);
        // Recurse through the subfolders.
        foreach (string s in folderInfo)
        {
            try
            {
                getFiles(s, files);
            }
            catch
            {
                // Skip subfolders that fail while being enumerated.
            }
        }
    }
    return files;
}

Example use:

List<string> files = new List<string>();
files = folder.getFiles(path, files);

My solution is based on the code at this page: http://msdn.microsoft.com/en-us/library/vstudio/bb513869.aspx.

Update: A MUCH faster method to get files recursively can be found at http://social.msdn.microsoft.com/Forums/vstudio/en-US/ae61e5a6-97f9-4eaa-9f1a-856541c6dcce/directorygetfiles-gives-me-access-denied?forum=csharpgeneral. Using a Stack is new to me (I didn't even know it existed), but the method seems to work. At least it listed all files on my C and D partitions with no errors.
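
For readers who don't want to follow the link, here is a rough sketch of what a stack-based traversal can look like. It is not the code from the linked thread, just my own illustration of the idea; the names SafeFileWalker and EnumerateFilesSafe are made up, and which exceptions you want to swallow is an assumption you should adjust.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeFileWalker
{
    // Lazily yields every file under root, skipping folders that cannot be read.
    // (Hypothetical helper; the name and the exception policy are assumptions.)
    public static IEnumerable<string> EnumerateFilesSafe(string root)
    {
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string current = pending.Pop();
            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(current);
                subDirs = Directory.GetDirectories(current);
            }
            catch (UnauthorizedAccessException)
            {
                continue; // no permission: skip this folder
            }
            catch (IOException)
            {
                continue; // e.g. path too long or folder removed meanwhile
            }

            // yield return must stay outside the try/catch block.
            foreach (string file in files)
                yield return file;

            // Subfolders are visited later, when they are popped off the stack.
            foreach (string dir in subDirs)
                pending.Push(dir);
        }
    }
}
```

You would use it like `foreach (string file in SafeFileWalker.EnumerateFilesSafe(path)) { ... }`. Only one folder's listing is held in memory at a time, plus the stack of folders still to visit.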

matsolof

It could be junction folders, which can lead to an infinite loop when recursing, but I am not sure. Check this out and see for yourself: https://learn.microsoft.com/en-us/windows-server/administration/windows-commands/mklink
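
If junctions do turn out to be the cause, one possible guard is to check a folder's attributes before descending into it. This is only a sketch under that assumption; JunctionCheck and IsReparsePoint are names I made up, and the check also matches symbolic links, not just junctions.

```csharp
using System.IO;

public static class JunctionCheck
{
    // True if the directory is a reparse point (junction, symbolic link, ...).
    // Hypothetical helper for illustration only.
    public static bool IsReparsePoint(string directory)
    {
        FileAttributes attributes = File.GetAttributes(directory);
        return (attributes & FileAttributes.ReparsePoint) != 0;
    }
}
```

In the recursive function you would then skip any subdirectory for which IsReparsePoint returns true instead of calling ReadFiles on it.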

Ion N.