4

I have a FileInfo array with ~200,000 file entries. I need to find all files that share the same file name. For every duplicate, I need the directory name and the file name, because I want to rename the files afterwards.

What I've tried already:

  • Comparing each entry with the whole list using two nested for loops. Bad idea: this would take hours or even days.
  • Using LINQ sorting. I haven't used LINQ before, so I'm struggling to write the correct statement; maybe someone can help.
Michiel van Oosterhout
The_Holy_One

2 Answers

11

Sounds like this should do it:

var duplicateNames = files.GroupBy(file => file.Name)
                          .Where(group => group.Count() > 1)
                          .Select(group => group.Key);

Now would be a very good time to learn LINQ. It's incredibly useful - time spent learning it (even just LINQ to Objects) will pay itself back really quickly.

EDIT: Okay, if you want the original FileInfo for each group, just drop the select:

var duplicateGroups = files.GroupBy(file => file.Name)
                           .Where(group => group.Count() > 1);

// Replace with what you want to do
foreach (var group in duplicateGroups)
{
     Console.WriteLine("Files with name {0}", group.Key);
     foreach (var file in group)
     {
         Console.WriteLine("  {0}", file.FullName);
     }
}
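
Since each element of a group is a FileInfo, the directory and file name needed for renaming are available through DirectoryName and Name. The following is only a sketch, assuming `files` is the FileInfo array from the question; the helper name `PlanRenames` and the `_1`, `_2` suffix scheme are hypothetical, and the code only computes the new paths rather than touching the disk.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class DuplicateRenamer
{
    // Hypothetical helper: returns (old path, new path) pairs for every
    // file whose name occurs more than once in the input.
    public static List<(string OldPath, string NewPath)> PlanRenames(IEnumerable<FileInfo> files)
    {
        var plan = new List<(string OldPath, string NewPath)>();

        var duplicateGroups = files.GroupBy(file => file.Name)
                                   .Where(group => group.Count() > 1);

        foreach (var group in duplicateGroups)
        {
            int index = 1;
            foreach (var file in group)
            {
                // Assumed naming scheme: "name_1.ext", "name_2.ext", ...
                string baseName = Path.GetFileNameWithoutExtension(file.Name);
                string newName = $"{baseName}_{index++}{file.Extension}";
                string directory = file.DirectoryName ?? string.Empty;
                plan.Add((file.FullName, Path.Combine(directory, newName)));
            }
        }

        return plan;
    }
}
```

Applying the plan would then be a matter of calling File.Move(oldPath, newPath) for each pair, after checking that the new name is not already taken in that directory.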
Jon Skeet
  • My bad, your solution works like a charm, but I forgot to mention that I need to know the directory name and file name of every duplicated file, because I want to rename them :) Sorry and thanks. – The_Holy_One Jan 25 '12 at 11:07
  • Excellent, I already knew that LINQ can be really handy sometimes, but I never imagined it would be this great. If I may ask, do you have a favorite page for learning LINQ? – The_Holy_One Jan 25 '12 at 11:18
  • @user1168998: Not really - but you could look in MSDN, for example. – Jon Skeet Jan 25 '12 at 11:19
2

This should work:

HashSet<string> fileNamesSet = new HashSet<string>();
List<string> duplicates = new List<string>();

foreach(string fileName in fileNames)
{
    if(fileNamesSet.Contains(fileName))
    {
        duplicates.Add(fileName);
    }
    else
    {
        fileNamesSet.Add(fileName);
    }
}

Then duplicates will contain every file name that was seen more than once (one entry per repeated occurrence).

Note that since Windows file names are case-insensitive, you may wish to take this into account by converting all of the file names to uppercase first using .ToUpperInvariant().
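
As an alternative to upper-casing every name, a HashSet<string> can be constructed with a case-insensitive comparer. This is a small sketch, assuming `fileNames` is a sequence of plain file names as in the loop above; the class and method names are only illustrative.

```csharp
using System;
using System.Collections.Generic;

static class CaseInsensitiveDuplicates
{
    // Returns every name that repeats an earlier one, ignoring case.
    public static List<string> Find(IEnumerable<string> fileNames)
    {
        // The comparer makes "Readme.txt" and "README.TXT" count as equal.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        var duplicates = new List<string>();

        foreach (string fileName in fileNames)
        {
            if (seen.Contains(fileName))
            {
                duplicates.Add(fileName);
            }
            else
            {
                seen.Add(fileName);
            }
        }

        return duplicates;
    }
}
```

This keeps the original casing of each name in the result, which is useful if the names are later used to build paths.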

sga101
  • You don't need to do the Contains check first - you can just call `fileNamesSet.Add(fileName)` and check the return value, which will be false for duplicates. – Jon Skeet Jan 25 '12 at 11:20
  • I think the intention is clearer this way - the meaning should be obvious even without knowledge of the HashSet class. – sga101 Jan 25 '12 at 11:32
  • Personally I'd rather just know the APIs I use :) (It's pretty common for a set addition method to return whether or not it actually made the change - it's not like this is a particularly "hidden" bit of information.) Would you use ContainsKey and then the indexer instead of TryGetValue for a dictionary, too? – Jon Skeet Jan 25 '12 at 11:34
  • Point taken. I could just put in a comment instead – sga101 Jan 25 '12 at 11:44
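
For reference, the variant discussed in these comments could look roughly like the following. It relies on HashSet<T>.Add returning false when the item is already in the set, with a comment for readers who don't know that; the class and method names are only illustrative.

```csharp
using System.Collections.Generic;

static class DuplicateFinder
{
    public static List<string> FindDuplicates(IEnumerable<string> fileNames)
    {
        var seen = new HashSet<string>();
        var duplicates = new List<string>();

        foreach (string fileName in fileNames)
        {
            // Add returns false if the set already contains the name,
            // so no separate Contains check is needed.
            if (!seen.Add(fileName))
            {
                duplicates.Add(fileName);
            }
        }

        return duplicates;
    }
}
```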