
I've got a few big arrays/lists of filenames that all start the same way, like this:

C:\Program Files\CCleaner\...
C:\Program Files\Common Files\...
C:\Program Files (x86)\Adobe\...
C:\Program Files (x86)\Common Files\...

I would like to extract the beginning part that they all have in common.
In this case: "C:\Program Files"

How do I do that?

I thought I might have to compare two strings at a time and get their common beginning, but I don't know how to do that without comparing each character manually. And would I then have to compare each string with every other string? Would that be O(n²)? Is there a better, faster way?

Edit: Is there also a way without Linq?

Bitterblue
  • In other words, the reason you want to do this may give better insight into the actual problem. Sometimes, there is a deeper underlying problem. – D. Ben Knoble Jun 22 '15 at 11:47

3 Answers


Quick shot:

using System.Linq; // for Min, Select, Distinct

List<string> strings = ...;
var minimumLength = strings.Min(x => x.Length);
int commonChars;
for(commonChars = 0; commonChars < minimumLength; commonChars++)
{
  // Stop at the first column where the strings disagree.
  if (strings.Select(x => x[commonChars]).Distinct().Count() > 1)
  {
    break;
  }
}
return strings[0].Substring(0, commonChars);

OR

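var minimumLength = strings.Min(x => x.Length);
// TakeWhile stops at the first mismatch; Count alone would also count later matching columns.
var commonChars = Enumerable
  .Range(0, minimumLength)
  .TakeWhile(i => strings.All(y => y[i] == strings[0][i]))
  .Count();
return strings[0].Substring(0, commonChars);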

Without Linq:

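List<string> strings = ...;
// Length of the shortest string, computed without LINQ's Min.
int minimumLength = strings[0].Length;
foreach (var str in strings)
{
  minimumLength = Math.Min(minimumLength, str.Length);
}
int commonChars;
for (commonChars = 0; commonChars < minimumLength; commonChars++)
{
  bool allMatch = true;
  foreach (var str in strings)
  {
    if (str[commonChars] != strings[0][commonChars])
    {
      allMatch = false;
      break; // exits only the inner foreach
    }
  }
  if (!allMatch)
  {
    break; // first mismatching column found, stop the outer loop too
  }
}
return strings[0].Substring(0, commonChars);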

There are a couple of other solutions.
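The two-strings-at-a-time idea from the question does not have to be O(n²): keep a running prefix and shrink it against each string in turn, so every string is scanned at most once and the total work is proportional to the number of characters. A minimal sketch without LINQ, assuming a list of plain strings; `PrefixUtil`, `CommonPrefixLength` and `LongestCommonPrefix` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class PrefixUtil
{
    // Hypothetical helper: length of the common prefix of two strings,
    // found by comparing characters left to right until they differ.
    static int CommonPrefixLength(string a, string b)
    {
        int max = Math.Min(a.Length, b.Length);
        int i = 0;
        while (i < max && a[i] == b[i])
        {
            i++;
        }
        return i;
    }

    // Folds the prefix over the list: each string is compared only with the
    // prefix found so far, never with every other string.
    public static string LongestCommonPrefix(IList<string> strings)
    {
        if (strings.Count == 0)
        {
            return string.Empty;
        }

        string prefix = strings[0];
        // Stop early once the prefix is empty; nothing can shrink it further.
        for (int k = 1; k < strings.Count && prefix.Length > 0; k++)
        {
            prefix = prefix.Substring(0, CommonPrefixLength(prefix, strings[k]));
        }
        return prefix;
    }
}
```

For the four paths in the question this returns `C:\Program Files`; an empty list yields an empty string instead of throwing.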

Stefan Steinegger
  • I think it's a typo and should be `strings.Select`...? – petelids Jun 22 '15 at 11:54
  • Danke! I think the `Without Linq` part still has a few bugs, but I got it running correctly for me. And the second line still uses Linq. – Bitterblue Jun 22 '15 at 12:12
  • Is the complexity of this search pattern O(n)? It's quite a good idea to search vertically instead of horizontally. – Bitterblue Jun 22 '15 at 12:16
  • @StefanSteinegger Why was your first thought to use Linq? However useful abstractions like Linq and Enumerable are in most contexts, I get the feeling they are the first people think about just out of habit. Now that the dust has settled, which of the three options is the fastest? Can any solution other than reading the raw string as an array be at an advantage? Sorry for nesting questions, but I can't make another thread for this – Emilio Martinez Jun 22 '15 at 12:18
  • @EmilioMartinez In my opinion Linq is always a little slower than raw Linqless code (I'll run a test). But I always prefer Linqless because of readability. – Bitterblue Jun 22 '15 at 12:28
  • @EmilioMartinez With the file list from `C:\Windows\System32\` (2511 files): Quick shot: 34 sec, OR: 22 sec, Without Linq: 15 sec – Bitterblue Jun 22 '15 at 12:45
  • @Bitterblue: You "prefer Linqless because of readability" ??? I've never heard anyone say something like this. – Stefan Steinegger Jun 22 '15 at 13:03
  • @StefanSteinegger My eyes aren't Linq-trained, that's the difference. Linqless reads better for me. Sorry if that's weird. – Bitterblue Jun 22 '15 at 13:10
  • @Bitterblue: Ok, I understand this. I can only recommend training your eyes for Linq. It reads much more naturally. Think about a list of users, where you want to take the first one whose name starts with the letter A: `users.Where(x => x.Name.StartsWith("A")).First()`. Isn't it natural to read? Looping through lists, conditionally putting things into variables and finally breaking out of the loop isn't really easy to read and, because it's much more code, it's error prone and even slower most of the time. – Stefan Steinegger Jun 23 '15 at 06:41

Another Linq solution:

var strings = new List<string> {@"C:\Program Files\CCleaner\...", @"C:\Program Files\Common Files\...", 
                                @"C:\Program Files (x86)\Adobe\...", @"C:\Program Files (x86)\Common Files\..."};

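// The index < s.Length check avoids an IndexOutOfRangeException when another string is shorter than the first.
var common = new string(strings.Select(str => str.TakeWhile((c, index) => strings.All(s => index < s.Length && s[index] == c)))
                               .FirstOrDefault().ToArray());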

Console.WriteLine(common); // C:\Program Files
w.b

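If you have a very big list, the best approach is to sort the strings and then count how many leading characters the first and the last string have in common.

I can't formally prove that it works, but intuitively it does: with a plain character-by-character (ordinal) ordering, every string that sorts between the first and the last must start with their common prefix. A culture-sensitive sort does not obviously give that guarantee, so the code below sorts with `StringComparer.Ordinal`.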

using System;
using System.IO;
using System.Collections.Generic;
namespace StringSameStart
{
    class MainClass
    {
        public static void Main(string[] args)
        {
            var files = Directory.GetFiles("/Users/ibrar", "*", SearchOption.AllDirectories);
            foreach (var file in files)
            {
                Console.WriteLine("file : " + file);
            }

            if (files.Length == 0)
            {
                Console.WriteLine("No files found.");
                return;
            }

            // Ordinal (char-by-char) sorting guarantees that every string between
            // the first and the last shares their common prefix.
            Array.Sort(files, StringComparer.Ordinal);
            var first = files[0];
            var last = files[files.Length - 1];

            List<char> list = new List<char>();

            // Compare only up to the length of the shorter of the two strings.
            int length = Math.Min(first.Length, last.Length);
            for (int ctr = 0; ctr < length; ctr++)
            {
                if (first[ctr] != last[ctr])
                {
                    break;
                }

                Console.WriteLine("Same : " + first[ctr]);
                list.Add(first[ctr]);
            }

            Console.WriteLine("Match : " + new string(list.ToArray()));
        }
    }
}
mrwaim
  • I should have used a StringBuilder instead of a List – mrwaim Jun 22 '15 at 12:10
  • Oh, and it doesn't handle the case where there are no matches, or where the second string is shorter – mrwaim Jun 22 '15 at 12:16
  • On a quick try, it seems to work in my project. But what is the complexity of sorting C# lists? – Bitterblue Jun 22 '15 at 12:23
  • http://stackoverflow.com/questions/9612167/what-is-time-complexity-of-net-list-sort - "On average, this method is an O(n log n) operation, where n is Count; in the worst case it is an O(n ^ 2) operation." – mrwaim Jun 22 '15 at 13:27
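On the sorting cost: the sort-based argument only needs the smallest and the largest string under ordinal ordering, so a single pass that tracks both with `string.CompareOrdinal` can stand in for the O(n log n) sort. A minimal sketch, assuming plain strings and ordinal comparison; `SortFreePrefix` is a hypothetical name:

```csharp
using System;
using System.Collections.Generic;

static class SortFreePrefix
{
    // Under ordinal (char-by-char) ordering, the common prefix of the smallest
    // and largest strings is the common prefix of the whole list, so one pass
    // with n ordinal comparisons replaces the sort.
    public static string LongestCommonPrefix(IList<string> strings)
    {
        if (strings.Count == 0)
        {
            return string.Empty;
        }

        string min = strings[0];
        string max = strings[0];
        foreach (var s in strings)
        {
            if (string.CompareOrdinal(s, min) < 0) min = s;
            if (string.CompareOrdinal(s, max) > 0) max = s;
        }

        // Compare the two extremes character by character.
        int length = Math.Min(min.Length, max.Length);
        int i = 0;
        while (i < length && min[i] == max[i])
        {
            i++;
        }
        return min.Substring(0, i);
    }
}
```

For the four paths in the question, the ordinal minimum is `C:\Program Files (x86)\Adobe\...` and the maximum is `C:\Program Files\Common Files\...`, and their common prefix is `C:\Program Files`.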