
I'm looking for an optimized algorithm that, given an array (or list) of a struct that I wrote, removes the duplicate elements and returns the result.
I know I can do it with a simple O(n^2) algorithm, but I want a better one.

Any help will be appreciated.

Zhr Saghaie

4 Answers


This runs in close to O(N) time:

var result = items.Distinct().ToList();

[EDIT]

Since Microsoft doesn't document that it runs in O(N) time, I did some timings with the following code:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

namespace Demo
{
    class Program
    {
        private void run()
        {
            test(1000);
            test(10000);
            test(100000);
        }

        private void test(int n)
        {
            var items = Enumerable.Range(0, n);
            new Action(() => items.Distinct().Count())
                .TimeThis("Distinct() with n == " + n + ": ", 10000);
        }

        static void Main()
        {
            new Program().run();
        }
    }

    static class DemoUtil
    {
        public static void TimeThis(this Action action, string title, int count = 1)
        {
            var sw = Stopwatch.StartNew();

            for (int i = 0; i < count; ++i)
                action();

            Console.WriteLine("Calling {0} {1} times took {2}", title, count, sw.Elapsed);
        }
    }
}

The results are:

Calling Distinct() with n == 1000:   10000 times took 00:00:00.5008792
Calling Distinct() with n == 10000:  10000 times took 00:00:06.1388296
Calling Distinct() with n == 100000: 10000 times took 00:00:58.5542259

The times are increasing approximately linearly with n, at least for this particular test, which indicates that an O(N) algorithm is being used.

Matthew Watson

For practical use, LINQ's Distinct() is the simplest solution. It uses a hashtable-based approach, probably very similar to the following algorithm.

If you're interested in what such an algorithm might look like:

IEnumerable<T> Distinct<T>(IEnumerable<T> sequence)
{
    var alreadySeen = new HashSet<T>();
    foreach (T item in sequence)
    {
        if (alreadySeen.Add(item)) // Add returns false if the item was already in the set
            yield return item;
    }
}

If there are d distinct elements and n total elements then this algorithm will take O(d) memory and O(n) time.

Since this algorithm uses a hash set, it requires well-distributed hashes to achieve O(n) runtime. If the hashes are bad, the runtime can degenerate to O(n*d).

CodesInChaos

You can sort the array in O(N log N) time and then compare adjacent elements in a single pass to remove the duplicates.

Aravind

You can use a HashSet<T>, which gives O(N) complexity:

List<int> RemoveDuplicates(List<int> input)
{
    var result = new HashSet<int>(input);
    return result.ToList();
}

But it will increase memory usage.

odinmillion