2

I have a collection that can contain any number of elements.

I need to find all the elements that are duplicated, keep only the first occurrence of each, and delete the rest.

For example:

    public class CollectionCategoryTitle
    {
        public long CollectionTitleId { get; set; }
        public bool CollectionTitleIdSpecified { get; set; }
        public string SortOrder { get; set; }
        public TitlePerformance performanceField { get; set; }
        public string NewOrder { get; set; }
    }

    // Deserialize<T> already returns T, so no cast is needed.
    List<CollectionCategoryTitle> reorderTitles =
        json_serializer.Deserialize<List<CollectionCategoryTitle>>(rTitles);

I need to process this collection so that duplicates are removed but the first occurrence of each is kept.

EDIT:

I have updated the code. The comparison must be made on the "NewOrder" property.

Thanks

Ray
Amit

3 Answers

6

For your specific case:

    var withoutDuplicates = reorderTitles
        .GroupBy(z => z.NewOrder)
        .Select(z => z.First())
        .ToList();
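As a rough, self-contained illustration of the `GroupBy` approach, the sketch below uses a cut-down stand-in class (`Title`, with made-up `Id` and `NewOrder` values; these names and the sample data are assumptions for the demo, not the asker's real data). `GroupBy` is documented to yield groups in the order their keys first appear and to keep source order within each group, so `First()` picks the earliest item for each key.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CollectionCategoryTitle, reduced to two properties.
public class Title
{
    public long Id { get; set; }
    public string NewOrder { get; set; }
}

public static class GroupByDemo
{
    // Keeps the first item seen for each distinct NewOrder value.
    public static List<Title> FirstPerNewOrder(IEnumerable<Title> source)
    {
        return source.GroupBy(t => t.NewOrder)
                     .Select(g => g.First())
                     .ToList();
    }

    public static void Main()
    {
        var titles = new List<Title>
        {
            new Title { Id = 1, NewOrder = "A" },
            new Title { Id = 2, NewOrder = "B" },
            new Title { Id = 3, NewOrder = "A" }, // duplicate of Id 1
            new Title { Id = 4, NewOrder = "C" },
            new Title { Id = 5, NewOrder = "B" }, // duplicate of Id 2
        };

        foreach (var t in FirstPerNewOrder(titles))
            Console.WriteLine(t.Id + " " + t.NewOrder);
        // prints "1 A", "2 B", "4 C" (one per line)
    }
}
```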

For the more general case, Distinct() is generally preferable. For example:

        List<int> a = new List<int>();
        a.Add(4);
        a.Add(1);
        a.Add(2);
        a.Add(2);
        a.Add(4);

        a = a.Distinct().ToList();

This returns 4, 1, 2. Note that the documentation for Distinct does not guarantee the order of the returned data. The current implementation does seem to return items in the order of the original data, but that is undocumented and so shouldn't be relied upon.

mjwills
  • Ok, but will it always pick the first occurrence of the elements that are duplicated? – Amit Nov 11 '11 at 11:15
  • Yes. You could always run it yourself and see. :) – mjwills Nov 11 '11 at 11:18
  • I need to compare using the NewOrder property of my object. How do I do that? – Amit Nov 11 '11 at 11:25
  • @mjwills: Just FYI, you can initialize your `Collection` in one line: `new Collection {4, 1, 2, 2, 1, 3, 4};` – Otiel Nov 11 '11 at 11:26
  • `Distinct` is preferable in both cases. `GroupBy` will needlessly build a collection holding the items that aren't going to be used, while `Distinct` will throw them away immediately. You just need to call the appropriate `Distinct` overload instead of the version you have here, which uses `EqualityComparer<T>.Default` – Jon Hanna Nov 11 '11 at 12:38
  • @JonHanna I disagree. Read the remarks at http://msdn.microsoft.com/en-us/library/bb534501.aspx. Then read the comment at http://msdn.microsoft.com/query/dev10.query?appId=Dev10IDEF1&l=EN-US&k=k(%22SYSTEM.LINQ.ENUMERABLE.DISTINCT%60%601%22). Now, as I point out, it turns out that the current behaviour of Distinct appears to return the data in the same order as the source. But this is not something that can be relied on. – mjwills Nov 11 '11 at 12:44
  • I can't see a change to the enumeration version of `Distinct` that didn't maintain order being anything other than a performance loss (parallel and queryable versions would be another matter). Still, there remains no reason to build your own out of GroupBy rather than out of HashSet. – Jon Hanna Nov 11 '11 at 12:51
  • @JonHanna The documentation is incredibly clear that the returned data is unordered. To rely on behaviour that is *explicitly* undocumented is not a wise move. Your HashSet technique is clever (and around 10% faster), although I would tend to lean towards existing operators rather than write your own. – mjwills Nov 11 '11 at 12:58
  • I'll grant you the point on relying upon documentation, but come on, "existing operators"? What's `HashSet.Add()` if not an existing operator, and it's a lot more straightforward, even when it doesn't give a speed increase (which could be anything from 0 to ∞ depending on the source data's size and frequency of duplicates). – Jon Hanna Nov 11 '11 at 13:06
  • I should have been clearer. My use of the term 'operators' was in the context of LINQ - http://odetocode.com/Articles/739.aspx. My solution allowed the use of only the standard (LINQ) operators. You wrote your own function (DistinctNewOrder) - which is great. I have no problem with it whatsoever. I personally wouldn't recommend the approach (since if I need similar functionality for a different class I'll need to write another function). And the whole point of LINQ is to ease the burden of writing just this kind of code. – mjwills Nov 11 '11 at 13:10
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/4914/discussion-between-mjwills-and-jon-hanna) – mjwills Nov 11 '11 at 13:14
3

Use the Enumerable.Distinct<T>() extension method to do this.
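Since the question asks for a comparison on `NewOrder`, the parameterless `Distinct()` would fall back to the default equality of the class (reference equality, unless `Equals` is overridden). A minimal sketch of the overload that takes an `IEqualityComparer<T>` follows; the `NewOrderComparer` class, the trimmed-down `CollectionCategoryTitle`, and the sample values are assumptions made for illustration. As discussed elsewhere in this thread, the result order of `Distinct` is not guaranteed by the documentation, so this sketch only checks the count.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed-down version of the asker's class, for illustration only.
public class CollectionCategoryTitle
{
    public long CollectionTitleId { get; set; }
    public string NewOrder { get; set; }
}

// Hypothetical comparer: two titles are equal when their NewOrder values match.
public sealed class NewOrderComparer : IEqualityComparer<CollectionCategoryTitle>
{
    public bool Equals(CollectionCategoryTitle x, CollectionCategoryTitle y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.NewOrder, y.NewOrder, StringComparison.Ordinal);
    }

    public int GetHashCode(CollectionCategoryTitle obj)
    {
        // Must agree with Equals: equal NewOrder values give equal hash codes.
        if (obj == null || obj.NewOrder == null) return 0;
        return StringComparer.Ordinal.GetHashCode(obj.NewOrder);
    }
}

public static class DistinctDemo
{
    public static void Main()
    {
        var titles = new List<CollectionCategoryTitle>
        {
            new CollectionCategoryTitle { CollectionTitleId = 1, NewOrder = "A" },
            new CollectionCategoryTitle { CollectionTitleId = 2, NewOrder = "B" },
            new CollectionCategoryTitle { CollectionTitleId = 3, NewOrder = "A" },
        };

        var distinct = titles.Distinct(new NewOrderComparer()).ToList();
        Console.WriteLine(distinct.Count); // prints 2
    }
}
```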

Rune Grimstad
2

EDIT: mjwills correctly points out that guaranteed ordering is important in the question, so the other two suggestions are not spec-guaranteed to work. Leaving just the one that gives this guarantee.

private static IEnumerable<CollectionCategoryTitle> DistinctNewOrder(IEnumerable<CollectionCategoryTitle> src)
{
  // Holds the NewOrder values already yielded. For a different string comparison,
  // pass a comparer, such as new HashSet<string>(StringComparer.CurrentCultureIgnoreCase)
  HashSet<string> seen = new HashSet<string>();
  foreach(var item in src)
    if(seen.Add(item.NewOrder)) // Add returns false if the value was already present
      yield return item;
}
/*...*/
// Called as a plain static method, since it is not declared as an extension method.
var distinctTitles = DistinctNewOrder(reorderTitles).ToList();

Finally, only use .ToList() after the call to DistinctNewOrder() if you actually need the result to be a list. If you're going to process the results once and then do no further work, you're better off not creating a list, which wastes time and memory.
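If the same `HashSet` technique is needed for more than one class, one possible generalization is to take a key selector, sketched below. The name `DistinctByKey` and the `OrderedDistinct` class are hypothetical helpers invented for this example (newer .NET versions ship a built-in `DistinctBy`, which this sketch does not rely on). The demo also consumes the result with `foreach` directly, without `ToList()`, in line with the advice above.

```csharp
using System;
using System.Collections.Generic;

public static class OrderedDistinct
{
    // Hypothetical helper: yields the first item seen for each key, in source order.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source,
        Func<T, TKey> keySelector,
        IEqualityComparer<TKey> comparer = null)
    {
        // A null comparer makes HashSet use the default equality comparer for TKey.
        var seen = new HashSet<TKey>(comparer);
        foreach (var item in source)
            if (seen.Add(keySelector(item))) // false when the key was already seen
                yield return item;
    }
}

public static class OrderedDistinctDemo
{
    public static void Main()
    {
        var words = new[] { "b", "a", "B", "c", "a" };

        // Enumerated lazily; no intermediate list is created.
        foreach (var word in words.DistinctByKey(s => s, StringComparer.OrdinalIgnoreCase))
            Console.WriteLine(word);
        // prints "b", "a", "c" (one per line)
    }
}
```

For the question's data, a call would look like `reorderTitles.DistinctByKey(t => t.NewOrder)`.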

Jon Hanna
  • Distinct doesn't guarantee to return the data in the same order as the source data (which is important for the original poster). See http://stackoverflow.com/questions/4109938/how-does-linq-distinct-method-sort/4109969#4109969 for example. – mjwills Nov 11 '11 at 12:46
  • If it has to be resilient against implementation change (can you think of one that wouldn't be perverse for linq2objects - rather than 2sql or plink?), then the third option covers that. – Jon Hanna Nov 11 '11 at 12:58