
A Rec object has a member variable called tag which is a String.

If I have a List of Recs, how could I de-dupe the list based on the tag member variable?

I just need to make sure that the List contains only one Rec with each tag value.

Something like the following, but I'm not sure what the best algorithm is for keeping track of counts, etc.:

private List<Rec> deDupe(List<Rec> recs) {

    for(Rec rec : recs) {

         // How to check whether rec.tag exists in another Rec in this List
         // and delete any duplicates from the List before returning it to
         // the calling method?

    }

    return recs;

}
Daniel K.
  • If you are asking how do I remove duplicates from a list, this has been asked many times before; http://stackoverflow.com/search?q=[java]+removing+duplicates – Qwerky Nov 03 '10 at 14:52
  • possible duplicate of [Remove duplicates from a list](http://stackoverflow.com/questions/2849450/remove-duplicates-from-a-list) – Corbin March Nov 03 '10 at 14:55

5 Answers


Store it temporarily in a HashMap<String,Rec>.

Create a HashMap<String,Rec>. Loop through all of your Rec objects. For each one, if the tag already exists as a key in the HashMap, then compare the two and decide which one to keep. If not, then put it in.

When you're done, the HashMap.values() method will give you all of your unique Rec objects.
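
The steps above could be sketched roughly as follows. The `Rec` class here is a hypothetical stand-in (the `id` field exists only to tell instances apart), and "keep the first one seen" is an assumed policy; replace the `else` branch with whatever comparison you actually need.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TagMapDedupe {
    // Hypothetical stand-in for the question's Rec class.
    static class Rec {
        final String tag;
        final int id; // assumed extra field, only used to tell instances apart

        Rec(String tag, int id) {
            this.tag = tag;
            this.id = id;
        }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        Map<String, Rec> byTag = new HashMap<String, Rec>();
        for (Rec rec : recs) {
            Rec existing = byTag.get(rec.tag);
            if (existing == null) {
                // First Rec seen with this tag: store it.
                byTag.put(rec.tag, rec);
            } else {
                // Duplicate tag: compare 'existing' and 'rec' here and decide
                // which one to keep. This sketch simply keeps 'existing'.
            }
        }
        // One Rec per distinct tag; HashMap does not guarantee any order.
        return new ArrayList<Rec>(byTag.values());
    }
}
```

If the original order of the list matters, a `LinkedHashMap` in place of the `HashMap` would keep the entries in insertion order.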

Erick Robertson

Try this:

private List<Rec> deDupe(List<Rec> recs) {

    Set<String> tags = new HashSet<String>();
    List<Rec> result = new ArrayList<Rec>();

    for(Rec rec : recs) {
        if(!tags.contains(rec.tag)) {
            result.add(rec);
            tags.add(rec.tag);
        }
    }

    return result;
}

This checks each Rec against a Set of tags. If the set contains the tag already, it is a duplicate and we skip it. Otherwise we add the Rec to our result and add the tag to the set.

Stephen C
Alan Geleynse
  • You can make it simpler by using the return value of `Set.add`: `if (tags.add(rec.tag)) result.add(rec);` – Tom Nov 03 '10 at 15:26
  • Yes. This is what `Set` is for ... having a collection of distinct objects. – heez Aug 09 '16 at 19:00
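
For reference, the simplification suggested in the comments might look like the sketch below. It relies on `Set.add` returning `true` only when the element was not already in the set. The `Rec` class is a hypothetical minimal stand-in for the one in the question.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TagSetDedupe {
    // Hypothetical stand-in for the question's Rec class.
    static class Rec {
        final String tag;

        Rec(String tag) {
            this.tag = tag;
        }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        Set<String> seenTags = new HashSet<String>();
        List<Rec> result = new ArrayList<Rec>();
        for (Rec rec : recs) {
            // add() returns false if the tag was already present,
            // so only the first Rec with each tag is kept.
            if (seenTags.add(rec.tag)) {
                result.add(rec);
            }
        }
        return result;
    }
}
```

Because the result is built by walking the input in order, the surviving elements keep their original relative order.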

This becomes easier if Rec's equals method is based on its tag value. Then you could write something like:

private List<Rec> deDupe( List<Rec> recs )
{
    List<Rec> retList = new ArrayList<Rec>( recs.size() );
    for ( Rec rec : recs )
    {
        if (!retList.contains(rec))
        {
            retList.add(rec);
        }
    }
    return retList;
}
Reese Moore
  • Couldn't you use Set.addAll(recs)? – Rich Nov 03 '10 at 14:52
  • @Rich - I tried using a HashSet but I was able to add multiple objects with different `tag` values so the uniqueness of the set seems to be based on some other attribute of the `Rec` object but I'm not sure what it is. – Daniel K. Nov 03 '10 at 14:55
  • @Rich: Yes, that would work (probably better than my submission) – Reese Moore Nov 03 '10 at 14:55
  • @Reese: Is there any potential danger of overriding the `equals` method to base it on the `tag` value? – Daniel K. Nov 03 '10 at 14:55
  • @Daniel K: Unless you have some other semantic definition for equals that is used you *most definitely should* provide a semantic meaning for .equals – Reese Moore Nov 03 '10 at 14:56
  • @Daniel K: the HashSet relies on correct overriding of equals and hashCode - that will be why it didn't work before – Rich Nov 03 '10 at 15:03
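
To illustrate the point made in these comments, here is a minimal sketch of a `Rec` whose `equals` and `hashCode` are both based on `tag`, assuming that tag-based equality is acceptable everywhere `Rec` is used. The class body is hypothetical, and `LinkedHashSet` is a choice made here (not something from the thread) so that the original order is kept.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class TagEqualsDedupe {
    // Hypothetical Rec whose equality is defined purely by its tag.
    static class Rec {
        final String tag;

        Rec(String tag) {
            this.tag = tag;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Rec)) return false;
            return Objects.equals(tag, ((Rec) o).tag);
        }

        @Override
        public int hashCode() {
            // Must agree with equals: equal tags give equal hash codes.
            return Objects.hashCode(tag);
        }
    }

    static List<Rec> deDupe(List<Rec> recs) {
        // The set drops any Rec that equals one already added;
        // LinkedHashSet remembers insertion order.
        Set<Rec> unique = new LinkedHashSet<Rec>(recs);
        return new ArrayList<Rec>(unique);
    }
}
```

Without the `hashCode` override, two `Rec` objects with the same tag would usually land in different hash buckets, and a `HashSet` would treat them as distinct.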

If you don't care about shuffling the data around (i.e., you have a small list of small objects), you can do this:

private <T> List<T> deDupe(List<T> thisListHasDupes){
    Set<T> tempSet = new HashSet<T>();
    for(T t:thisListHasDupes){
        tempSet.add(t);
    }
    List<T> deDupedList = new ArrayList<T>();
    deDupedList.addAll(tempSet);
    return deDupedList;
}

Remember that implementations of Set need consistent and valid equals and hashCode methods, so if you have a custom object, make sure both are overridden. Note also that a HashSet does not preserve the original order of the list.

PHY6

I would do that with Google Collections. You can use the filter function with a predicate that remembers previous tags and filters out any Rec whose tag has been seen before. Something like this:

private Iterable<Rec> deDupe(List<Rec> recs) 
{
    Predicate<Rec> filterDuplicatesByTagPredicate = new FilterDuplicatesByTagPredicate();
    return Iterables.filter(recs, filterDuplicatesByTagPredicate);
}

private static class FilterDuplicatesByTagPredicate implements Predicate<Rec>
{
    private Set<String> existingTags = Sets.newHashSet();

    @Override
    public boolean apply(Rec input)
    {
        String tag = input.getTag();
        return existingTags.add(tag);
    }
}

I slightly changed the method to return Iterable instead of List, but of course you can change that if it's important.

duduamar
  • The javadoc for Predicate strongly advises against having predicates where apply() has any observable side effect. See this question: http://stackoverflow.com/questions/4036326/google-collections-distinct-predicate/4036416#4036416 – Michael D Nov 03 '10 at 15:00
  • Can you please explain why? What's the pitfall here? – duduamar Nov 03 '10 at 15:30
  • Side effects make code harder to understand, particularly when they're hidden in unexpected places. The standard expectation for a `Predicate` is that it is a fixed logical predicate for which each element that is given to it is evaluated in the same way. With this predicate, the criteria for evaluation change continually during the course of a single call to `filter`. The predicate is, of course, not reusable either: using it again would cause it to filter out every element of the original list. This violates expectations. A `Map` is a more appropriate solution to this problem. – ColinD Nov 03 '10 at 16:08