2

I'm wondering if anyone knows a good way to remove duplicate values in a LinkedHashMap. I have a LinkedHashMap with pairs of String and List<String>, and I'd like to remove duplicates across the ArrayLists. This is to improve some downstream processing.

The only thing I can think of is keeping a log of the processed values as I iterate over the HashMap and then through each ArrayList, checking whether I've encountered a value previously. This approach seems like it would degrade in performance as the list grows. Is there a way to pre-process the HashMap to remove duplicates from the ArrayList values?

To illustrate: if I have String1 > List1 (a, b, c) and String2 > List2 (c, d, e), I would want to remove "c" so there are no duplicates across the lists within the HashMap.
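The bookkeeping approach I describe above might look something like this (a sketch; `removeDuplicates` is an illustrative name):

```java
import java.util.*;

public class DedupSketch {
    // Walk every list once, dropping any value already seen in an earlier list
    static void removeDuplicates(Map<String, List<String>> map) {
        Set<String> seen = new HashSet<>();
        for (List<String> values : map.values()) {
            Iterator<String> it = values.iterator();
            while (it.hasNext()) {
                if (!seen.add(it.next())) {
                    it.remove(); // duplicate of a value kept earlier
                }
            }
        }
    }
}
```

Since `HashSet.add` and the iterator removal are both constant-time, this is a single O(n) pass, but it is the "log of processed values" idea I was hoping to avoid.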

posdef
Jeff
  • You have `Map<String, List<Foo>>` and you want not to allow duplicate `Foo`? – Bozho Jan 31 '12 at 22:17
  • As I understand it, he wants to disallow duplicate `List` values... but it's very difficult to tell what exactly he means. – Louis Wasserman Jan 31 '12 at 22:19
  • I want to ensure no duplicates across the ArrayList's within the LinkedHashMap. I'll edit the question for clarity. – Jeff Jan 31 '12 at 22:22
  • Do you want to dedup each individual ArrayList, or _all_ the ArrayLists at the same time? – Louis Wasserman Jan 31 '12 at 22:22
  • Just for clarification: you want the values to be unique across all the lists in the map? So if you have a map `foo => [1, 2, 2, 3], bar => [2, 3, 3, 4, 5]`, the result of this deduplication should be `foo => [1, 2, 3], bar => [4, 5]` ? – millimoose Jan 31 '12 at 22:23
  • @Jeff how can you decide which one keeps the unique value. e.g., in Inerdial's example, why does foo retain a 2, but not bar? – user949300 Feb 01 '12 at 00:07
  • It doesn't matter which one keeps the unique value in this case. My list is a list of xml files and their associated graphics. I just need to improve overall performance by ensuring the graphics in the ArrayList's only download once. – Jeff Feb 01 '12 at 02:29
  • @Jeff Well, seems like the most "correct" place to fix that is in the download code that comes after the Map, not in the Map itself. What if you delete the entry for one of the xml files, the one that contains the graphics? Suddenly all the other xml files are orphaned from their graphics. – user949300 Feb 01 '12 at 03:05

6 Answers

1

I believe you could create a second HashMap that can be sorted by values (alphabetically, numerically), then do a single sweep through the sorted list, checking whether the current node is equal to the next node. If it is, remove the next one and keep the index the same, so you stay at the same position in the sorted list.

Or, when you are adding values, you can check whether the map already contains that value.
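The check-on-insert idea might be sketched like this (`CheckOnAdd` and `addValue` are illustrative names, not from the question):

```java
import java.util.*;

public class CheckOnAdd {
    private final Map<String, List<String>> map = new LinkedHashMap<>();
    private final Set<String> seen = new HashSet<>();

    // Only add the value if no list in the map contains it yet
    public void addValue(String key, String value) {
        if (seen.add(value)) {
            map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
    }

    public Map<String, List<String>> getMap() {
        return map;
    }
}
```

Checking at insertion time avoids the separate deduplication pass entirely.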

1

Given your clarification, you want something like this:

class KeyValue {
    public String key;
    public Object value;

    KeyValue(String key, Object value) {
        this.key = key;
        this.value = value;
    }

    public boolean equals(Object o) {
        // boilerplate omitted, only use the value field for comparison
    }

    public int hashCode() {
        return value.hashCode();
    }
}

public void deduplicate(Map<String, List<Object>> items) {
    Set<KeyValue> kvs = new LinkedHashSet<KeyValue>();

    for (Map.Entry<String, List<Object>> entry : items.entrySet()) {
        String key = entry.getKey();
        List<Object> values = entry.getValue();
        for (Object value : values) {
            kvs.add(new KeyValue(key, value));
        }
        values.clear();
    }

    for (KeyValue kv : kvs) {
        items.get(kv.key).add(kv.value);
    }
}

Using a set removes the duplicate values, and the KeyValue wrapper lets us preserve the original map key while doing so. Add getters and setters or generics as needed. This also modifies the original map and the lists in it in place, and the performance should be O(n).
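A runnable, self-contained sketch of this KeyValue idea (the map is passed in as a parameter, and a LinkedHashSet keeps first-seen order):

```java
import java.util.*;

public class KeyValueDemo {
    // Wrapper whose equality and hash are based on the value only,
    // so a set of KeyValue keeps one entry per distinct value
    static class KeyValue {
        final String key;
        final Object value;

        KeyValue(String key, Object value) {
            this.key = key;
            this.value = value;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof KeyValue && value.equals(((KeyValue) o).value);
        }

        @Override
        public int hashCode() {
            return value.hashCode();
        }
    }

    static void deduplicate(Map<String, List<Object>> items) {
        Set<KeyValue> kvs = new LinkedHashSet<>();
        for (Map.Entry<String, List<Object>> entry : items.entrySet()) {
            for (Object value : entry.getValue()) {
                kvs.add(new KeyValue(entry.getKey(), value)); // first key wins
            }
            entry.getValue().clear();
        }
        // Re-distribute the surviving values back to their original keys
        for (KeyValue kv : kvs) {
            items.get(kv.key).add(kv.value);
        }
    }
}
```

With the example from the comments, `foo => [1, 2, 2, 3], bar => [2, 3, 3, 4, 5]` becomes `foo => [1, 2, 3], bar => [4, 5]`.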

millimoose
0

I'm assuming you need unique elements (contained in your Lists) and not unique Lists.

If you need no association between the Map's key and elements in its associated List, just add all of the elements individually to a Set.

If you add all of the Lists to a Set, it will contain the unique List objects, not unique elements of the Lists, so you have to add the elements individually.

(you can, of course, use addAll to make this easier)
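For example, if all you need is the set of unique elements, ignoring which list each came from (a sketch; `uniqueValues` is an illustrative name):

```java
import java.util.*;

public class UniqueElements {
    // Flatten all the lists into one set; duplicates collapse automatically
    static Set<String> uniqueValues(Map<String, List<String>> map) {
        Set<String> unique = new LinkedHashSet<>();
        for (List<String> list : map.values()) {
            unique.addAll(list);
        }
        return unique;
    }
}
```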

smp7d
0

Using Guava:

Map<V, K> uniques = new LinkedHashMap<V, K>();
for (Map.Entry<K, List<V>> entry : mapWithDups.entrySet()) {
  for (V v : entry.getValue()) {
    uniques.put(v, entry.getKey());
  }
}
ListMultimap<K, V> uniqueLists = Multimaps.invertFrom(Multimaps.forMap(uniques), 
  ArrayListMultimap.<K, V>create());
Map<K, List<V>> uniqueListsMap = (Map) uniqueLists.asMap(); // only if necessary

which should preserve the ordering of the values, and keep them unique. If you can use a ListMultimap<K, V> for your result -- which you probably can -- then go for it, otherwise you can probably just cast uniqueLists.asMap() to a Map<K, List<V>> (with some abuse of generics, but with guaranteed type safety).
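If Guava isn't available, a similar invert-and-rebuild approach can be sketched with plain collections (illustrative only; this variant keeps each value in the first list that contained it):

```java
import java.util.*;

public class InvertDedup {
    static Map<String, List<String>> dedup(Map<String, List<String>> mapWithDups) {
        // value -> first key that contained it
        Map<String, String> uniques = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : mapWithDups.entrySet()) {
            for (String v : entry.getValue()) {
                uniques.putIfAbsent(v, entry.getKey());
            }
        }
        // invert back: key -> the values that key won
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String key : mapWithDups.keySet()) {
            result.put(key, new ArrayList<>());
        }
        for (Map.Entry<String, String> e : uniques.entrySet()) {
            result.get(e.getValue()).add(e.getKey());
        }
        return result;
    }
}
```

The two LinkedHashMaps preserve both key order and first-seen value order, mirroring what the Multimap version gives you.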

Louis Wasserman
0

So, to clarify... you essentially have K, [V1...Vn] and you want unique values across all the V?

public void add( HashMap<String, List<String>> map, HashMap<String, String> listObjects, String key, List<String> values)
{
    List<String> uniqueValues = new ArrayList<String>();
    for( int i = 0; i < values.size(); i++ )
    {
        if( !listObjects.containsKey( values.get(i) ) )
        {
            listObjects.put( values.get(i), key );
            uniqueValues.add( values.get(i) );
        }
    }
    map.put( key, uniqueValues );
}

Essentially, we have another HashMap that stores the list values and we remove the non-unique ones when adding a list to the map. This also gives you the added benefit of knowing which list a value occurs in.
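Usage might look like this (re-sketching the helper as a static method so the example is self-contained; the names are illustrative):

```java
import java.util.*;

public class AddUnique {
    // Add a list under key, keeping only values no earlier list has claimed;
    // listObjects records which key first claimed each value
    static void add(Map<String, List<String>> map,
                    Map<String, String> listObjects,
                    String key, List<String> values) {
        List<String> uniqueValues = new ArrayList<>();
        for (String v : values) {
            if (!listObjects.containsKey(v)) {
                listObjects.put(v, key);
                uniqueValues.add(v);
            }
        }
        map.put(key, uniqueValues);
    }
}
```

After adding `String1 => (a, b, c)` and then `String2 => (c, d, e)`, the map holds `String1 => [a, b, c]` and `String2 => [d, e]`, and the side map records that "c" first appeared in String1.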

The Real Baumann
0

As others have noted, you could check the value as you add, but if you have to do it after the fact:

public static void removeDups(Map<String, List<String>> in) {
    ArrayList<String> allValues = new ArrayList<String>();
    for (List<String> inValue : in.values())
        allValues.addAll(inValue);
    HashSet<String> uniqueSet = new HashSet<String>(allValues);

    // remove one occurrence of each unique value
    for (String unique : uniqueSet)
        allValues.remove(unique);

    // anything left over was a duplicate
    HashSet<String> nonUniqueSet = new HashSet<String>(allValues);

    for (List<String> inValue : in.values())
        inValue.removeAll(nonUniqueSet);
}


public static void main(String[] args) {
    HashMap<String, List<String>> map = new HashMap<String, List<String>>();
    map.put("1", new ArrayList<String>(Arrays.asList("a", "b", "c", "a")));
    map.put("2", new ArrayList<String>(Arrays.asList("d", "e", "f")));
    map.put("3", new ArrayList<String>(Arrays.asList("a", "e")));

    System.out.println("Before");
    System.out.println(map);

    removeDups(map);
    System.out.println("After");
    System.out.println(map);
}

generates an output of

Before
{3=[a, e], 2=[d, e, f], 1=[a, b, c, a]}
After
{3=[], 2=[d, f], 1=[b, c]}
user949300