I have two Multimaps which have been created from two huge CSV files.

Multimap<String, SomeClassObject> mapOne = ArrayListMultimap.create();
Multimap<String, SomeClassObject> mapTwo = ArrayListMultimap.create();

I use one CSV column as the key, and each key has thousands of values associated with it. The data in the two Multimaps should be the same. I want to compare the Multimaps and find any values that differ. Here are the two approaches I am considering:

Approach One:

Build one big list from each Multimap. Each big list contains several smaller lists, one per key: a smaller list starts with the "key" read from the Multimap, and the rest of that list holds the key's associated values.

ArrayList<Collection<SomeClassObject>> bigList = new ArrayList<Collection<SomeClassObject>>();

Within bigList will be individual small lists A, B, C etc.

From the two big lists (one per file), I plan to take each smaller list from the first and look for the smaller list in the second that contains the same "key" element. If there is one, I compare the two lists and collect anything that could not be matched.
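
A minimal sketch of Approach One, assuming plain java.util maps of lists stand in for the Guava Multimaps and String values stand in for SomeClassObject. Both substitutions are made only so the example runs on its own, and the class and method names are hypothetical. Looking each key up directly in the second map avoids scanning a big list for the matching smaller list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PerKeyComparison {
    // Returns, in sorted order, every key whose value list differs between the
    // two maps, including keys that exist in only one of them.
    static List<String> differingKeys(Map<String, List<String>> one,
                                      Map<String, List<String>> two) {
        Set<String> allKeys = new TreeSet<>(one.keySet());
        allKeys.addAll(two.keySet());
        List<String> result = new ArrayList<>();
        for (String key : allKeys) {
            List<String> a = one.get(key);
            List<String> b = two.get(key);
            // List.equals is order-sensitive and returns false for null.
            if (a == null || !a.equals(b)) {
                result.add(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> one = new HashMap<>();
        one.put("a", Arrays.asList("1", "2"));
        one.put("b", Arrays.asList("3"));
        one.put("c", Arrays.asList("4"));

        Map<String, List<String>> two = new HashMap<>();
        two.put("a", Arrays.asList("1", "2"));
        two.put("b", Arrays.asList("5"));
        two.put("d", Arrays.asList("4"));

        System.out.println(differingKeys(one, two)); // prints [b, c, d]
    }
}
```

This only reports which keys differ; the differing values under each key still have to be extracted in a second step.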

Approach Two:

Compare the two Multimaps directly, but I am not sure how that would be done.

Which approach would have the shorter execution time? I need the operation to complete as quickly as possible.

durron597
user3044240
  • Do you want to know "if" they're equal, or do you want to get a list of the missing values? – durron597 Aug 27 '15 at 16:25
  • "I have two Multimaps which have been created from two huge CSV files." Then why are you doing it in memory? Why not use a database? – Amir Afghani Aug 27 '15 at 16:28
  • @durron597 First check whether the key of the entry being checked in the first multimap is present in the second multimap. If it is, check that the values associated with that key are equal in every respect in both multimaps. If they differ in any respect, that record is considered different and needs to be taken out and dealt with accordingly. – user3044240 Aug 27 '15 at 16:32

2 Answers

7

Use Multimaps.filterEntries(Multimap, Predicate).

If you want to get the differences between two Multimaps, it's very easy to write a filter based on containsEntry, and then use the filtering behavior to efficiently find all the elements that don't match. Just build the Predicate based on one map, and then filter the other.

Here's what I mean, using Java 8 lambdas (a Java 7 version follows further down):

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import com.google.common.collect.Multimaps;

public class MultimapDifference {
  public static void main(String[] args) {
    Multimap<String, String> first = ArrayListMultimap.create();
    Multimap<String, String> second = ArrayListMultimap.create();

    first.put("foo", "foo");
    first.put("foo", "bar");
    first.put("foo", "baz");
    first.put("bar", "foo");
    first.put("baz", "bar");

    second.put("foo", "foo");
    second.put("foo", "bar");
    second.put("baz", "baz");
    second.put("bar", "foo");
    second.put("baz", "bar");

    // Entries of first that second does not contain (a filtered view of first)
    Multimap<String, String> firstSecondDifference =
        Multimaps.filterEntries(first, e -> !second.containsEntry(e.getKey(), e.getValue()));

    // Entries of second that first does not contain (a filtered view of second)
    Multimap<String, String> secondFirstDifference =
        Multimaps.filterEntries(second, e -> !first.containsEntry(e.getKey(), e.getValue()));

    System.out.println(firstSecondDifference);
    System.out.println(secondFirstDifference);
  }
}

The output shows the entries of each multimap that are missing from the other, in this contrived example:

{foo=[baz]}
{baz=[baz]}

Both multimaps will be empty if every key-value pair in each map also occurs in the other.
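
One caveat: containsEntry only asks whether a key-value pair occurs at all, so the filter above does not notice when a pair occurs twice in one file and once in the other. If duplicate counts matter, a counting difference is one option. The sketch below is an illustration under assumptions, not part of Guava: it uses plain java.util maps of lists in place of the Multimaps so that it runs on its own, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CountingDifference {
    // Returns the entries of 'left' that are not matched one-for-one by an
    // entry of 'right' under the same key. Keys with no leftovers are omitted.
    static Map<String, List<String>> difference(Map<String, List<String>> left,
                                                Map<String, List<String>> right) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : left.entrySet()) {
            // Count how often each value occurs under this key on the right.
            Map<String, Integer> available = new HashMap<>();
            for (String v : right.getOrDefault(e.getKey(), Collections.<String>emptyList())) {
                available.merge(v, 1, Integer::sum);
            }
            // Consume one right-hand occurrence per left-hand value.
            List<String> unmatched = new ArrayList<>();
            for (String v : e.getValue()) {
                int n = available.getOrDefault(v, 0);
                if (n > 0) {
                    available.put(v, n - 1);
                } else {
                    unmatched.add(v);
                }
            }
            if (!unmatched.isEmpty()) {
                result.put(e.getKey(), unmatched);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> left = new HashMap<>();
        left.put("k", Arrays.asList("a", "a", "b"));
        Map<String, List<String>> right = new HashMap<>();
        right.put("k", Arrays.asList("a", "b"));

        System.out.println(difference(left, right)); // prints {k=[a]}
        System.out.println(difference(right, left)); // prints {}
    }
}
```

The work per key is linear in the number of values, which matters when each key has thousands of them.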


In Java 7, you can create the predicate manually, using something like this:

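// Imports needed at the top of the file:
import com.google.common.base.Predicate;
import com.google.common.collect.Multimap;
import java.util.Map;

// Matches entries that are absent from the multimap given to the constructor
public static class FilterPredicate<K, V> implements Predicate<Map.Entry<K, V>> {
  private final Multimap<K, V> filterAgainst;

  public FilterPredicate(Multimap<K, V> filterAgainst) {
    this.filterAgainst = filterAgainst;
  }

  @Override
  public boolean apply(Map.Entry<K, V> entry) {
    return !filterAgainst.containsEntry(entry.getKey(), entry.getValue());
  }
}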

Use it as an argument to Multimaps.filterEntries() like this:

Multimap<String, String> firstSecondDifference =
    Multimaps.filterEntries(first, new FilterPredicate<String, String>(second));

Multimap<String, String> secondFirstDifference =
    Multimaps.filterEntries(second, new FilterPredicate<String, String>(first));

Otherwise, the code is the same (with the same result) as the Java 8 version above.

durron597
  • @user3044240 Glad I could help, I'm curious, are you using the Java 8 lambda version or the Java 7 one in the revision history? If the Java 7 version, I'll pull it out of the revision history for the benefit of future users. – durron597 Aug 28 '15 at 19:59
  • Java 7 version. I think you should put the code back that you removed. It was much easier to understand that. Thank You. – user3044240 Aug 29 '15 at 17:12
  • @user3044240 It's edited back in, in a way that explains both versions. The Java 8 version is significantly less code; I'm going to leave both versions in, as we are past Java 7's official End of Life. – durron597 Aug 29 '15 at 17:16
  • It might sound annoying but I just changed code to use lambda version, but it worked well both ways. Thanks once again!! – user3044240 Aug 29 '15 at 19:29
  • @user3044240 Maybe a future user will find this answer and need the Java 7 version, I'll leave it as it is. – durron597 Aug 29 '15 at 19:31
  • I'm still stuck with Java 7 so this has been very useful – CheeseFerret Sep 02 '15 at 20:56
  • @CheeseFerret If you found my answer useful, feel free to upvote it :) – durron597 Sep 02 '15 at 20:57
2

From the ArrayListMultimap.equals doc:

Compares the specified object to this multimap for equality.

Two ListMultimap instances are equal if, for each key, they contain the same values in the same order. If the value orderings disagree, the multimaps will not be considered equal.

So just do mapOne.equals(mapTwo). You won't have a better execution time by trying to do it yourself.
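
Note that both Multimap.equals and containsEntry compare values with the value class's own equals method. If SomeClassObject does not override equals and hashCode, objects parsed from two different files are never considered equal, even when every field matches. A minimal sketch of a value class that compares field by field follows; the class name Row and its two fields are hypothetical stand-ins for SomeClassObject and its CSV columns.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical stand-in for SomeClassObject; the fields are invented for illustration.
final class Row {
    private final String id;
    private final String amount;

    Row(String id, String amount) {
        this.id = id;
        this.amount = amount;
    }

    // Two rows are equal when all of their fields are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof Row)) {
            return false;
        }
        Row other = (Row) o;
        return Objects.equals(id, other.id) && Objects.equals(amount, other.amount);
    }

    // Must be consistent with equals so that hash-based collections work.
    @Override
    public int hashCode() {
        return Objects.hash(id, amount);
    }

    public static void main(String[] args) {
        Row fromFileOne = new Row("1", "10.00");
        Row fromFileTwo = new Row("1", "10.00");
        System.out.println(fromFileOne.equals(fromFileTwo)); // prints true
        // List.equals delegates to the element's equals, as the multimap comparison does.
        System.out.println(Arrays.asList(fromFileOne).equals(Arrays.asList(fromFileTwo))); // prints true
    }
}
```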

Jean Logeart
  • How do I ensure that order of values of each key of multimap is same? – user3044240 Aug 27 '15 at 16:34
  • @durron597 I don't, but Jean's answer suggests that his approach with the equals method needs the order to match. My concern is to check that whatever values a key has in one map are also among the values of the same key in the other map. If they can't be found, that record differs between the two files and needs to be dealt with. – user3044240 Aug 27 '15 at 16:47
  • @user3044240 It does check the order, as the doc states. Same way `arrayList.equals(otherList)` does. – Jean Logeart Aug 27 '15 at 17:30
  • @user3044240 Read the docs again: "Two ListMultimap instances are equal if, for each key, *they contain the same values in the same order.*" – Louis Wasserman Aug 27 '15 at 17:48
  • @user3044240 why not `SetMultimap`? "Two SetMultimap instances are equal if, for each key, they contain the same values. Equality does not depend on the ordering of keys or values." – Omar Hrynkiewicz Aug 28 '15 at 17:00
  • @OmarHrynkiewicz AFAIU in his case, order matters. – Jean Logeart Aug 28 '15 at 17:26
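
If order turns out not to matter but duplicate counts do, neither ListMultimap.equals (order-sensitive) nor a SetMultimap (which keeps only one copy of each value per key) fits exactly. One possible middle ground is to compare sorted copies of each key's values. This is a sketch under assumptions: plain java.util maps of lists stand in for the Multimaps, the values are Strings (any Comparable type would work the same way), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class OrderInsensitiveEquality {
    // True when both maps have the same keys and, for each key, the same
    // values with the same number of occurrences, regardless of order.
    static boolean sameIgnoringOrder(Map<String, List<String>> one,
                                     Map<String, List<String>> two) {
        if (!one.keySet().equals(two.keySet())) {
            return false;
        }
        for (Map.Entry<String, List<String>> e : one.entrySet()) {
            // Sort copies so the caller's lists are left untouched.
            List<String> a = new ArrayList<>(e.getValue());
            List<String> b = new ArrayList<>(two.get(e.getKey()));
            Collections.sort(a);
            Collections.sort(b);
            if (!a.equals(b)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, List<String>> one = new HashMap<>();
        one.put("k", Arrays.asList("a", "b", "a"));
        Map<String, List<String>> reordered = new HashMap<>();
        reordered.put("k", Arrays.asList("a", "a", "b"));
        Map<String, List<String>> fewer = new HashMap<>();
        fewer.put("k", Arrays.asList("a", "b"));

        System.out.println(sameIgnoringOrder(one, reordered)); // prints true
        System.out.println(sameIgnoringOrder(one, fewer));     // prints false
    }
}
```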