4

If you had 1,000,000 keys (ints) that mapped to 10,000 values (ints), what would be the most efficient way (in terms of lookup performance and memory usage) to implement the mapping?

Assume the values are random, i.e., there is no range of keys that maps to a single value.

The easiest approach I can think of is a HashMap, but I wonder if you can do better by grouping the keys that map to a single value.

Map<Integer,Integer> largeMap = Maps.newHashMap();
largeMap.put(1,4);
largeMap.put(2,232);
...
largeMap.put(1000000, 4);
Chris

3 Answers

4

If the set of keys is known to be in a given range (such as 1-1000000 in your example), then the simplest option is to use an array. The problem is that you need to look up values by key, and that limits you to either a map or an array.

The following uses a map of values to values simply to avoid duplicate instances of equal value objects (there may be a better way to do this, but I can't think of any). The array simply serves to look up values by index:

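private static void addToArray(Integer[] array, int key, 
        Integer value, Map<Integer, Integer> map) {

    // putIfAbsent returns the previously mapped instance, or null if this
    // is the first time the value is seen; fall back to value in that case.
    Integer existing = map.putIfAbsent(value, value);
    array[key] = existing != null ? existing : value;
}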

And then values can be added using:

Map<Integer, Integer> keys = new HashMap<>();
Integer[] largeArray = new Integer[1000001];

addToArray(largeArray, 1, 4, keys);
addToArray(largeArray, 2, 232, keys);
...
addToArray(largeArray, 1000000, 4, keys);

If new Integer[1000001] seems like a hack, you can still maintain a sort of "index offset" to indicate the actual key associated with index 0 in the array.


And I'd put that in a class:

class LargeMap {

    private Map<Integer, Integer> keys = new HashMap<>();
    private Integer[] keyArray;

    public LargeMap(int size) {
        this.keyArray = new Integer[size];
    }

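    public void put(int key, Integer value) {
        // putIfAbsent returns null on first insertion, so fall back to value.
        Integer existing = this.keys.putIfAbsent(value, value);
        this.keyArray[key] = existing != null ? existing : value;
    }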

    public Integer get(int key) {
        return this.keyArray[key];
    }
}

And:

public static void main(String[] args) {
    // Keys run from 1 to 1,000,000, so the array needs 1,000,001 slots.
    LargeMap myMap = new LargeMap(1_000_001);

    myMap.put(1, 4);
    myMap.put(2, 232);
    myMap.put(1_000_000, 4);
}
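
A detail worth checking in the approach above: `Map.putIfAbsent` returns the value previously associated with the key, which is `null` the first time a value is seen, so the result needs a fallback before it is stored. A minimal sketch of such an interning helper follows; the class name `InternDemo` and the method name `intern` are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

class InternDemo {
    // Returns the canonical Integer instance for the given value,
    // registering the passed instance the first time the value is seen.
    static Integer intern(Map<Integer, Integer> pool, Integer value) {
        Integer existing = pool.putIfAbsent(value, value); // null on first insert
        return existing != null ? existing : value;
    }
}
```

Calling `intern(pool, 12345)` twice with the same pool should hand back the same object both times, which is the sharing the value map is meant to provide.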
ernest_k
  • What does the Map map provide for you if key===value of the map. – Chris Feb 03 '19 at 20:01
  • @Chris I only use it to avoid instantiating `equal` values multiple times (see the `map.putIfAbsent(value, value)` call). As noted in the answer, there's probably a better way to do it that I'm yet to find. – ernest_k Feb 03 '19 at 20:04
  • I don't get it. If you change the `put` implementation to `this.keyArray[key] = value;`, then the result is just a wrapper for an array (which is, in many ways, probably the "best" solution here anyhow). Talking about "instantiating equal values" does not seem to make sense for `Integer`. – Marco13 Feb 03 '19 at 22:48
  • @Marco13 Here's the reason: I want to keep **at most 10000 instances/objects** of the possible values. If you call `put(999999, 12345)`, `12345` is auto-boxed into a new `Integer` object. Okay... But calling `put(999998, 12345)` again will result in another `12345` Integer object. This may be insignificant for `Integer`, but I want to limit the number of value instances. In other words, I'm having the very same object for `12345` wherever the map would have had it as value. That's what `this.keys.putIfAbsent` is helping with. – ernest_k Feb 04 '19 at 05:36
1

I'm not sure if you can optimize much here by grouping anything. A 'reverse' mapping might give you slightly better performance if you want to do lookups by value instead of by key (i.e. get all keys with a certain value), but since you didn't explicitly say that you want to do this, I wouldn't go with that approach.
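
For reference, a 'reverse' mapping of the kind mentioned above could look roughly like the sketch below. The class name `ReverseIndex` and its methods are hypothetical, and the sketch assumes each key is inserted only once (it does not remove a key from its old value's list on re-insertion).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ReverseIndex {
    // Maps each value to the list of keys that point to it.
    private final Map<Integer, List<Integer>> keysByValue = new HashMap<>();

    void put(int key, int value) {
        keysByValue.computeIfAbsent(value, v -> new ArrayList<>()).add(key);
    }

    // Returns all keys mapped to the given value (empty list if none).
    List<Integer> keysFor(int value) {
        return keysByValue.getOrDefault(value, Collections.emptyList());
    }
}
```

This only helps with value-to-keys queries; a lookup by key would still need a separate structure.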

For optimization you can use an int array instead of a map, if the keys are in a fixed range. Array lookup is O(1) and primitive arrays use less memory than maps.

int offset = -1;
int[] values = new int[1000000];
values[1 + offset] = 4;
values[2 + offset] = 232;
// ...
values[1000000 + offset] = 4;

If the range doesn't start at 1 you can adapt the offset.
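
As a rough sketch of adapting the offset, the array can be wrapped in a small class that subtracts the first key of the range. The name `OffsetIntArray` and the example range are assumptions made for illustration only.

```java
class OffsetIntArray {
    private final int firstKey; // the key stored at index 0
    private final int[] values;

    // Covers the inclusive key range [firstKey, lastKey].
    OffsetIntArray(int firstKey, int lastKey) {
        this.firstKey = firstKey;
        this.values = new int[lastKey - firstKey + 1];
    }

    void put(int key, int value) {
        values[key - firstKey] = value;
    }

    // Keys that were never set read back as 0, since a primitive
    // array cannot represent "missing".
    int get(int key) {
        return values[key - firstKey];
    }
}
```

For example, `new OffsetIntArray(500, 1500)` allocates 1,001 slots and stores key 500 at index 0.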

There are also libraries like trove4j which provide better performance and more efficient storage for this kind of data than standard collections, though I don't know how they compare to the simple array approach.

kapex
-1

HashMap is the worst solution. The hash of an integer is itself. I would say a TreeMap if you want an easily available solution. You could write your own specialized tree map, for example splitting the keys into two shorts and having a TreeMap within a TreeMap.
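
To make the nested idea concrete, here is a minimal sketch of a TreeMap within a TreeMap keyed by the high and low 16 bits of the int key. The class name `SplitKeyTreeMap` is hypothetical, and the sketch only illustrates the structure; it makes no claim about how it performs against a HashMap or an array.

```java
import java.util.TreeMap;

class SplitKeyTreeMap {
    // Outer map: high 16 bits of the key; inner map: low 16 bits.
    private final TreeMap<Short, TreeMap<Short, Integer>> outer = new TreeMap<>();

    void put(int key, int value) {
        short high = (short) (key >>> 16);
        short low = (short) key;
        outer.computeIfAbsent(high, h -> new TreeMap<>()).put(low, value);
    }

    // Returns the mapped value, or null if the key is absent.
    Integer get(int key) {
        TreeMap<Short, Integer> inner = outer.get((short) (key >>> 16));
        return inner == null ? null : inner.get((short) key);
    }
}
```

Note that the shorts are signed, so iteration order inside the maps does not match the numeric order of the original int keys; point lookups are unaffected.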

Jonathan Rosenne