
I want at most one DateWrapper instance to exist at any time for a given date. (DateWrapper represents a date and was built for Hibernate persistence, but that is another story.)

I'm a bit confused about collisions and good keys for hashing. I'm writing a factory for DateWrapper objects, and I thought I would use the milliseconds of the parsed date as the key, as I've seen others do. But what happens if there is a collision? Millisecond values for different dates are always different, but the map's internal table has fewer slots than there are possible Long values. Once the hash map has a collision it falls back on equals, but how can it distinguish two different objects from my Long key? Or maybe the put method drops (overwrites) a value I want to insert... So, is this code safe, or is it buggy?

package myproject.test;

import java.util.HashMap;
import java.util.Map;

import org.joda.time.DateTime;
import org.joda.time.format.DateTimeFormat;
import org.joda.time.format.DateTimeFormatter;

import myproject.utilities.DateWrapper;

public class DateWrapperFactory {

    static Map <Long, DateWrapper> cache = new HashMap<Long, DateWrapper>();
    static DateTimeFormatter parser =
        DateTimeFormat.forPattern("yyyy-MM-dd");

    static DateWrapperFactory instance = new DateWrapperFactory();

    private DateWrapperFactory() {
    }

    public static DateWrapperFactory getInstance() {
        return instance;
    }


    public static DateWrapper get(String source) {
        DateTime d = parser.parseDateTime(source);
        DateWrapper dw = cache.get(d.getMillis());
        if (dw != null) {
            return dw;
        } else {
            dw = new DateWrapper(d);
            cache.put(d.getMillis(), dw);
            return dw;
        }
    }

}

package myproject.utilities;

import org.joda.time.DateTime;

public class DateWrapper {

    private DateTime date;

    public DateWrapper(DateTime dt) {
        this.date = dt;
    }

}
cdarwin
  • Isn't this equivalent to a Set, with DateTime.equals implemented by way of getMillis()? – extraneon Nov 15 '10 at 20:40
  • @extraneon: It's not equivalent. The idea here (I think) is to implement a single canonical/interned instance for each `DateTime` being used, which requires the ability to get at that instance without iterating, and that requires a map. – ColinD Nov 15 '10 at 20:47

4 Answers


With HashMap, you can store only one entry under any given key value (e.g. Long in your case).

On a side note, if there is any chance of concurrency, you may want to use a ConcurrentHashMap and putIfAbsent() instead of non-atomic get/if/put calls.
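A minimal sketch of the putIfAbsent() idea, under some loud assumptions: `java.time.LocalDate` stands in for Joda-Time's `DateTime` so the example runs without extra libraries, and the `DateWrapper` and `ConcurrentDateWrapperFactory` classes here are hypothetical stand-ins, not the classes from the question.

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for the question's DateWrapper (assumption: it only
// needs to hold the date). LocalDate replaces Joda-Time's DateTime here.
final class DateWrapper {
    private final LocalDate date;
    DateWrapper(LocalDate date) { this.date = date; }
    LocalDate getDate() { return date; }
}

final class ConcurrentDateWrapperFactory {
    private static final ConcurrentMap<Long, DateWrapper> CACHE =
        new ConcurrentHashMap<>();

    private ConcurrentDateWrapperFactory() {}

    static DateWrapper get(String source) {
        LocalDate d = LocalDate.parse(source); // ISO format, i.e. yyyy-MM-dd
        long key = d.atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        DateWrapper fresh = new DateWrapper(d);
        // putIfAbsent is atomic: it returns the value already mapped to the
        // key, or null if 'fresh' was stored by this call.
        DateWrapper existing = CACHE.putIfAbsent(key, fresh);
        return existing != null ? existing : fresh;
    }

    public static void main(String[] args) {
        DateWrapper a = get("2010-01-01");
        DateWrapper b = get("2010-01-01");
        System.out.println(a == b); // prints true
    }
}
```

With this shape, two threads asking for the same date at the same moment may each build a throwaway `DateWrapper`, but both get back the single instance that won the race.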

Eugene Kuleshov
  • Sorry, but the equals the map uses is that of the key, isn't it? So I think it's not DateWrapper's equals that would be used, but Long's. – cdarwin Nov 15 '10 at 19:29

The equals() will be called on the Long keys, not the values. You are fine.
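To illustrate, here is a small sketch (the class and method names are made up for the example, and String values stand in for DateWrapper). It shows that a lookup succeeds with any Long that is equal by value to the stored key, and that putting under an equal key replaces the old value and hands it back.

```java
import java.util.HashMap;
import java.util.Map;

final class LongKeyDemo {
    // Stores a value under one Long, then looks it up with a second Long
    // holding the same number. Returns what the map finds.
    static String lookUpWithEqualKey(long millis) {
        Map<Long, String> cache = new HashMap<>();
        Long storedKey = Long.valueOf(millis);
        Long lookupKey = Long.valueOf(millis); // equal by value; may be a different object
        cache.put(storedKey, "wrapper-" + millis);
        return cache.get(lookupKey);
    }

    // Puts two different values under equal keys and returns the value
    // that the second put() reports as replaced.
    static String replacedValue(long millis) {
        Map<Long, String> cache = new HashMap<>();
        cache.put(millis, "old");
        return cache.put(millis, "new");
    }

    public static void main(String[] args) {
        long millis = 1262304000000L; // an arbitrary epoch-millisecond value
        System.out.println(lookUpWithEqualKey(millis)); // prints wrapper-1262304000000
        System.out.println(replacedValue(millis));      // prints old
    }
}
```

This is why the get-then-put pattern in the question only ever creates one wrapper per millisecond value: the second call finds the entry instead of inserting a new one.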

Jim Garrison
  • Actually, it depends on what you are trying to accomplish. It sounds like the data structure you have chosen is NOT going to do what you want. Please update your question to indicate what you are trying to do. – Jim Garrison Nov 15 '10 at 19:36
  • Done (added first sentence). I'd like to have one DateWrapper at most existing at the same time for the same DateTime value – cdarwin Nov 15 '10 at 19:41
  • @cdarwin: You might as well make the `DateTime` itself the map key, though I think what you have will work (unless `get` may be called on multiple threads, of course). – ColinD Nov 15 '10 at 19:46
  • Once you have all these date wrappers in the hash, what are you going to do with them? Why are they in a hash to begin with? – Jim Garrison Nov 15 '10 at 19:47
  • @ColinD: then he'd have to implement equals() himself. It's not clear what he really wants to do or even if a HashMap is the right data structure. – Jim Garrison Nov 15 '10 at 19:48
  • @ColinD In terms of memory consumption, would it be the same? – cdarwin Nov 15 '10 at 19:50
  • @Jim Garrison: in my app, if two classes ask for the "date" 2010-01-01, the same object must be returned. So, when I persist the dates (DateWrappers) with Hibernate, I don't store duplicate dates. See also http://stackoverflow.com/questions/4186531/persisting-time-ojects-as-entities-instead-of-value-types – cdarwin Nov 15 '10 at 19:54
  • @Jim Garrison: I think it would be premature optimization if the dates didn't all have to be stored in a db (and reloaded too, obviously) – cdarwin Nov 15 '10 at 19:59
  • @Jim Garrison: I'm talking about the `DateTime` as a key, not his `DateWrapper`. Anyway, it doesn't really matter much. – ColinD Nov 15 '10 at 20:01
  • @cdarwin: The memory consumption might be slightly better, given that each `DateTime` instance must already be stored in memory inside its `DateWrapper`. The `Long` value of the milliseconds would then not need to be kept in memory. – ColinD Nov 15 '10 at 20:03

If you use the actual objects you want as map keys, and those keys implement equals and hashCode according to their contract, HashMap takes care of the hashcode details for you. A hashcode collision then causes no problem beyond a possible performance reduction, because the map has to search linearly through the entries that hashed to the same bucket.

The issue in your other question, where the subject of collisions came up, was that rather than using the actual object that should have been the key, you were using that object's hashcode as the key. This was incorrect and would have led to incorrect behavior: when you looked up the value for a given key in the map, the result could have been the value belonging to a completely different key that just happens to have the same hashcode.

The moral of the story is: use the actual key or something that is definitely equivalent (like the millis of the DateTime in this case) as the key, not the key's hashcode. The HashMap does what it needs with the hashcode for you.
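A small sketch of the difference (the class and method names are invented for the example). It relies on the documented behavior of `Long.hashCode()`, which XORs the upper and lower 32 bits of the value, so `0L` and `0x1_0000_0001L` are two different keys with the same hashcode.

```java
import java.util.HashMap;
import java.util.Map;

final class HashCollisionDemo {
    // Two distinct Long values whose hashCode() is the same (both 0).
    static final Long A = 0L;
    static final Long B = 0x1_0000_0001L;

    // Correct: the key itself goes into the map. The colliding keys share a
    // bucket, but equals() keeps them apart.
    static Map<Long, String> keyedByValue() {
        Map<Long, String> m = new HashMap<>();
        m.put(A, "first");
        m.put(B, "second");
        return m;
    }

    // Incorrect: the key's hashcode is used as the map key, so the two
    // entries become indistinguishable and the second overwrites the first.
    static Map<Integer, String> keyedByHashCode() {
        Map<Integer, String> m = new HashMap<>();
        m.put(A.hashCode(), "first");
        m.put(B.hashCode(), "second");
        return m;
    }

    public static void main(String[] args) {
        System.out.println(A.hashCode() == B.hashCode()); // prints true
        System.out.println(keyedByValue().size());        // prints 2
        System.out.println(keyedByHashCode().size());     // prints 1
    }
}
```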

ColinD
  • OK, thank you Colin. I knew this, but the confusion came from using a key that wasn't unique for the objects in the code you are referring to (for other readers, see http://stackoverflow.com/questions/4179641/caching-objects-built-with-multiple-parameters). So, it's rather simple: the key used in a hashmap must correspond to one and only one object, like the primary key in an RDBMS – cdarwin Nov 15 '10 at 21:32

Given what you're ultimately trying to accomplish with this, it doesn't seem terribly productive. You already have a highly optimized data structure specifically designed for fast searching and enforcing uniqueness: a database index. You also have extremely robust in-memory and L2 caching in place from Hibernate, which incidentally does not have the thread-safety issues of putting a HashMap in a static field.

Why not make that number the value of the ID column in your database and let the robust platform technologies take care of finding it quickly and caching it for you? An in-memory L2 cache hit is really not that much slower than a HashMap in the grand scheme of things. It would be a pretty rare application where the difference is one of your meaningful hotspots.

Affe