
I'm writing a program that makes extensive use of large HashMaps. It is multithreaded, so I've used read-write locks when accessing it. However, it has a special property that I'd like to exploit.

After data is "put" into the HashMap, that data is never changed. Ever. Whenever a change to the state of this data structure is made, it actually just creates a new "generation" of the structure, leaving the old one intact.

That is, is it safe to read a value from a HashMap at the same time that another thread is writing a value, knowing that the other thread will never be writing to the value you're reading? Is there some simple hashtable structure that will give me this sort of guarantee?

Ethan
  • Are you wanting to deep-copy the hashmap after modifying it in some way? These new copies are readonly and don't need MT sync as @Brian Roach points out. However you might need sync in order to *select* the proper "version" to read. Or, maybe I'm missing something... – seand Apr 19 '11 at 05:08

4 Answers


Not really. Because you can write to it, you may trigger a resize of the underlying array when you do. If you trigger a resize in the middle of another thread's read, you're really going to mess with its ability to find the data accurately!

corsiKa
  • He already stated he's not writing to it. There's no reason to lock/synchronize for read only – Brian Roach Apr 19 '11 at 04:56
  • No, that's not what he said. He said he's not overwriting anything in it. Let's say he adds `EthanObject eo`. Then he makes a change to it. Instead of overwriting `eo` in the map, he'll instead make `EthanObject eo2` and add `eo2`. – corsiKa Apr 19 '11 at 04:57
  • No, I *am* writing to the HashMap, just not to the same value I'm reading. I see what glowcoder means though - could you recommend a data structure that gracefully resizes, in such a way that it doesn't disrupt the logic of a current read? – Ethan Apr 19 '11 at 04:58
  • @Ethan unfortunately, I can't. It would not be inconceivable (or difficult) to write your own though. The problem is it would involve a copy of the underlying array structure, making `insert()` an `O(n)` call. – corsiKa Apr 19 '11 at 05:00
  • Nm, you're right ... I think I got tripped up by his second paragraph. I was reading that he would create a new hashmap, not just a new object that was IN the hashmap – Brian Roach Apr 19 '11 at 05:01
  • Ah yes, long insertions are something I would like to avoid. The structure is to maintain the game state in an AI. It spawns multiple threads that all speculatively change things - the write-once stuff is there to help make it thread-safe (and to just make it easy to interface with). However, there will be quite a lot of speculative moves going on, so I was looking for something with constant-time insert and get, plus this property. Perhaps I'll look more into implementing my own. – Ethan Apr 19 '11 at 05:03
  • @Ethan - http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html – Brian Roach Apr 19 '11 at 05:10
  • @Ethan if you know the maximum size of your hashmap then you can initialize its size appropriately and avoid resizing. – richs Apr 19 '11 at 15:21

The problem isn't the data IN the hashmap; it's that inserting something modifies the structure of the hashmap itself. You can't do that from multiple threads at once with a standard HashMap.

The java.util.concurrent package does offer a thread-safe hash map:

http://download.oracle.com/javase/6/docs/api/java/util/concurrent/ConcurrentHashMap.html

Internally this is going to be using non-locking methods of thread safety.
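
A minimal sketch of that suggestion, assuming Java 8+ for the lambda syntax (the linked Java 6 javadoc predates it); the class name `ConcurrentStateDemo`, the `populate` helper, and the `"t0-0"`-style keys are hypothetical. Several writers insert distinct keys while also reading, with no ReadWriteLock around the map:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentStateDemo {
    /** Starts `threads` writers (hypothetical helper), each inserting `perThread` distinct keys while also reading. */
    public static Map<String, Integer> populate(int threads, int perThread) {
        // No ReadWriteLock: ConcurrentHashMap synchronizes internally, including while it resizes.
        Map<String, Integer> state = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Keys are never reused, mirroring the "write once, never overwrite" pattern.
                    state.put("t" + id + "-" + i, i);
                    // Safe concurrent read; returns null if writer 0 has not inserted this key yet.
                    state.get("t0-0");
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println(populate(4, 1000).size()); // 4000
    }
}
```

Because each key is written exactly once, a reader either finds the value or gets `null` if that key has not been inserted yet, even while the table is growing.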

Brian Roach
  • I thought a concurrent hash map used both locking and non-locking methods for thread safety. The locking methods don't lock the entire map but a portion of the map. – richs Apr 19 '11 at 13:52
  • @richs - I'd have to read the source, but most of the concurrent package relies on CAS rather than locks, and the javadocs say *even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access*. It's possible there's locking involved for two concurrent inserts, but overall I think the overhead should be less for him than using heavyweight locks for all access. – Brian Roach Apr 19 '11 at 13:58

I know you've stated that it won't be overwritten, but it's worth considering a ConcurrentHashMap, if only because you won't need your 'locking' code any more.

This special map (available since Java 1.5) never throws a ConcurrentModificationException from its iterators, and reads return the result of the most recently completed write.

http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html

It's also super-fast for multiple concurrent reads. See this article for a bit more info:

http://www.ibm.com/developerworks/java/library/j-jtp07233/index.html#N101CD

Other things to note: it doesn't allow null keys or values, and it has another handy method, putIfAbsent.
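
A short sketch of those points, assuming Java 8+; `ChmFeaturesDemo`, `rejectsNullValue`, `removeEvenValues`, and the `"move-N"` keys are hypothetical names. It shows `putIfAbsent` keeping the first value, null values being rejected, and entries being removed mid-iteration without a ConcurrentModificationException:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ChmFeaturesDemo {
    /** True if the map throws NullPointerException for a null value (ConcurrentHashMap does; HashMap does not). */
    public static boolean rejectsNullValue(Map<String, Integer> map) {
        try {
            map.put("k", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    /** Removes even-valued entries while iterating over the live key set. */
    public static void removeEvenValues(ConcurrentMap<String, Integer> map) {
        for (String key : map.keySet()) {
            Integer v = map.get(key);
            if (v != null && v % 2 == 0) {
                // Fine here because the iterator is weakly consistent; the same removal inside
                // a for-each over a plain HashMap can throw ConcurrentModificationException.
                map.remove(key);
            }
        }
    }

    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();

        // putIfAbsent inserts atomically only when the key is missing and returns the previous value.
        System.out.println(map.putIfAbsent("move-1", 1)); // null
        System.out.println(map.putIfAbsent("move-1", 2)); // 1 (first value kept)

        System.out.println(rejectsNullValue(map));             // true
        System.out.println(rejectsNullValue(new HashMap<>())); // false

        for (int i = 2; i <= 9; i++) {
            map.put("move-" + i, i);
        }
        removeEvenValues(map);
        System.out.println(map.size()); // 5 (move-1, move-3, move-5, move-7, move-9)
    }
}
```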

HTH

laher

Instead of the HashMap you could use a persistent map. Each writer then has to lock while adding the new object and replacing the reference to the map with the new version, but readers can always read from the "current" version (possibly not finding the value they are looking for, because it is being added concurrently).

Note that reading and writing the reference to the Map has to be done atomically.
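
The Java standard library has no persistent (structure-sharing) map, so this sketch assumes a stand-in: a hypothetical `SnapshotMap` that copies an ordinary HashMap on every write and publishes the copy through a `volatile` field (reference assignment is atomic in Java; `volatile` makes the new version and its contents visible to readers). Writers lock as described; readers never lock and can keep holding an old generation. The full copy makes each insert O(n), which is the cost a real persistent map avoids by sharing most of its structure between versions:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical copy-on-write map: readers see an immutable snapshot, writers publish a new version. */
public final class SnapshotMap<K, V> {
    // volatile: a reader that sees the new reference also sees the fully built map behind it.
    private volatile Map<K, V> current = Collections.emptyMap();

    /** Lock-free read from whatever version is current right now; may miss a concurrent put. */
    public V get(K key) {
        return current.get(key);
    }

    /** Returns the current generation; it never changes after being published. */
    public Map<K, V> snapshot() {
        return current;
    }

    /** Writers serialize on this lock, copy the current version, add the entry, then swap the reference. */
    public synchronized void put(K key, V value) {
        Map<K, V> next = new HashMap<K, V>(current); // O(n) copy per insert
        next.put(key, value);
        current = Collections.unmodifiableMap(next);
    }

    public static void main(String[] args) {
        SnapshotMap<String, Integer> map = new SnapshotMap<>();
        map.put("a", 1);
        Map<String, Integer> gen1 = map.snapshot();
        map.put("b", 2);
        System.out.println(gen1.containsKey("b")); // false: the old generation is untouched
        System.out.println(map.get("b"));          // 2
    }
}
```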

subsub