7

I want to iterate over a set whose contents will be modified during the iteration. I want to iterate over the set as it was when the iterator was created, without visiting any elements added afterwards. How is this possible? Is this the default behavior of a set, or how can I accomplish it?

One way I can think of is to copy the original set into a new set that won't be modified, but this seems inelegant and there must be a better solution.

templatetypedef
nomel7

6 Answers

8

EDIT: This answer was designed for a single-threaded case, since I had interpreted the OP's question as avoiding comodification rather than avoiding issues from multithreading. I'm leaving this answer here in case it ends up being useful to anyone in the future who is using a single-threaded approach.

There is no direct way to accomplish this. However, one option that is quite nice is to have two sets: the primary set, which you iterate over, and a secondary set into which you insert all the new elements that need to be added. You iterate over the primary set and, once that's finished, use addAll to add all the new elements to the primary set.

For example:

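Set<T> masterSet = /* ... */;

// Secondary set: collects elements produced during the loop.
Set<T> newElems = /* ... */;
for (T obj : masterSet) {
    /* ... do something to each object; add any new elements to newElems, not masterSet ... */
}

// Merge the deferred additions once iteration has finished.
masterSet.addAll(newElems);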
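A filled-in version of the same idea, as a minimal single-threaded sketch. The class name `DeferredAdds`, the method `visitAndGrow`, and the "multiply by 10" rule are made up for illustration and are not part of the answer above.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DeferredAdds {
    // Visits every element that was in masterSet when the loop started.
    // New elements are parked in newElems and merged after the loop.
    // Returns how many elements were visited.
    static int visitAndGrow(Set<Integer> masterSet) {
        Set<Integer> newElems = new LinkedHashSet<>();
        int visited = 0;
        for (Integer n : masterSet) {
            visited++;
            newElems.add(n * 10); // hypothetical rule; masterSet is never touched here
        }
        masterSet.addAll(newElems);
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> masterSet = new LinkedHashSet<>(Arrays.asList(1, 2, 3));
        int visited = visitAndGrow(masterSet);
        System.out.println("visited " + visited + ", size is now " + masterSet.size());
    }
}
```

Because the loop never writes to the set it is iterating, no ConcurrentModificationException can be thrown by the loop itself. This sketch is not thread-safe.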

Hope this helps!

templatetypedef
  • 1
    I like this method because (a) it produces fewer temporary objects than copying the whole original set does and (b) it avoids a lot of concurrency overhead that you might not need. The `I can't see any guarantees around behaviour of an iterator in terms of seeing new elements` can be a problem for the `ConcurrentSkipListSet` – Stephen P Jun 19 '12 at 18:28
  • I'm not sure how this can work. How does the 2nd thread know that it has to add to newElems, not to masterSet??? And, if you are NOT iterating, who knows to then merge newElems into masterSet??? – user949300 Jun 19 '12 at 18:59
  • @user949300- I would assume whatever code there is that is adding elements to the set could know what the new elements set is. Also do note that the OP's question says nothing about multithreading; I think the issue was comodification rather than concurrency. It would be very easy to communicate this new information into other threads if they were to exist. – templatetypedef Jun 19 '12 at 19:02
  • @templatetypedef - I understood his question to mean multi-threading, but, on 2nd reading, he could mean comodification, in which case your solution is indeed ideal. I also see your comment to the OP asking for more info. – user949300 Jun 19 '12 at 19:12
  • Since OP has clarified that he is talking multi-threading, this answer is no longer suitable. (at least not without more work) – user949300 Jun 19 '12 at 23:12
  • @user949300- This strategy absolutely can be made to work. You would just need to do a little extra synchronization. Though I agree that there are much better approaches given that this is intended for multithreading. – templatetypedef Jun 19 '12 at 23:20
8

Taking a snapshot of the set sounds like exactly the right solution to me, if you want to make sure you don't see any new elements. There are some sets such as ConcurrentSkipListSet which will allow you to keep iterating, but I can't see any guarantees around behaviour of an iterator in terms of seeing new elements.
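For illustration, a snapshot might look like the sketch below. The names `SnapshotIteration` and `visitSnapshot` and the "-copy" suffix are hypothetical. It assumes a single thread; if other threads write to the live set, the copy itself would also need to be protected (for example by synchronizing on the set), which this sketch does not do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SnapshotIteration {
    // Iterates over a copy taken up front, so additions to `live`
    // during the loop are never visited.
    static List<String> visitSnapshot(Set<String> live) {
        List<String> visited = new ArrayList<>();
        Set<String> snapshot = new HashSet<>(live);
        for (String s : snapshot) {
            visited.add(s);
            live.add(s + "-copy"); // safe: the loop iterates snapshot, not live
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<String> live = new HashSet<>(Arrays.asList("a", "b"));
        List<String> visited = visitSnapshot(live);
        System.out.println("visited " + visited.size() + ", live size " + live.size());
    }
}
```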

EDIT: CopyOnWriteArraySet meets your requirements, but writes are expensive, which sounds like it's not appropriate for you.

Those are the only sets I can see in java.util.concurrent, which is the natural package for such collections. Taking a copy is still likely to be simpler :)
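A small sketch of the CopyOnWriteArraySet behaviour, for comparison. Its iterators work on the array as it was when the iterator was created, so elements added during the loop are not visited. The class name `CowSnapshot`, the method `visit`, and the "+ 100" rule are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

class CowSnapshot {
    // Adds to the set while iterating over it. With a CopyOnWriteArraySet
    // the iterator keeps reading the array it started with.
    static List<Integer> visit(Set<Integer> set) {
        List<Integer> visited = new ArrayList<>();
        for (Integer n : set) {
            visited.add(n);
            set.add(n + 100); // each add copies the backing array
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> set = new CopyOnWriteArraySet<>(Arrays.asList(1, 2, 3));
        List<Integer> visited = visit(set);
        System.out.println("visited " + visited.size() + ", size is now " + set.size());
    }
}
```

Every add copies the whole backing array, which is the write cost mentioned above.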

Jon Skeet
  • It depends. If the snapshot _isn't_ required (nobody happens to insert while you are iterating) CopyOnWriteArraySet will be faster. So it depends on how often there are actual collisions. – user949300 Jun 19 '12 at 23:14
  • @user949300: I don't know the details of when CopyOnWriteArraySet really requires taking a copy, but the documentation claims it's "usually" (whatever that means). It would certainly be nice if it only copied when really, really necessary... – Jon Skeet Jun 20 '12 at 05:43
2

Making a copy of the Set is the elegant solution.

Set<Obj> copyOfObjs = new HashSet<Obj>(originalSet);
for (Obj original : originalSet) {
    // add new elements to copyOfObjs only; originalSet stays untouched while it is iterated
}
nicholas.hauschild
0

You can use a ConcurrentHashMap with dummy values, treating its keys as the set, or a ConcurrentSkipListSet.
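A rough sketch of the map-backed variant, using `Collections.newSetFromMap` so the dummy values are handled for you. The names `ConcurrentSetDemo`, `newConcurrentSet` and `visit` are made up for illustration. One caveat: iterators over these collections are weakly consistent, so the loop will not throw ConcurrentModificationException but may or may not visit elements added after it started. On its own this does not guarantee the "original elements only" behaviour the question asks for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentSetDemo {
    // A concurrent Set view over a ConcurrentHashMap; the Boolean values are dummies.
    static <T> Set<T> newConcurrentSet() {
        return Collections.newSetFromMap(new ConcurrentHashMap<T, Boolean>());
    }

    // Adds to the set while iterating. The guard keeps the loop finite
    // even if the iterator happens to see the newly added elements.
    static List<Integer> visit(Set<Integer> set) {
        List<Integer> visited = new ArrayList<>();
        for (Integer n : set) {
            visited.add(n);
            if (n < 100) {
                set.add(n + 100);
            }
        }
        return visited;
    }

    public static void main(String[] args) {
        Set<Integer> set = newConcurrentSet();
        set.addAll(Arrays.asList(1, 2, 3));
        List<Integer> visited = visit(set);
        System.out.println("visited " + visited.size() + " of " + set.size());
    }
}
```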

Suraj Chandran
0

Now that OP has clarified the requirements, the solutions are

  1. Copy the set before iterating
  2. Use CopyOnWriteArraySet
  3. Write your own custom code and try to be smarter than a lot of smart people.

The drawback of #1 is that you always copy the set, even when it isn't needed (e.g. if no insertions actually occur while you are iterating). I'd suggest option #2 unless you can show that frequent inserts cause a real performance issue.

user949300
0

As others have suggested here, there is no single best solution to what you are looking for. It all depends on your application's use case and on how the set is used.
Since Set is an interface, you could define your own DoubleSet class that implements Set and is backed by, say, two HashSet fields.
When you retrieve an iterator, you mark one of these sets as being in "iteration-only mode", so that the add method adds only to the other set.


I am still new to Stack Overflow, so I need to understand how to embed code in my answers :( but in general you should have a class called MySet (generic in a type T) implementing Set<T>.
You need to implement all the methods and have two fields: one called iterationSet and the other called insertionSet.
You will also have a boolean field indicating whether to insert into both sets or not. When the iterator() method is called, this boolean should be set to false, meaning you insert only into insertionSet.
You should have a method that synchronizes the contents of the two sets once you're done with the iterator.
I hope that was clear.
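One possible reading of that description, as a minimal single-threaded sketch. It simplifies a few things, and these are assumptions rather than the answer's exact design: it extends AbstractSet instead of implementing every Set method by hand, the flag is named `iterating` with the opposite polarity, outside of iteration it adds only to iterationSet, and the merge method is given the hypothetical name `finishIteration`.

```java
import java.util.AbstractSet;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

class MySet<T> extends AbstractSet<T> {
    private final Set<T> iterationSet = new HashSet<>();
    private final Set<T> insertionSet = new HashSet<>();
    private boolean iterating = false;

    @Override
    public boolean add(T element) {
        if (iterating) {
            // While an iterator is in use, park new elements in insertionSet.
            if (iterationSet.contains(element)) {
                return false;
            }
            return insertionSet.add(element);
        }
        return iterationSet.add(element);
    }

    @Override
    public Iterator<T> iterator() {
        iterating = true; // stays set until finishIteration() is called
        return iterationSet.iterator();
    }

    // Merges the parked elements and leaves iteration-only mode.
    public void finishIteration() {
        iterationSet.addAll(insertionSet);
        insertionSet.clear();
        iterating = false;
    }

    @Override
    public int size() {
        return iterationSet.size() + insertionSet.size();
    }

    @Override
    public boolean contains(Object o) {
        return iterationSet.contains(o) || insertionSet.contains(o);
    }

    public static void main(String[] args) {
        MySet<String> set = new MySet<>();
        set.add("a");
        set.add("b");
        int visited = 0;
        for (String s : set) {
            visited++;
            set.add(s + "!"); // goes to insertionSet, so the loop never sees it
        }
        set.finishIteration();
        System.out.println("visited " + visited + ", size is now " + set.size());
    }
}
```

Note that inherited methods such as toString() and hashCode() also call iterator(), so in this sketch they switch the set into iteration-only mode as a side effect. A real implementation would need to handle that, along with removal and thread safety.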

Yair Zaslavsky