19

In Java, a set checks an object for equality with the objects already in the set only at insertion time. That means that if an object already in the set later becomes equal to another object in the set, the set will keep both equal objects without complaining.

EDIT: For example, consider a simple object, and assume that hashCode and equals are defined following best practices.

class Foo {
    int foo;

    Foo(int a){ foo = a; }
    //+ equals and hashcode based on "foo"
}

Foo foo1 = new Foo(1);
Foo foo2 = new Foo(2);
Set<Foo> set = new HashSet<Foo>();
set.add(foo1);
set.add(foo2);
//Here the set has two unequal elements.
foo2.foo = 1;
//At this point, foo2 is equal to foo1, but is still in the set 
//together with foo1.
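
To make the snippet above reproducible, here is a minimal runnable sketch. The `equals` and `hashCode` bodies are an assumption (the question only says they are based on `foo`), and `MutationDemo` is a hypothetical name used for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Foo from the question, with an assumed equals/hashCode based on "foo"
class Foo {
    int foo;

    Foo(int a) { foo = a; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Foo && ((Foo) o).foo == foo;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(foo);
    }
}

class MutationDemo {
    // Returns { size reported by the set, number of distinct elements in it }
    static int[] sizes() {
        Foo foo1 = new Foo(1);
        Foo foo2 = new Foo(2);
        Set<Foo> set = new HashSet<Foo>();
        set.add(foo1);
        set.add(foo2);
        foo2.foo = 1; // foo2 now equals foo1, but the set is not told
        int reported = set.size();
        // Copying into a fresh set rehashes every element and drops duplicates
        int distinct = new HashSet<Foo>(set).size();
        return new int[] { reported, distinct };
    }

    public static void main(String[] args) {
        int[] s = sizes();
        System.out.println("reported=" + s[0] + ", distinct=" + s[1]); // prints reported=2, distinct=1
    }
}
```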

How could a set class be designed for mutable objects? The behavior I would expect is the following: if at any time one of the objects in the set becomes equal to another object in the set, the set deletes that object. Is there one already? Is there a programming language that would make this easier to accomplish?

Jadiel de Armas
  • What behaviour would you expect? – assylias Feb 20 '15 at 13:21
  • Let me clarify it in the question. Thanks. – Jadiel de Armas Feb 20 '15 at 13:22
  • Can you explain this part `That means that if after an object is already present in the set, it becomes equal to another object in the set, the set will keep both equal objects without complaining.` ? I'm not able to understand your intent :) – Arkantos Feb 20 '15 at 13:22
  • 2
    What you ask for would require the set to be aware of the changes in the contained objects - so you would need to write something that tells the set that something changed when an object is mutated. But there is a high chance that you could find an alternative design for your use case that has a more elegant solution. – assylias Feb 20 '15 at 13:25
  • I guess "If at any time one of the objects in the set becomes equal to another object in the set" can not happen in a set in first place. – Kartic Feb 20 '15 at 13:27
  • @Kartic of course it can. – assylias Feb 20 '15 at 13:28
  • Got it now, but that's the contract of hashcode and equals. No matter how many times you call them on the same object, hash code and equals should always return same value. If it's changing just by mutating the state of the object, then I suggest you use something that doesn't change :) – Arkantos Feb 20 '15 at 13:31
  • @JadieldeArmas Do the objects "know" about the `Set` that they are in? – Fildor Feb 20 '15 at 13:34
  • If hashCode and equals are implemented, can we add equal objects in set? If yes, I will delete my comment and study set once again. – Kartic Feb 20 '15 at 13:35
  • @Kartic we are not talking about adding equal objects. We are talking objects already in the set becoming equal by mutation. The Set won't be aware of this. – Fildor Feb 20 '15 at 13:36
  • @Fildor Yes, looking at the example, I got it :). Thanks! – Kartic Feb 20 '15 at 13:38
  • @Kartic.. what exactly are you trying to achieve by adding equal objects? The very purpose of maintaining a Set is to have unique objects without any duplicates, and hashCode & equals are the methods that define the criteria for that uniqueness – Arkantos Feb 20 '15 at 13:39
  • Yeah, sorry! I didn't get the question initially. My mistake. – Kartic Feb 20 '15 at 13:41
  • np.. just trying to understand your intent :) – Arkantos Feb 20 '15 at 13:43
  • I suggest you switch to a Map, where you select a key, an immutable object, and the value, an object like Age, where you can modify whatever you want. – Radu Toader Feb 20 '15 at 13:44
  • I doubt that using a map will help. Let's say you have a map with `[1, age1]` and `[2,age2]`, later if you change `age2.age = 1`, now you have two different objects with same value. But the OP is asking if there's a Set that can act on the changed state of its contents and then remove them if it violates uniqueness – Arkantos Feb 20 '15 at 13:51
  • @Fildor: I have thought of that possibility. It would be a little intrusive, but maybe there could be a contract that the object informs the collection when it changes. – Jadiel de Armas Feb 20 '15 at 14:09
  • 1
    @JadieldeArmas Or you could use the reference to have the object remove and add itself again. Would need the fields to be changed through setters, though. Thread-Safety would be another issue to think about. I'm thinking of a setter doing : 1. Lock the set, 2. remove self, 3. change field, 4. (try to) add self, 5. unlock set – Fildor Feb 20 '15 at 14:11
  • @JadieldeArmas is it a requirement to have the duplicate removed or just that it isn't visible in an iterator? That would be easier to achieve. – mikea Feb 27 '15 at 21:06
  • Hi, I understand you'd like to have an answer that combines the top 2. However, I don't think they can be combined. I made an example for Joe's approach as well. Neither examples are thread safe. If you want them thread safe, it would be useful to limit the methods used to the minimum possible and make sure those are thread safe. Ideally, you'd know the usage as well to optimize your thread-safety strategy. – HSquirrel Mar 05 '15 at 16:21
  • @JadieldeArmas Great question. See my solution below for a different approach than that given so far. – dan b Mar 05 '15 at 23:44

10 Answers

10

I don't think this can be reliably done in Java in the general sense. There is no general mechanism for ensuring a certain action on mutation of an object.

There are a few approaches that may be sufficient for your use case.

1. Observe elements for changes

  • You need to have control over the implementation of the types that go into the set
  • Small performance cost whenever an object in your set updates

You could try to enforce an observer-like construction where your Set class is registered as an Observer to all its items. This implies you'd need to control the types of objects that can be put into the Set (only Observable objects). Furthermore, you'd need to ensure that the Observables notify the observer of every change that can affect hashCode and equals. I don't know of any existing class like this. As Ray mentions below, you'll need to watch out for potential concurrency problems as well. Example:

package collectiontests.observer;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Observable;
import java.util.Observer;
import java.util.Set;

public class ChangeDetectingSet<E extends Observable> implements Set<E>, Observer {

    private HashSet<E> innerSet = new HashSet<E>();

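    public void update(Observable o, Object arg) {
        //The hash of o has changed, so o still sits in its old bucket and innerSet.remove(o)
        //would not find it. Re-adding every element rehashes the whole set instead.
        Collection<E> elements = new ArrayList<E>(innerSet);
        innerSet.clear();
        for(E el: elements){
            if(!innerSet.add(el)){
                el.deleteObserver(this); //el is now a duplicate and is dropped from the set
            }
        }
    }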
    public int size() {
        return innerSet.size();
    }
    public boolean isEmpty() {
        return innerSet.isEmpty();
    }
    public boolean contains(Object o) {
        return innerSet.contains(o);
    }
    public Iterator<E> iterator() {
        return innerSet.iterator();
    }
    public Object[] toArray() {
        return innerSet.toArray();
    }
    public <T> T[] toArray(T[] a) {
        return innerSet.toArray(a);
    }
    public boolean add(E e) {
        e.addObserver(this);
        return innerSet.add(e);
    }
    public boolean remove(Object o) {
        if(o instanceof Observable){
            ((Observable) o).deleteObserver(this);
        }
        return innerSet.remove(o);
    }
    public boolean containsAll(Collection<?> c) {
        return innerSet.containsAll(c);
    }
    public boolean addAll(Collection<? extends E> c) {
        boolean result = false;
        for(E el: c){
            result = add(el) || result; //Call add first so that || cannot short-circuit it away
        }
        return result;
    }
    public boolean retainAll(Collection<?> c) {
        Iterator<E> it = innerSet.iterator();
        E el;
        Collection<E> elementsToRemove = new ArrayList<E>();
        while(it.hasNext()){
            el = it.next();
            if(!c.contains(el)){
                elementsToRemove.add(el); //No changing the set while the iterator is going. Iterator.remove may not do what we want.
            }
        }
        for(E e: elementsToRemove){
            remove(e);
        }
        return !elementsToRemove.isEmpty(); //If it's empty there is no change and we should return false
    }
    public boolean removeAll(Collection<?> c) {
        boolean result = false;
        for(Object e: c){
            result = remove(e) || result; //Call remove first so that || cannot short-circuit it away
        }
        return result;
    }
    public void clear() {
        Iterator<E> it = innerSet.iterator();
        E el;
        while(it.hasNext()){
            el = it.next();
            el.deleteObserver(this);
        }
        innerSet.clear();
    }
}

This incurs a performance hit every time the mutable objects change.

2. Check for changes when Set is used

  • Works with any existing object you want to put into your set
  • Need to scan the entire set every time you require info about the set (performance cost may get significant if your set gets very large).

If the objects in your set change often, but the set itself is used rarely, you could try Joe's solution below. He suggests to check whether the Set is still correct whenever you call a method on it. As a bonus, his method will work on any set of objects (no having to limit it to observables). Performance-wise his method would be problematic for large sets or often used sets (as the entire set needs to be checked at every method call).

Possible implementation of Joe's method:

package collectiontests.check;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

public class ListBasedSet<E> {

    private List<E> innerList;

    public ListBasedSet(){
        this(null);
    }

    public ListBasedSet(Collection<E> elements){
        if (elements != null){
            innerList = new ArrayList<E>(elements);
        } else {
            innerList = new ArrayList<E>();
        }
    }

    public void add(E e){
        innerList.add(e);
    }

    public int size(){
        return toSet().size();
    }

    public Iterator<E> iterator(){
        return toSet().iterator();
    }

    public void remove(E e){
        while(innerList.remove(e)); //Keep removing until they are all gone (so set behavior is kept)
    }

    public boolean contains(E e){
        //I think you could just do innerList.contains here as it shouldn't care about duplicates
        return innerList.contains(e);
    }

    private Set<E> toSet(){
        return new HashSet<E>(innerList);
    }
}

And another implementation of the check always method (this one based on an existing set). This is the way to go if you want to reuse the existing sets as much as possible.

package collectiontests.check;

import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.SortedSet;
import java.util.TreeSet;

public class ChangeDetectingSet<E> extends TreeSet<E> {

    private boolean compacting = false;

    @SuppressWarnings("unchecked")
    private void compact(){
        //To avoid infinite loops, make sure we are not already compacting (compact also gets called in the methods used here)
        if(!compacting){ //Warning: this is not thread-safe
            compacting = true;
            Object[] elements = toArray();
            clear();
            for(Object element: elements){
                add((E)element); //Yes unsafe cast, but we're rather sure
            }
            compacting = false;
        }
    }
    @Override
    public boolean add(E e) {
        compact();
        return super.add(e);
    }
    @Override
    public Iterator<E> iterator() {
        compact();
        return super.iterator();
    }
    @Override
    public Iterator<E> descendingIterator() {
        compact();
        return super.descendingIterator();
    }
    @Override
    public NavigableSet<E> descendingSet() {
        compact();
        return super.descendingSet();
    }
    @Override
    public int size() {
        compact();
        return super.size();
    }
    @Override
    public boolean isEmpty() {
        compact();
        return super.isEmpty();
    }
    @Override
    public boolean contains(Object o) {
        compact();
        return super.contains(o);
    }
    @Override
    public boolean remove(Object o) {
        compact();
        return super.remove(o);
    }
    @Override
    public void clear() {
        compact();
        super.clear();
    }
    @Override
    public boolean addAll(Collection<? extends E> c) {
        compact();
        return super.addAll(c);
    }
    @Override
    public NavigableSet<E> subSet(E fromElement, boolean fromInclusive, E toElement, boolean toInclusive) {
        compact();
        return super.subSet(fromElement, fromInclusive, toElement, toInclusive);
    }
    @Override
    public NavigableSet<E> headSet(E toElement, boolean inclusive) {
        compact();
        return super.headSet(toElement, inclusive);
    }
    @Override
    public NavigableSet<E> tailSet(E fromElement, boolean inclusive) {
        compact();
        return super.tailSet(fromElement, inclusive);
    }
    @Override
    public SortedSet<E> subSet(E fromElement, E toElement) {
        compact();
        return super.subSet(fromElement, toElement);
    }
    @Override
    public SortedSet<E> headSet(E toElement) {
        compact();
        return super.headSet(toElement);
    }
    @Override
    public SortedSet<E> tailSet(E fromElement) {
        compact();
        return super.tailSet(fromElement);
    }
    @Override
    public Comparator<? super E> comparator() {
        compact();
        return super.comparator();
    }
    @Override
    public E first() {
        compact();
        return super.first();
    }
    @Override
    public E last() {
        compact();
        return super.last();
    }
    @Override
    public E lower(E e) {
        compact();
        return super.lower(e);
    }
    @Override
    public E floor(E e) {
        compact();
        return super.floor(e);
    }
    @Override
    public E ceiling(E e) {
        compact();
        return super.ceiling(e);
    }
    @Override
    public E higher(E e) {
        compact();
        return super.higher(e);
    }
    @Override
    public E pollFirst() {
        compact();
        return super.pollFirst();
    }
    @Override
    public E pollLast() {
        compact();
        return super.pollLast();
    }
    @Override
    public boolean removeAll(Collection<?> c) {
        compact();
        return super.removeAll(c);
    }
    @Override
    public Object[] toArray() {
        compact();
        return super.toArray();
    }
    @Override
    public <T> T[] toArray(T[] a) {
        compact();
        return super.toArray(a);
    }
    @Override
    public boolean containsAll(Collection<?> c) {
        compact();
        return super.containsAll(c);
    }
    @Override
    public boolean retainAll(Collection<?> c) {
        compact();
        return super.retainAll(c);
    }
    @Override
    public String toString() {
        compact();
        return super.toString();
    }
}

3. Use Scala sets

You could cheat and do away with mutable objects in your set: instead of mutating an object, you'd create a new one with one property changed. You can look at the immutable collections in Scala (Scala compiles to JVM bytecode, so its classes can be called from Java, although the interop is not always convenient): http://www.scala-lang.org/api/current/scala/collection/immutable/IndexedSeq.html
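
The same idea can be sketched in plain Java without Scala. This is only an illustration under assumptions: `ImmutableFoo`, `withFoo` and `ImmutableElements.replace` are hypothetical names, not part of any library.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical immutable counterpart of the question's Foo
final class ImmutableFoo {
    private final int foo;

    ImmutableFoo(int foo) { this.foo = foo; }

    int getFoo() { return foo; }

    // "Mutation" returns a new object and leaves this one untouched
    ImmutableFoo withFoo(int newFoo) { return new ImmutableFoo(newFoo); }

    @Override
    public boolean equals(Object o) {
        return o instanceof ImmutableFoo && ((ImmutableFoo) o).foo == foo;
    }

    @Override
    public int hashCode() { return Integer.hashCode(foo); }
}

class ImmutableElements {
    // Swaps 'old' for a changed copy. Returns true only if 'old' was present
    // and the changed copy was added; if an equal element already exists,
    // add() rejects the copy and the set simply shrinks by one.
    static boolean replace(Set<ImmutableFoo> set, ImmutableFoo old, int newFoo) {
        if (!set.remove(old)) {
            return false; // 'old' was not in the set
        }
        return set.add(old.withFoo(newFoo));
    }
}
```

Because an element can never change while it is inside the set, the set's own `add` is the only place where uniqueness has to be checked.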

HSquirrel
3

You will not find a general datastructure that can take just any object for this purpose. That kind of set would have to constantly monitor its elements, which among other things would lead to a lot of questions on concurrency.

However, I can imagine something based on the practically unknown class java.util.Observable. You could e.g. write a class ChangeAwareSet<E extends Observable> implements Set<E>, Observer. When an element is added to this Set, the Set would register as an Observer of that element and so be notified of all changes to it. (But don't expect this to be very efficient, and you might encounter concurrency problems in this scenario as well.)

Ray
3

You can get the behaviour you're after by using another collection, such as an ArrayList. The contains and remove methods for a List make no assumptions about objects remaining unchanged.

Since changes can happen at any time, there isn't much room for optimisation. Any operations would need to perform a full scan over all contents, as any object could have changed since the last operation.

You may or may not wish to override add to check whether the object currently appears to be present. Then, when using or printing, use new HashSet(list) to eliminate objects which are currently duplicate.

Joe
  • +1, if we can combine your `new HashSet(oldSet)` with observable pattern where in every time a setter is called and if there's a valid state change, then we can achieve what the OP is looking for I guess – Arkantos Feb 20 '15 at 14:29
  • 1
    I like how this puts no constraints on what objects you put into the set. Do you see any way to limit the methods that need to do the full check? I've thought about limiting it only to methods that request information (hence not at insert time), but I feel it might pose a problem: users might expect an error at insert time if the object is already in the set. – HSquirrel Feb 20 '15 at 14:41
  • Of course any approach is affected by the frequency of those mutations – Arkantos Feb 20 '15 at 14:42
3

Your problem is the identity of an object vs. its state. Identity is not mutable over time; state is. In your set, you should preferably rely on identity, because this is the only guarantee against introducing duplicates by mutation. Otherwise you must rebuild the Set each time an element is mutated. Technically, equals() and hashCode() should be constant over time to reflect identity.

As @assylias commented, there is certainly an alternative if you need to have a collection with combined identity and state.

  • have a Map<TheObject, List<State>> rather than a Set<TheObjectWithState>
  • remove the object from the Set before mutation, then check if it exists after mutation, add it if there is no duplicate.
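
The second option can be sketched as a small helper. This is a minimal sketch, not a thread-safe implementation; `SafeMutation.mutate` and the `Item` class are hypothetical names introduced only for this example.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical mutable element whose equals/hashCode depend on its state
class Item {
    int value;

    Item(int value) { this.value = value; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Item && ((Item) o).value == value;
    }

    @Override
    public int hashCode() { return Integer.hashCode(value); }
}

class SafeMutation {
    // Removes the element while its hash is still valid, applies the change,
    // then tries to add it back. Returns true if the element is in the set
    // afterwards, false if it was absent or has become a duplicate.
    static <E> boolean mutate(Set<E> set, E element, Consumer<? super E> change) {
        boolean wasPresent = set.remove(element);
        change.accept(element);
        return wasPresent && set.add(element);
    }
}
```

A call such as `SafeMutation.mutate(set, item, it -> it.value = 1)` returns `false` when another element with value 1 is already present, and in that case `item` is no longer a member of the set.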
T.Gounelle
3

You have two broad strategies here. I expect neither will give great performance (but that might not be a problem for your use).

  1. Have your set register with the objects for changes.
  2. Instead of modifying the set constantly, only update it when it is used.

Note that these solutions will have a slight difference in behavior.

Register for changes

This involves adding an Observable pattern (or alternatively a listener) to all objects stored in the set.

When an object is in the Set, the Set will register for changes. When an object changes it will signal the Set it has changed and the Set will change accordingly.

The most naive implementation is to just remove all equal objects and then re-add the object at any change. The naive implementation is always a good start, so you can write a proper test set, and from there on you can improve the performance step by step.

Thread safety

Be careful when using this Set or the objects in it from multiple threads. With a solution like this there is a high risk of deadlocks, so you would probably end up with a single ReadWriteLock for both the Set and the objects stored in it.

Update it when it is used

An alternative is a lazy strategy: only update the set when it is used. This is very useful when there are many changes to the objects but the set is not used as often.

It uses the following set idea (this makes me think of Schrodinger's cat):

If nobody is looking at the Set, does it matter what is in it?

An object is only defined by its behavior on its interface(s). So instead you can evaluate your set (and update it accordingly) at the point when the information is used.

General remarks

Here follow some remarks that apply to both choices.

Desired behavior

Watch out: you might run into really weird behavior with a set like this. When you remove an object from the Set because it has become equal to another object, the outside world will not know you have removed that object.

See for instance the following, using your Foo class:

Foo foo1 = new Foo(1);
Foo foo2 = new Foo(2);
Set<Foo> set = new MySet<Foo>();
set.add(foo1);
set.add(foo2);

foo2.foo = 1; // foo1 or foo2 is removed from the set.
foo2.foo = 3; // now the set contains a single Foo, with value either 1 or 3.

As an alternative you could take the objects stored in a list and convert them to set at the time you use.

Thirler
  • I think the observable pattern is the only feasible efficient solution. All other solutions seem to require that the entire set be traversed prior to any method call. I like the lazy approach, but only if combined with the observable pattern. In other words, when an object changes, the set keeps track of those elements that changed and prior to any operation it "resets" those elements. – dan b Mar 05 '15 at 20:55
  • @danb The OP did not mention performance as a requirement, so we can't tell. When creating such a class for a general purpose library you will need to care about performance, but very often the amount of elements in a set/list is small. Also a possibility is that the number of changes is huge compared to the number of reads. – Thirler Mar 05 '15 at 21:26
  • For small sets, it's not a difficult problem, simply use an arraylist. For the contains method traverse the list, removing duplicates as you go along. Similarly for other methods. It's really for larger sets that this gets interesting. I think the observer pattern you describe works well but the user of the set implementation must do things correctly. See my solution below for a method that is "safer" from the perspective of the user of the set. – dan b Mar 05 '15 at 23:50
1

This is a great question! Perhaps it is the source of many bugs! This is not just an issue with duplicates. Almost all the methods will return incorrect answers even without duplicates. Consider a hash set. If the hash changes even without creating a duplicate, the contains method will now return incorrect results since the object is in the wrong hash bucket. Similarly remove will not work correctly. For sorted sets, the iterator order will be incorrect.
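
A small sketch of the hash-bucket problem described above, with no duplicates involved. `Key` and `StaleHashDemo` are hypothetical names; the only assumption is a `hashCode` that depends on the mutable field.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical element whose hash depends on mutable state
class Key {
    int id;

    Key(int id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        return o instanceof Key && ((Key) o).id == id;
    }

    @Override
    public int hashCode() { return Integer.hashCode(id); }
}

class StaleHashDemo {
    // Returns { contains(k), remove(k), size() == 1 } after mutating k in place
    static boolean[] lookupAfterMutation() {
        Set<Key> set = new HashSet<Key>();
        Key k = new Key(1);
        set.add(k);
        k.id = 2; // the set still files k under the hash it had at insertion
        boolean found = set.contains(k);  // false: the lookup uses the new hash
        boolean removed = set.remove(k);  // false for the same reason
        boolean stillStored = set.size() == 1; // true: k is still in there
        return new boolean[] { found, removed, stillStored };
    }
}
```

The element is still stored and still shows up during iteration, yet it can be neither found nor removed through the normal lookup methods.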

I like the Observable pattern mentioned by @Thirler. Other solutions seem inefficient. The observable approach mentioned there depends on the implementer of the elements correctly notifying the set whenever an update occurs. The approach I mention here is somewhat more restrictive, but passes responsibility for the correct implementation to the set creator. So as long as the set is implemented correctly, it will work for all users of the set. (See below for more on why the observer pattern is hard to implement.)

Here is the basic idea: Suppose that you want to create a set of Foo objects. We'll create a class called SetFoo. All aspects of Foo objects are maintained by the set itself, including construction and any changes to them. There is no way for any other user to create a Foo object directly, because Foo is an inner class of SetFoo and its constructor is either private or protected. For example, let's suppose we implement a class SetFoo where Foo has methods void setX(int x) and int getX(). The class SetFoo would have methods like:

Foo instance(int x)  //Returns the instance of foo if it exists, otherwise creates a new one and returns it.

Let's say that internally SetFoo maintains a hashset of Foo objects.

Now the setX method of Foo would be defined to remove and re-add the element to the hashset if the value of x changes.

We can extend the idea of SetFoo to contain any number of elements, all of which are maintained by the set. This is really easy to implement for any kind of objects, however, it does require that the elements are all maintained by the set (including construction and all setter methods). Of course to make it multi-thread safe would take more work.
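
A minimal, single-threaded sketch of this design follows. It deviates from the description in one labeled way: instead of a hash set of Foo objects it keeps a `HashMap` keyed by `x`, which makes `instance(int)` a direct lookup. The methods `exists()`, `contains(int)` and `size()` are illustrative choices, not a fixed API.

```java
import java.util.HashMap;
import java.util.Map;

class SetFoo {
    // Assumption: a map from x to its Foo stands in for the hash set of Foo objects
    private final Map<Integer, Foo> byX = new HashMap<>();

    // Inner class with a private constructor: only SetFoo can create a Foo
    final class Foo {
        private int x;

        private Foo(int x) { this.x = x; }

        int getX() { return x; }

        // Removes this Foo under its old value and re-adds it under the new one.
        // If another Foo already holds the new value, this Foo drops out of the set.
        void setX(int newX) {
            if (newX == x) {
                return;
            }
            boolean wasMember = exists();
            if (wasMember) {
                byX.remove(x);
            }
            x = newX;
            if (wasMember) {
                byX.putIfAbsent(newX, this);
            }
        }

        // Is this Foo still the element the set holds for its current value?
        boolean exists() { return byX.get(x) == this; }
    }

    // Returns the Foo with the given x, creating it if it does not exist yet
    Foo instance(int x) {
        return byX.computeIfAbsent(x, key -> new Foo(key));
    }

    boolean contains(int x) { return byX.containsKey(x); }

    int size() { return byX.size(); }
}
```

Since every change goes through `setX`, the set never holds an element under a stale hash, and callers cannot forget to notify it.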

From the point of view of any user of the SetFoo class things would be simple:

 Foo f = setFoo.instance(1);
 ...
 f.setX(2);
 ...
 f.setX(3);

 f = setFoo.instance(1);  // Internally creates a new one, since the original was changed.
 f = setFoo.instance(3);  // Already in the set, so no new one is created.

Now we can also add other methods to SetFoo, like

boolean contains (int x);
Iterator<Integer> iterator();
boolean remove(int x);
etc...

or we can add various methods to Foo:

remove()  // removes foo from the set.
exists()  // is foo still in the set?
add() // add foo back to the set

In the case where the elements can contain many fields we can have a FooSpec class. Suppose Foo contains an int x and an int y. Then FooSpec would have getX, setX, getY and setY methods and could be constructed using new FooSpec. Now SetFoo would have methods like:

 Foo instance(FooSpec fooSpec)
 Collection<Foo> instanceAll(Collection<FooSpec> col)
 ...etc

So now you might be wondering why the observer pattern approach is subject to potential errors. With that approach the user of the set must correctly notify the set whenever an element changes. That is effectively the same level of difficulty as implementing a deeply immutable object (which may not be that easy). For example, if the elements of the set are themselves collections or collections of collections, then you would need to make sure that you notify the set whenever anything in the collection (deeply) changes.

Leaving the responsibility to "deeply" notify the set, to the user of the set, would place a lot of burden on the developer. Better to implement a framework that would provide for objects that "deeply" notify.

dan b
  • I'm confused as to how this differs fundamentally from using an observer. In both cases you need to control the objects that go into the set. Whenever one of the objects in the set changes, you do an action on your set (remove/add). In the observer approach, you call update; in your approach the object calls remove/add itself. There is a minor difference in how the set interacts with the objects: in the observer approach you need to register/delete as an observer at add/remove; in your approach the equivalent is baked into your inner class and your instance method. – HSquirrel Mar 06 '15 at 10:56
  • The difference is in shifting responsibility to the set implementer rather than the developer using the set, resulting in a safer implementation. In the approach I describe here, the developer cannot mess up by constructing their own objects or by forgetting to notify the set when an object changes internally. Notifying the set in all cases may not be that easy. For example, suppose your set is a set of collections of foo. Then the developer must remember to notify the set when the collection or anything in it changes. – dan b Mar 06 '15 at 14:34
  • @HSquirrel I added some more comments in the solution above about the pitfalls of the observer approach – dan b Mar 06 '15 at 14:48
1

I am still not sure you understand the implications. If you have 2 objects which CAN equal each other at one point in time, they may not equal each other at another point in time. Therefore by default they are deemed separate objects, even though at the moment they may appear to be identical.

I would go about it at a different angle, and check if the set contains what the object will become when you perform the change, if you do not want it to exist in that set when it does equal another object.

1

Use safe publishing: Don't allow access to the Set or its elements; publish a deep copy instead.

You need a way of making a copy of a Foo; I'll assume a copy constructor.

private Set<Foo> set;

public Set<Foo> getFoos() {
    // using java 8
    return set.stream().map(Foo::new).collect(Collectors.toSet());
}

You should also save a copy of a Foo, rather than saving a foo, because the caller will have a reference to the added Foos, so the client can mutate them. Add an accessor method for this:

public boolean addFoo(Foo foo) {
    return set.add(new Foo(foo));
}
Bohemian
0

Set does make use of the hashCode and equals methods. But when you say

it becomes equal to another object in the set, the set will keep both equal objects without complaining.

That's not the case. If you call the add method with an element that already exists, it will return false, telling you that the object is already in the set.

Set is a mathematical term which does not allow duplicates and is the case with Java Set. Set is agnostic about whether the object that you are inserting into it is mutable or immutable. It's just like a collection which hold values.

Edit: As per the code, the checks in the set are done when you insert the element into the Set; if the element changes afterwards, the set won't care about it.

SMA
  • I will post an example so that you see what I mean. – Jadiel de Armas Feb 20 '15 at 13:23
  • @SMA, As I understand, there is no additional `add`, but a mutation of an object already in the `Set`, which then can be `equals` to another one. – T.Gounelle Feb 20 '15 at 13:25
  • @SMA I think he is talking about after the add. The Object is already in set. But it is mutable. So it is changed and after the change is identical to another object in set. If you removed and inserted, it would give you dupe-error, but not by mutation of the object in the set. – Fildor Feb 20 '15 at 13:26
0

Here are a few aspects for one approach I see

Use a 'Dynamic Element Set'

It might be good to have a clear distinction between having a mutable set class for immutable elements, as well as another set class for mutable elements

The set class for mutable elements would be 'dynamic element sets', and require each element to have a pointer to the containing set

Element itself registers change on modification

You would need a corresponding wrapper class for the elements contained in the set, so that each element can register its changes with the containing set

Hash table for fast single-threaded uniqueness checks

When adding an element to the set, the set will compute a hash of the element and add that to a table (I'm sure that's how sets work anyhow)

Use this to check uniqueness and do elimination in O(1) time

Dirty / clean state for multithreaded cases

When you update an element, mark the containing set as 'dirty'

When the containing set is dirty, you can at some point rerun the uniqueness test to see if all elements are unique.

While that is happening, it probably should block any modifications to the elements until it has completed

With this, you probably deviate from exact uniqueness property.

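Consider this: you have 3 elements in the set, A, B, and C, each with unique values

  1. You change element B to the same value as A. The set is marked as dirty.
  2. You change element A to a different, unique value. The set is still marked as dirty.
  3. You run the uniqueness check. It finds no duplicates, so the transient duplicate goes unnoticed.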

So if you don't need absolute set property, but only an approximate, this might work

Otherwise, if you need the absolute set property, this might not work in a multithreaded case

Updates seem to be pretty cheap, so you might be able to get away with it
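
A rough single-threaded sketch of the dirty/clean idea, under assumptions: `DirtySet` and its inner wrapper `Element` are hypothetical names, elements wrap a plain `int`, and the blocking of modifications during the uniqueness check is left out.

```java
import java.util.HashSet;
import java.util.Set;

class DirtySet {
    private Set<Element> elements = new HashSet<>();
    private boolean dirty = false;

    // Wrapper element; as an inner class it holds a pointer to its containing set
    final class Element {
        private int value;

        private Element(int value) { this.value = value; }

        int get() { return value; }

        // Cheap update: change the value and only flag the set as dirty
        void set(int newValue) {
            value = newValue;
            dirty = true;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Element && ((Element) o).value == value;
        }

        @Override
        public int hashCode() { return Integer.hashCode(value); }
    }

    // Wraps the value and adds it; the wrapper is returned even if an equal
    // element already exists, in which case it is not a member of the set
    Element add(int value) {
        compactIfDirty();
        Element e = new Element(value);
        elements.add(e);
        return e;
    }

    int size() {
        compactIfDirty();
        return elements.size();
    }

    // The deferred uniqueness check: rehash everything, equal elements collapse
    private void compactIfDirty() {
        if (dirty) {
            elements = new HashSet<>(elements);
            dirty = false;
        }
    }
}
```

With elements 1, 2 and 3, setting the second to 1 and then the first to 5 before calling `size()` still yields 3: the duplicate existed only between two checks, which is exactly the approximate behavior described above.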

Is this really a 'set'?

So, this kinda assumes that the elements are only modified from the provided interface for the set

When you wrap the base class of the element into the set, it should probably make a deep copy of the element to help prevent an element getting modifications from a non-registering reference object

So it's not just a 'set', but rather imposes a requirement on the type of element being passed

It adds an interface layer to the element class

As such, the elements themselves are part of a new object in a sense I guess

Other thoughts

So of course, if one time an element can become the same as another element, then in the future it could also change to being different again

You are implying that a solution being searched for would be needed in a specific problem where that kind property is needed: Elements that are transiently duplicate need to be eliminated

wamster