1

I'm looking for a Java collection, preferably in the standard library, that can hold items of the following structure:

class Item {
    String key;
    double score;
}

And with the following properties:

  • Only one Item with the same key is allowed (like a set)
  • insert, remove, and existence check in at most O(log n)
  • traversal ordered by score, with finding the next item in at most O(log n)

As far as I understand, a standard SortedSet expects its ordering (Comparable or Comparator) to be consistent with equals(), but that is not my case, as two items with different keys may have the same score.

In fact, I've noticed that TreeSet treats a comparison result of 0 as meaning the item is already present.
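
To illustrate the behaviour described above, here is a minimal sketch. The class name `ScoreOnlyDemo` and its nested `Item` are made up for this example; it only assumes a `TreeSet` ordered by score alone.

```java
import java.util.Comparator;
import java.util.TreeSet;

class ScoreOnlyDemo {
    // Hypothetical holder mirroring the Item from the question.
    static final class Item {
        final String key;
        final double score;
        Item(String key, double score) { this.key = key; this.score = score; }
    }

    // Inserts two items with different keys but equal scores and reports the set size.
    static int sizeAfterInsertingTies() {
        TreeSet<Item> set = new TreeSet<>(Comparator.comparingDouble((Item i) -> i.score));
        set.add(new Item("a", 1.0));
        set.add(new Item("b", 1.0)); // compares as 0 with "a", so TreeSet treats it as a duplicate
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterInsertingTies()); // prints 1
    }
}
```

The second item is silently dropped, because `TreeSet` decides membership with the comparator and never calls `equals()`.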

Any suggestions?

Jack
  • 2
    If you want to maintain ordered traversal, then insert will be _O(log n)_, not _O(1)_. Your requirements are not feasible, unless you're ok with "find first" being _O(n log n)_. – Andreas Dec 25 '20 at 12:37
  • 3
    Since you won’t be able to find anything satisfying all your criteria, in which order are you able to give them up? – Konrad Rudolph Dec 25 '20 at 12:44
  • Thanks for the suggestion, I've edited the requirements. Actually, the only real requirement is that it must run fast with 100 million entries – Jack Dec 25 '20 at 12:46
  • Which operation(s) must run fast? – Reto Höhener Dec 25 '20 at 12:48
  • Probably insertion, but I was looking for something meeting all the requirements. If that is not feasible, I'd use a HashSet and sort each time before traversing – Jack Dec 25 '20 at 12:50

5 Answers

1

I don't think such a structure exists. You didn't specify traversal performance requirements, so you could use a normal Set and add the values to a list and sort that list by score for traversal.
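
A rough sketch of that idea, assuming `Item` defines `equals()` and `hashCode()` on the key only (the class name `SortOnTraversal` and the helper `sortedByScore` are invented for illustration):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SortOnTraversal {
    // Hypothetical Item whose identity is its key only.
    static final class Item {
        final String key;
        final double score;
        Item(String key, double score) { this.key = key; this.score = score; }
        @Override public boolean equals(Object o) {
            return o instanceof Item && key.equals(((Item) o).key);
        }
        @Override public int hashCode() { return key.hashCode(); }
    }

    // Copies the set into a list and sorts it by score, breaking ties by key.
    // Costs O(n log n) on every traversal.
    static List<Item> sortedByScore(Set<Item> items) {
        List<Item> list = new ArrayList<>(items);
        list.sort(Comparator.comparingDouble((Item i) -> i.score).thenComparing(i -> i.key));
        return list;
    }

    public static void main(String[] args) {
        Set<Item> items = new HashSet<>();
        items.add(new Item("a", 2.0));
        items.add(new Item("b", 1.0));
        items.add(new Item("a", 5.0)); // rejected: key "a" is already present
        for (Item i : sortedByScore(items)) {
            System.out.println(i.key + " " + i.score);
        }
    }
}
```

Insert, remove and lookup stay at expected O(1), but the sort has to be repeated whenever the set has changed since the last traversal.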

Reto Höhener
1

A HashSet does not guarantee any order of its elements. If you need that guarantee, consider using a TreeSet to hold your elements. To keep items unique by key with constant-time operations, override hashCode() and equals() so that they use only the key, as below:

import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Item {
    String key;
    double score;

    public Item(String key, double score) {
        this.key = key;
        this.score = score;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Item item = (Item) o;
        return key.equals(item.key);
    }

    @Override
    public int hashCode() {
        return Objects.hash(key);
    }

    @Override
    public String toString() {
        return "Item{" +
                "key='" + key + '\'' +
                ", score=" + score +
                '}';
    }
}

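// main
public static void main(String[] args) {

        Set<Item> itemSet = new HashSet<>();

        itemSet.add(new Item("1", 1));
        itemSet.add(new Item("1", 2)); // rejected: an Item with key "1" is already present
        itemSet.add(new Item("2", 1));

        // print the elements (HashSet iteration order is not guaranteed)
        itemSet.forEach(System.out::println);

        // To get a sorted TreeSet, copy the items into one with an explicit comparator.
        // Item is not Comparable, so a plain new TreeSet() would throw ClassCastException.
        //TreeSet<Item> myTreeSet = new TreeSet<>(
        //        Comparator.comparingDouble((Item i) -> i.score).thenComparing(i -> i.key));
        //myTreeSet.addAll(itemSet);
        //System.out.println(myTreeSet);
}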

output :

Item{key='1', score=1.0}
Item{key='2', score=1.0}
1

Thanks to everyone whose comments and answers made me think. I believe we can meet the requirements by using:

TreeMap<Double, HashSet<Item>>

This works because (and I hadn't mentioned this) items with equal keys always have the same score. More generally, it is enough to maintain two structures: an ordered one that uses the ordering field as its key, and an unordered one that uses the unique field as its key.
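
A minimal sketch of that two-structure idea, storing plain keys instead of `Item` objects to keep it short. The class `ScoreIndex` and its method names are invented here and are not an existing library type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ScoreIndex {
    // Unordered structure: unique field (key) -> score.
    private final Map<String, Double> scoreByKey = new HashMap<>();
    // Ordered structure: ordering field (score) -> all keys with that score.
    private final TreeMap<Double, Set<String>> keysByScore = new TreeMap<>();

    // Expected O(1) for the hash map plus O(log n) for the tree map.
    boolean add(String key, double score) {
        if (scoreByKey.putIfAbsent(key, score) != null) {
            return false; // key already present
        }
        keysByScore.computeIfAbsent(score, s -> new HashSet<>()).add(key);
        return true;
    }

    boolean remove(String key) {
        Double score = scoreByKey.remove(key);
        if (score == null) {
            return false; // key not present
        }
        Set<String> bucket = keysByScore.get(score);
        bucket.remove(key);
        if (bucket.isEmpty()) {
            keysByScore.remove(score); // drop empty buckets so traversal skips them
        }
        return true;
    }

    boolean contains(String key) {
        return scoreByKey.containsKey(key);
    }

    // Keys in ascending score order; order among equal scores is unspecified.
    List<String> keysInScoreOrder() {
        List<String> out = new ArrayList<>();
        for (Set<String> bucket : keysByScore.values()) {
            out.addAll(bucket);
        }
        return out;
    }
}
```

Both structures must be updated together on every mutation, otherwise they drift apart.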

Jack
1

Now that insert has been relaxed to O(log n), you can do this with a double-set, i.e. implement your own set that maintains 2 sets behind the scenes.

Ideally, you can modify class Item so that equals() and hashCode() use only the key field. In that case your class would use a HashSet and a TreeSet. If equals() and hashCode() cover more than just the key field, then use two TreeSet objects.

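import java.util.Collection;
import java.util.Comparator;
import java.util.HashSet;
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Assumes Item exposes getKey() and getScore(), and that equals()/hashCode() use only the key.
final class ItemSet implements NavigableSet<Item> {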

    private final Set<Item> keySet = new HashSet<>();
    //                           or: new TreeSet<>(Comparator.comparing(Item::getKey));
    private final TreeSet<Item> navSet = new TreeSet<>(Comparator.comparingDouble(Item::getScore)
                                                                 .thenComparing(Item::getKey));

    //
    // Methods delegating to keySet for unique key access and for unordered access
    //

    @Override public boolean contains(Object o) { return this.keySet.contains(o); }
    @Override public boolean containsAll(Collection<?> c) { return this.keySet.containsAll(c); }
    @Override public int size() { return this.keySet.size(); }
    @Override public boolean isEmpty() { return this.keySet.isEmpty(); }

    //
    // Methods delegating to navSet for ordered access
    //

    @Override public Comparator<? super Item> comparator() { return this.navSet.comparator(); }
    @Override public Object[] toArray() { return this.navSet.toArray(); }
    @Override public <T> T[] toArray(T[] a) { return this.navSet.toArray(a); }
    @Override public Item first() { return this.navSet.first(); }
    @Override public Item last() { return this.navSet.last(); }
    @Override public Item lower(Item e) { return this.navSet.lower(e); }
    @Override public Item floor(Item e) { return this.navSet.floor(e); }
    @Override public Item ceiling(Item e) { return this.navSet.ceiling(e); }
    @Override public Item higher(Item e) { return this.navSet.higher(e); }

    //
    // Methods delegating to both keySet and navSet for mutation of this set
    //

    private final class ItemSetIterator implements Iterator<Item> {
        private final Iterator<Item> iterator = ItemSet.this.navSet.iterator();
        private Item keyToRemove;
        @Override
        public boolean hasNext() {
            return iterator.hasNext();
        }
        @Override
        public Item next() {
            keyToRemove = iterator.next();
            return keyToRemove;
        }
        @Override
        public void remove() {
            iterator.remove();
            ItemSet.this.keySet.remove(keyToRemove);
            keyToRemove = null;
        }
    }

    @Override
    public Iterator<Item> iterator() {
        return new ItemSetIterator();
    }
    @Override
    public void clear() {
        this.keySet.clear();
        this.navSet.clear();
    }
    @Override
    public boolean add(Item e) {
        if (! this.keySet.add(e))
            return false; // item already in set
        if (! this.navSet.add(e))
            throw new IllegalStateException("Internal state is corrupt");
        return true;
    }
    @Override
    public boolean remove(Object o) {
        if (! this.keySet.remove(o))
            return false; // item not in set
        if (! this.navSet.remove(o))
            throw new IllegalStateException("Internal state is corrupt");
        return true;
    }
    @Override
    public boolean addAll(Collection<? extends Item> c) {
        boolean changed = false;
        for (Item item : c)
            if (add(item))
                changed = true;
        return changed;
    }
    @Override
    public boolean removeAll(Collection<?> c) {
        boolean changed = false;
        for (Object o : c)
            if (remove(o))
                changed = true;
        return changed;
    }
    @Override
    public boolean retainAll(Collection<?> c) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public Item pollFirst() {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public Item pollLast() {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public NavigableSet<Item> descendingSet() {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public Iterator<Item> descendingIterator() {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public SortedSet<Item> headSet(Item toElement) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public NavigableSet<Item> headSet(Item toElement, boolean inclusive) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public SortedSet<Item> tailSet(Item fromElement) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public NavigableSet<Item> tailSet(Item fromElement, boolean inclusive) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public SortedSet<Item> subSet(Item fromElement, Item toElement) {
        throw new UnsupportedOperationException("Not yet implemented");
    }
    @Override
    public NavigableSet<Item> subSet(Item fromElement, boolean fromInclusive, Item toElement, boolean toInclusive) {
        throw new UnsupportedOperationException("Not yet implemented");
    }

}
Andreas
  • I came up with a simpler solution. It's enough to use one simple `TreeSet` with Item implementing Comparable in this way: `return this.key.equals(o.key) ? 0 : this.score == o.score ? 1 : Double.compare(this.score, o.score);` What do you think? – Jack Dec 26 '20 at 20:33
  • @Jack I think (well, I know) that it violates the contract of `compareTo()`. – Andreas Dec 26 '20 at 21:18
  • Tell me more, please. If we define equals() and hashCode() so that they only take the key field into account (ignoring score), does such a compareTo() violate its contract? – Jack Dec 26 '20 at 22:37
  • Anyway, I think you're right: the set randomly fails the contains() check for an Item with an existing key. I believe the descent through the tree gets confused by the score comparison – Jack Dec 26 '20 at 23:14
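
To make the contract violation from the comments concrete, here is a small sketch. `BrokenItem` and `ContractCheck` are hypothetical names; the `compareTo()` body is the one proposed in the comment above. The `Comparable` contract requires `sgn(x.compareTo(y)) == -sgn(y.compareTo(x))`, and also that `x.compareTo(y) == 0` implies `x` and `y` compare the same way against any third element.

```java
class BrokenItem implements Comparable<BrokenItem> {
    final String key;
    final double score;

    BrokenItem(String key, double score) {
        this.key = key;
        this.score = score;
    }

    // The comparison proposed in the comment: equal keys compare as 0,
    // equal scores with different keys always compare as 1.
    @Override
    public int compareTo(BrokenItem o) {
        return this.key.equals(o.key) ? 0
             : this.score == o.score ? 1
             : Double.compare(this.score, o.score);
    }
}

class ContractCheck {
    // True when the pair satisfies sgn(x.compareTo(y)) == -sgn(y.compareTo(x)).
    static boolean isAntisymmetric(BrokenItem x, BrokenItem y) {
        return Integer.signum(x.compareTo(y)) == -Integer.signum(y.compareTo(x));
    }

    public static void main(String[] args) {
        BrokenItem a = new BrokenItem("a", 1.0);
        BrokenItem b = new BrokenItem("b", 1.0);
        // Both directions return 1, so each item claims to be greater than the other.
        System.out.println(isAntisymmetric(a, b)); // prints false
    }
}
```

The second rule breaks too: two items with key "k" and scores 1.0 and 5.0 compare as 0, yet they fall on opposite sides of an item with score 3.0. A lookup in a `TreeSet` can therefore descend into the wrong subtree, which is consistent with the failing contains() checks reported above.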
0

Only one Item with the same key is allowed (like a set)

Your Item class should implement hashCode() and equals() using just the key attribute.

insert, remove, check existence in constant time

TreeSet add() and remove() are O(log N), so they do not meet your criteria.

HashSet add() and remove() are usually O(1).

traversal ordered by score

What are your performance requirements here? How frequently will you traverse the collection? If you will mainly be adding and removing items and only rarely traversing, you can copy the HashSet into a TreeSet each time you need to traverse it.

Adam Siemion
  • The use of this structure is equally divided between insertions, existence checks, and traversal. – Jack Dec 25 '20 at 12:48