I was going through a book on Java 8 that explains distinct() for streams. It says that equality, for the purpose of producing distinct elements, is determined by the implementations of the hashCode() and equals() methods. So I wrote the code below to see this in practice:

import java.util.stream.Stream;

// The outer class name is illustrative; the original snippet omitted it.
public class DistinctDemo {

    static class Order {
        int id;
        Double value;

        public Order(int id, Double value) {
            this.id = id;
            this.value = value;
        }

        // Equality is based on id only; the println makes each call visible.
        @Override
        public int hashCode() {
            System.out.println("In Hashcode() - " + this.id + "," + this.value);
            return this.id;
        }

        @Override
        public boolean equals(Object o) {
            System.out.println("In Equals()");
            return this.id == ((Order) o).id;
        }
    }

    public static void main(String[] args) {
        Stream<Order> orderList = Stream.of(new Order(1, 10.0), new Order(2, 140.5), new Order(2, 100.8));
        Stream<Order> biggerOrders = orderList.filter(o -> o.value > 75.0);
        biggerOrders.distinct().forEach(o -> System.out.println("OrderId:" + o.id));
    }
}

It produced the following output:

In Hashcode() - 2,140.5
In Hashcode() - 2,140.5
OrderId:2
In Hashcode() - 2,100.8
In Equals()

I am confused about why hashCode() is called twice on the same Order object (2, 140.5) before it is compared with the other Order object (2, 100.8).

Thanks in advance.

Chota Bheem

2 Answers

As answered by @Adi, distinct() uses a HashSet internally (which is itself backed by a HashMap), and that set calls hashCode() on each Order.

Here is the relevant code where both calls are made, in java.util.stream.DistinctOps.makeRef():

return new Sink.ChainedReference<T, T>(sink) {
    Set<T> seen;

    @Override
    public void begin(long size) {
        seen = new HashSet<>();
        downstream.begin(-1);
    }

    @Override
    public void end() {
        seen = null;
        downstream.end();
    }

    @Override
    public void accept(T t) {
        if (!seen.contains(t)) {//first call is made here
            seen.add(t);//second call is made here
            downstream.accept(t);
        }
    }
};

The stack traces for both calls:

[Two screenshots: stack traces of the two hashCode() calls]

11thdimension
  • A very common anti-pattern. `if(!seen.contains(t)) { seen.add(t); … }` could easily be replaced by `if(seen.add(t)) { … }`, avoiding the double hashing, as that’s exactly the contract of `add`: adding the element only if it isn’t already contained in the `Set` (and returning whether it has been added). – Holger Jan 04 '16 at 10:15
  • And that's in Oracle JDK – 11thdimension Jan 04 '16 at 20:47
  • This has since been fixed. In recent versions, the JDK code looks like `if(seen.add(t)) { downstream.accept(t); }` – Holger May 25 '23 at 13:43
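
To make the difference between the two patterns concrete, here is a minimal sketch that counts hashCode() calls for each. The class AddVsContainsDemo and the key type CountingKey are hypothetical names invented for this illustration; only HashSet and its contains/add methods come from the JDK.

```java
import java.util.HashSet;
import java.util.Set;

public class AddVsContainsDemo {

    // Hypothetical key type that counts how often it is hashed.
    static final class CountingKey {
        static int hashCalls = 0;
        final int id;

        CountingKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            hashCalls++;
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof CountingKey && ((CountingKey) o).id == id;
        }
    }

    // The pattern quoted from the older JDK code: look up first, then insert.
    static int hashCallsWithContainsThenAdd() {
        CountingKey.hashCalls = 0;
        Set<CountingKey> seen = new HashSet<>();
        CountingKey key = new CountingKey(2);
        if (!seen.contains(key)) { // hashes the key once
            seen.add(key);         // hashes the same key again
        }
        return CountingKey.hashCalls;
    }

    // The replacement suggested in the comments: rely on add()'s return value.
    static int hashCallsWithAddOnly() {
        CountingKey.hashCalls = 0;
        Set<CountingKey> seen = new HashSet<>();
        CountingKey key = new CountingKey(2);
        if (seen.add(key)) {
            // add() returned true, so the key was not in the set before
        }
        return CountingKey.hashCalls;
    }

    public static void main(String[] args) {
        System.out.println("contains + add: " + hashCallsWithContainsThenAdd()); // prints 2
        System.out.println("add only:       " + hashCallsWithAddOnly());         // prints 1
    }
}
```

Both variants leave the set in the same state; the second simply hashes each new element once instead of twice.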

The first time, hashCode() is called to check whether the item (order) is already present in the HashSet that distinct() uses internally (a HashSet is backed by a HashMap). The second time, it is called to add the item to that set if it was not present.

Tip: Try debugging the hashCode method.
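
If attaching a debugger is inconvenient, one alternative is to record the call stack from inside hashCode() itself. The sketch below is only an illustration: HashCodeCallerDemo, TracedKey and traceDistinct are hypothetical names, and the frames you see depend on the JDK version (older versions show both a contains and an add path, newer ones only add).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class HashCodeCallerDemo {

    // Collects one line per hashCode() call, describing who called it.
    static final List<String> callers = new ArrayList<>();

    // Hypothetical key type whose hashCode() records its callers.
    static final class TracedKey {
        final int id;

        TracedKey(int id) {
            this.id = id;
        }

        @Override
        public int hashCode() {
            StackTraceElement[] stack = new Throwable().getStackTrace();
            StringBuilder sb = new StringBuilder();
            // stack[0] is this hashCode() method; record the next few frames.
            for (int i = 1; i < Math.min(stack.length, 5); i++) {
                if (i > 1) {
                    sb.append(" <- ");
                }
                sb.append(stack[i].getClassName())
                  .append('.')
                  .append(stack[i].getMethodName());
            }
            callers.add(sb.toString());
            return id;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof TracedKey && ((TracedKey) o).id == id;
        }
    }

    // Runs distinct() over two keys and returns the recorded caller chains.
    static List<String> traceDistinct() {
        callers.clear();
        Stream.of(new TracedKey(1), new TracedKey(2))
              .distinct()
              .forEach(k -> { });
        return new ArrayList<>(callers);
    }

    public static void main(String[] args) {
        for (String chain : traceDistinct()) {
            System.out.println(chain);
        }
    }
}
```

Each printed line lists the nearest callers of one hashCode() invocation, which shows whether it was reached through a lookup or through an insertion.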

Adi