
The standard says that (equal x y) implies (= (sxhash x) (sxhash y)). Let us check it:

(defun sxhash-test ()
  (let ((obj1 (list 1 2 (list 1 1)))
        (obj2 (list 1 2 (list 1 2))))
    (format t "are objects equal?: ~a~%" (equal obj1 obj2)) ;; => NIL
    (format t "are their hashes equal?: ~a~%" (= (sxhash obj1) (sxhash obj2))))) ;; => T

The function equal works as expected but sxhash doesn't. Could you please explain what I am doing wrong? I use SBCL 2.1.9.

Thank you.

IgSokolov
  • Basic logic: `x implies y` doesn't mean that `not x implies not y`. – Barmar Nov 12 '21 at 22:20
  • There are many more possible lists than hash values. So collisions are possible. – Barmar Nov 12 '21 at 22:21
  • Does `sxhash` traverse the tree to create the hash? It doesn't seem so. `(list 1 2 3)` and `(list 1 2 4)` create different hashes. – Manfred Nov 13 '21 at 11:26

2 Answers


sxhash has to satisfy four requirements:

  1. objects which are equal (and hence objects which are eql or eq, but not necessarily objects which are only equalp) will have the same sxhash value;
  2. the sxhash value of an object must not change during the life of a single image unless the object is changed in such a way that it is no longer equal to a copy of it made before the change;
  3. for objects of various types which have a well-defined notion of similarity between images, the sxhash value must be the same in each image;
  4. computation of sxhash must always terminate.

(There is another vague requirement of 'being a good hash code').

(1) means that two objects which are not equal may or may not have the same hash code, but two objects which are equal must have the same one. A terrible but possible implementation of sxhash would be:

(defun sxhash/terrible (it)
  (declare (ignore it))
  0)

This fails the 'being a good hash code' test, but that's not something that can really be enforced.

What you are seeing is that two objects which are not equal do have the same sxhash value: that's fine.
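To make the one-way nature of the implication concrete, here is a small sketch using the lists from the question. The function name collision-demo is made up for illustration. The only things it relies on are the guarantee in (1) and the fact that an equal hash table compares keys with equal, so a collision costs time, not correctness. Whether the two unequal lists collide is left to the implementation, so that result is only printed.

```lisp
;; Illustration only: COLLISION-DEMO is a hypothetical name, not part of any library.
(defun collision-demo ()
  (let ((obj1 (list 1 2 (list 1 1)))
        (obj2 (list 1 2 (list 1 2)))
        (copy (list 1 2 (list 1 1)))
        (table (make-hash-table :test #'equal)))
    ;; Guaranteed by the standard: EQUAL objects have the same SXHASH value.
    (assert (equal obj1 copy))
    (assert (= (sxhash obj1) (sxhash copy)))
    ;; Not guaranteed either way: OBJ1 and OBJ2 are not EQUAL, so their
    ;; hash codes may or may not coincide, depending on the implementation.
    (format t "hashes collide?: ~a~%" (= (sxhash obj1) (sxhash obj2)))
    ;; Even if they collide, the table still tells the keys apart.
    (setf (gethash obj1 table) :first
          (gethash obj2 table) :second)
    (list (gethash obj1 table)
          (gethash obj2 table)
          (hash-table-count table))))
```

Calling (collision-demo) returns (:FIRST :SECOND 2) whether or not the printed line says the hashes collide.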

Indeed, (1) together with (4) mean that if an implementation is going to compute sxhash on conses in such a way that it walks the graph, then it has to be pretty careful about that: it either needs an occurs check or it needs to only go so deep.
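As a sketch of the 'only go so deep' option, here is a toy hash that walks a cons tree but gives up after a fixed number of conses. This is not how any real implementation defines sxhash: the name bounded-tree-hash, the default depth of 3 and the multiplier 31 are all assumptions chosen for illustration. It satisfies (1) because equal trees are walked identically, and (4) because the depth decreases at every cons, even on circular structure.

```lisp
;; Hypothetical sketch, not the definition of SXHASH in any implementation.
(defun bounded-tree-hash (object &optional (depth 3))
  "Hash OBJECT, descending through at most DEPTH levels of conses."
  (cond ((atom object)
         ;; Atoms are delegated to the implementation's SXHASH.
         (sxhash object))
        ((zerop depth)
         ;; Past the cut-off every cons gets the same code.
         0)
        (t
         ;; Combine the codes of CAR and CDR and keep the result a
         ;; non-negative fixnum.
         (logand most-positive-fixnum
                 (+ (* 31 (bounded-tree-hash (car object) (1- depth)))
                    (bounded-tree-hash (cdr object) (1- depth)))))))
```

With the default depth, (bounded-tree-hash (list 1 2 (list 1 1))) and (bounded-tree-hash (list 1 2 (list 1 2))) are the same number, because the inner list sits just past the cut-off. That is the same kind of collision as in the question, and it is equally legal. The function also terminates on a circular list.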

However it is quite possible that sxhash does descend into cons trees. As an example here is LispWorks doing just that:

> (sxhash '(1 2 3))
11890816076270616

> (sxhash '(1 2 3 4))
369102953153702944

> (sxhash '(1 2 3 (4)))
740958182301008344

> (sxhash '(1 2 3 (5)))
740958455027237144

> (sxhash '(1 2 3 (5 6)))
741326672350173760

> (sxhash '(1 2 3 (5 (6))))
925006242171775434

Equally it is quite plausible that sxhash treats all instances of a given structure class (or of a given instance of standard-class) as having the same value, because the address of such an object is not constant and there's no obvious place to store the hash code without burning memory. But that's in no way a requirement.
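A constant value per structure class is allowed because equal on structure instances is just eq, so two distinct instances are never equal and (1) says nothing about them. A small sketch, where the point structure and the function name struct-hash-demo are made up for illustration, and where the printed result depends on the implementation:

```lisp
;; Illustration only: POINT and STRUCT-HASH-DEMO are hypothetical names.
(defstruct point x y)

(defun struct-hash-demo ()
  (let ((p1 (make-point :x 1 :y 2))
        (p2 (make-point :x 1 :y 2)))
    ;; Implementation-dependent: may print T or NIL.
    (format t "same sxhash?: ~a~%" (= (sxhash p1) (sxhash p2)))
    (list (equal p1 p2)                   ; false: distinct instances
          (equal p1 p1)                   ; true: same object
          (= (sxhash p1) (sxhash p1)))))  ; true: required by (1)
```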

ignis volens

The reason why this effect is observed is that two things are required for the values to be equal:

  1. The two values are the same, meaning that they have the same hash.
  2. The two values have the same address.

The two lists tested have the same hash, because sxhash doesn't follow nesting. In fact, two structures will always have the same hash.

 (sxhash (list 1 2 3)); => 3971322300187561939
 (sxhash (list 1 2 3)); => 3971322300187561939 (so, repeatable)
 (sxhash (list 1 2 3 (list 4))) ; => 3180777146619076709
 (sxhash (list 1 2 3 (list 5))) ; => 3180777146619076709 (ok ...)

Why does `sxhash` return a constant for all structs?

As for equal addresses, if I create two values like 'a they in fact turn out to be one item with one address, which is stored only the first time it is seen. Whereas (list 1 2 (list 1 1)) and (list 1 2 (list 1 2)) are different things and are stored at separate addresses.

(sb-kernel:get-lisp-obj-address 'a) ; => 68772678703
(sb-kernel:get-lisp-obj-address 'a) ; => 68772678703 (same...)
(sb-kernel:get-lisp-obj-address (list 1 2 3 (list 4))) ; => 68772925863
(sb-kernel:get-lisp-obj-address (list 1 2 3 (list 5))) ; => 68772805335

Testing these two lists for equality passes when their sxhash values are compared, but fails when their addresses are compared.

Francis King