7

From the question "Is partitioning easier than sorting?":

Suppose I have a list of items and an equivalence relation on them, and comparing two items takes constant time. I want to return a partition of the items, e.g. a list of linked lists, each containing all equivalent items.

One way of doing this is to extend the equivalence to an ordering on the items and order them (with a sorting algorithm); then all equivalent items will be adjacent.

(Keep in mind the distinction between equality and equivalence.)

Clearly the equivalence relation must be considered when designing the ordering algorithm. For example, if the equivalence relation is "people born in the same year are equivalent", then sorting based on the person's name is not appropriate.

  1. Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?

  2. How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value.

(Note: it is OK if nonequivalent items map to the same hash value (collide) -- I'm not asking to solve the collision problem -- but on the other hand, hashFunc(item) { return 1; } is cheating.)

My suspicion is that for any datatype/equivalence pair where it is possible to define an ordering, it will also be possible to define a suitable hash function, and they will have similar algorithmic complexity. A counterexample to that conjecture would be enlightening!

Community
  • 1
  • 1
Dan
  • 5,929
  • 6
  • 42
  • 52
  • 2
    It's a finite list, so obviously if you are allowed enough time you can always devise a perfect hash function: simply hash element a[i] to the smallest integer j in {1,2,…,n} such that a[i] is equivalent to a[j]. This takes O(n^2) in the worst case, and is also optimal if all you're given is an oracle for equivalence. If you're also already given an ordering, you need n log n to sort, but creating an ordering from just equivalence information would take Ω(n^2) too. – ShreevatsaR Jul 16 '10 at 05:53
  • 1
    I don't like your distinction between "equality" and "equivalence". Whatever you mean by "equality", that also defines an equivalence relation itself. – Dimitris Andreou Jul 16 '10 at 18:32
  • Not directly related to your question, but food for thought nevertheless. Datatype: graph, equivalence relation: isomorphism. If you could define a _small_ equivalence relation (either through hashing or through sorting) that is also a superset of the isomorphism equivalence relation, you would immediately have a good step for a graph isomorphism algorithm. All graph isomorphism algorithms I've seen remotely resemble hashing graphs, again and again (trimming more and more the tentative equivalence relation, till whatever remains is handled by brute force). – Dimitris Andreou Jul 16 '10 at 18:43
  • @Dimitris: Re your problem with "equality" and "equivalence": It's necessary to distinguish them because "equality" is often loaded with pre-specified meanings (e.g. a programming language will define when 2 integers are "equal") and we want the ability to specify an arbitrary equivalence relation ourselves (e.g. we might be interested in distinguishing only odd and even integers). – j_random_hacker Jul 17 '10 at 13:35

8 Answers8

5

The answer to questions 1 and 2 is no, in the following sense: given a computable equivalence relation ≡ on strings {0, 1}*, there exists a computable function f such that x ≡ y if and only if f(x) = f(y), which leads to an order/hash function. One definition of f(x) is simple, and very slow to compute: enumerate {0, 1}* in lexicographic order (ε, 0, 1, 00, 01, 10, 11, 000, …) and return the first string equivalent to x. We are guaranteed to terminate when we reach x, so this algorithm always halts.

user382751
  • 1,213
  • 8
  • 10
  • A good answer for those questions, so +1. But I want to say that this is not "the final answer" if you are interested in algorithmic complexity, as the OP is. Computability says nothing about how long the computation takes. – j_random_hacker Jul 16 '10 at 06:12
  • I agree but doubt that a "final answer" is forthcoming – none of the techniques in an introductory complexity theory course seem to have any promise, and complexity theorists generally don't have a lot of answers to this kind of question. I'm by no means a specialist, though, and it may be worth framing this question carefully and reposting it on MathOverflow. – user382751 Jul 16 '10 at 06:42
  • If we're going to speculate, then Graph Canonization _might_ be harder than Graph Isomorphism: http://en.wikipedia.org/wiki/Graph_canonization – user382751 Jul 16 '10 at 06:48
  • Not sure what you mean by your 1st comment... E.g. with a few weak assumptions we can get that partitioning is faster than sorting if we have a bound on the number of equivalence classes and that bound is less than log2(n). Also ShreevatsaR gives a O(n^2) algorithm for building a perfect hash function that is relevant I think. – j_random_hacker Jul 16 '10 at 08:52
  • o(log(n)) partitions is not exactly what I'd call a weak assumption. ShreevatsaR's algorithm is about as interesting as this one, but it's optimal (again) under the strong assumption that the equivalence relation is an arbitrary oracle rather than a program. – user382751 Jul 16 '10 at 14:36
  • "Weak" was the wrong word; "satisfied often enough in practice to be worth analysing" would have been better. (A very common case would be dividing a set into 2 classes, for example.) ShreevatsaR's assumption of O(1) time to decide equivalence isn't really an assumption at all, just an abbreviation that omits the uninteresting detail of how long the subroutine takes. Spelling it out in full, his algorithm takes O(kn^2) if the subroutine takes O(k) time. – j_random_hacker Jul 16 '10 at 15:07
2

Creating a hash function and an ordering may be expensive but will usually be possible. One trick is to represent an equivalence class by a pre-arranged member of that class, for instance, the member whose serialised representation is smallest, when considered as a bit string. When somebody hands you a member of an equivalence class, map it to this canonicalised member of that class, and then hash or compare the bit string representation of that member. See e.g. http://en.wikipedia.org/wiki/Canonical#Mathematics

Examples where this is not possible or convenient include when somebody gives you a pointer to an object that implements equals() but nothing else useful, and you do not get to break the type system to look inside the object, and when you get the results of a survey that only asks people to judge equality between objects. Also Kruskal's algorithm uses Union&Find internally to process equivalence relations, so presumbly for this particular application nothing more cost-effective has been found.

mcdowella
  • 19,301
  • 2
  • 19
  • 25
1

One example that seems to fit your request is an IEEE floating point type. In particular, a NaN doesn't compare as equivalent to anything else (nor even to itself) unless you take special steps to detect that it's a NaN, and always call that equivalent.

Likewise for hashing. If memory serves, any floating point number with all bits of the significand set to 0 is treated as having the value 0.0, regardless of what the bits in the exponent are set to. I could be remembering that a bit wrong, but the idea is the same in any case -- the right bit pattern in one part of the number means that it has the value 0.0, regardless of the bits in the rest. Unless your hash function takes this into account, it will produce different hash values for numbers that really compare precisely equal.

Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
  • Interesting example. A hash function for that datatype could be "if significand is all-0s then return 0, else if exponent is all-1s (NaN) then return 1, else hash the value into the range [2..n]". Or if the desire is for all NaNs to be partitioned into distinct groups, then skip the exponent check. I don't think this example meets the criteria in #1 or #2. – Dan Jul 16 '10 at 04:16
  • NaN ignores the bits in the mantissa, but zero does not. There is an implied one bit before the mantissa unless all bits besides the sign are zero. – drawnonward Jul 16 '10 at 04:19
  • @Dan: ultimately, it doesn't prevent you from writing *some* function that provides some ordering -- but ultimately, if you can represent it in a computer at all, you can probably do the same. The ordering you give may not mean anything, but that's exactly what we have here -- there's a reason a NaN is specified as comparing unequal to everything, including itself, and if we define them as comparing as equal (or equivalent) we're simply forcing "there must be equivalence" into a situation that doesn't really support equivalence. – Jerry Coffin Jul 16 '10 at 11:53
1

As you probably know, comparison-based sorting takes at least O(n log n) time (more formally you would say it is Omega(n log n)). If you know that there are fewer than log2(n) equivalence classes, then partitioning is faster, since you only need to check equivalence with a single member of each equivalence class to determine which part in the partition you should assign a given element to.

I.e. your algorithm could be like this:

For each x in our input set X:
    For each equivalence class Y seen so far:
        Choose any member y of Y.
        If x is equivalent to y:
            Add x to Y.
            Resume the outer loop with the next x in X.

    If we get to here then x is not in any of the equiv. classes seen so far.
    Create a new equivalence class with x as its sole member.

If there are m equivalence classes, the inner loop runs at most m times, taking O(nm) time overall. As ShreetvatsaR observes in a comment, there can be at most n equivalence classes, so this is O(n^2). Note this works even if there is not a total ordering on X.

j_random_hacker
  • 50,331
  • 10
  • 105
  • 169
1

Theoretically, it is alway possible (for questions 1 and 2), because of the Well Ordering Theorem, even when you have an uncountable number of partitions.

Even if you restrict to computable functions, throwawayaccount's answer answers that.

You need to more precisely define your question :-)

In any case,

Practically speaking,

Consider the following:

You data type is the set of unsigned integer arrays. The ordering is lexicographic comparison.

You could consider hash(x) = x, but I suppose that is cheating too :-)

I would say (but haven't thought more about getting a hash function, so might well be wrong) that partitioning by ordering is much more practical than partitioning by hashing, as hashing itself could become impractical. (A hashing function exists, no doubt).

0

EDIT: This answer is wrong

I am not going to delete it just because some of the comments below are enlightening

Not every equivalence relationship implies an order

As your equivalence relationship should not induce an order, let´s take an un-ordered distance function as relation.

If we get the set of functions f(x):R -> R as our datatype, and define an equivalence relation as:

f is equivalent to g if  f(g(x)) = g(f(x)  [commuting Operators][1]

Then you can't sort on that order (no injective function exists with the Real numbers). You just can't find a function which maps your datatype to numbers due to the cardinality of the function's space.

Dr. belisarius
  • 60,527
  • 15
  • 115
  • 190
  • Not true if we assume the Axiom of Choice. – user382751 Jul 16 '10 at 14:33
  • Yes, Well Ordering Theorem. I just added an answer referencing that! –  Jul 16 '10 at 14:58
  • @throwawayacct Here we are talking about a SPECIAL ordering induced by an equivalence relationship. Partial ordered sets do exists, depending on your choice of the ordering. http://en.wikipedia.org/wiki/Partially_ordered. If we forget the equivalence relation, the problem is different, and you can define an order, for Zermelo's sake! – Dr. belisarius Jul 16 '10 at 15:17
  • Even with a given equivalence relation, a well ordering exists between the equivalence classes, and that is all you need! –  Jul 16 '10 at 16:32
  • @throwawayacct: there's no need to *assume* Axiom of Choice when working with finite sets.. – BlueRaja - Danny Pflughoeft Jul 16 '10 at 21:42
0

I believe that...

1- Can you suggest a datatype and equivalence relation such that it is not possible to create an ordering?

...it's possible only for infinite (possibly only for non-countable) sets.

2- How about a datatype and equivalence relation where it is possible to create such an ordering, but it is not possible to define a hash function on the datatype that will map equivalent items to the same hash value.

...same as above.

Mau
  • 14,234
  • 2
  • 31
  • 52
0

Suppose that F(X) is a function which maps an element of some data type T to another of the same type, such that for any Y of type T, there is exactly one X of type T such that F(X)=Y. Suppose further that the function is chosen so that there is generally no practical way of finding the X in the above equation for a given Y.

Define F0=X, F{1}(X)=F(X), F{2}(X)=F(F(X)), etc. so F{n}(X) = F(F{n-1}(X)).

Now define a data type Q containing a positive integer K and an object X of type T. Define an equivalence relation thus:

Q(a,X) vs Q(b,Y):

If a > b, the items are equal iff F{a-b}(Y)==X

If a < b, the items are equal iff F{b-a}(X)==Y

If a=b, the items are equal iff X==Y

For any given object Q(a,X) there exists exactly one Z for F{a}(Z)==X. Two objects are equivalent iif they would have the same Z. One could define an ordering or hash function based upon Z. On the other hand, if F is chosen such that its inverse cannot be practically computed, the only practical way to compare elements may be to use the equivalence function above. I know of no way to define an ordering or hash function without either knowing the largest possible "a" value an item could have, or having a means to invert function F.

supercat
  • 77,689
  • 9
  • 166
  • 211