How to implement ora_hash (seedable hash that divides any sql datatype into n buckets)

Question

I've looked at the wikipedia article on Hash tables but it doesn't seem to say anything about how to implement a hash table that can distribute entries of arbitrary SQL datatype evenly into n buckets.

Can anyone point me in the direction of documentation or existing source code on this topic?

Are you asking how to implement a hash function in general? How the `ora_hash` hash function specifically was implemented? How to build a hash table (a data structure that uses a hash function)? Or just how to separate data into buckets in a SQL query? — Justin Cave, Jun 23 '14 at 22:52
How the ora_hash function specifically was implemented, preferably some kind of pseudocode/code but an abstract description would also be fine — wonton, Jun 23 '14 at 22:56
I don't believe Oracle has ever documented what algorithm they use. It's the same as the algorithm used by `dbms_utility.get_hash_value` but I don't believe they documented the internal algorithm there either. — Justin Cave, Jun 23 '14 at 23:05
Do you know of any similar implementations with documentation, or a starting point on figuring out how to implement such a hash? — wonton, Jun 23 '14 at 23:39
What do you mean by "similar implementation"? There are a rather large number of rather well-documented hashing algorithms out there. Oracle provides MD4, MD5, and SHA1 implementations in the `dbms_crypto.hash` function http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_crypto.htm#i1002022 All of those algorithms are going to be well documented in a variety of locations. — Justin Cave, Jun 23 '14 at 23:42
yep, but md and sha don't have the ability to always evenly distribute to n buckets. — wonton, Jun 23 '14 at 23:44
Neither does `ora_hash`. The goal is even distribution. But that's not a guarantee. If you pass in enough distinct values, though, any vaguely reasonable hashing algorithm should allow you to distribute data basically evenly across n buckets. — Justin Cave, Jun 23 '14 at 23:46
Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/56153/discussion-between-wonton-and-justin-cave). — wonton, Jun 23 '14 at 23:48
@wonton Wondering if you might be able to help... I read your statement in the chat that the `ora_hash` algorithm is "the same as the algorithm used by dbms_utility.get_hash_value". I tried to verify this with a simple example [here](http://rextester.com/GQE56504) but the returned hashes are different - am I doing something wrong here - and if so, do you know how to correct it? — Steve Chambers, Dec 13 '17 at 13:02

score 1 · Accepted Answer · answered Jun 24 '14 at 03:06

I believe you are talking about a perfect hash function. Oracle's ORA_HASH function is not a perfect hash function.

http://en.wikipedia.org/wiki/Perfect_hash_function

As close as you will get to what you seem to want is an associative array. Oracle has those. Start playing with this example:

set serverout on size 10000
DECLARE
cursor foo 
is 
  select distinct fld1,fld2,fld9  from sometable;

type t is table of foo.%ROWTYPE
  index by varchar2;   -- change the index to an int if you want

myarray t; -- myarray is a table of records -- whatever foo returns

BEGIN
  for x in foo
  loop
      -- index using the first column of the fetched row  "fld1":
      myarray(x.fld1)=x;  -- assign the rowtype to the table of records.      
  end loop;

END;
/

Note: an associative array is built on a hashtable, the above example uses fld1 as the hash key. So the above will only work if as you describe, perfect hashing, if and only if fld1 is a unique field. That is what the distinct in in there to do. It is never always required.

How to implement ora_hash (seedable hash that divides any sql datatype into n buckets)

1 Answers1

Linked