I have a set of data which has a hierarchy of 3 levels. Each level has a name.

I am looking at combining all of these names into a single string and then creating a numeric hash that can be used as a hash key for a Service Fabric stateful service.

I have seen a lot online about looking up data by key, but I am not sure how to actually create the keys in an efficient way.

Ideally I would like a hash that is quick and easy to generate in SQL Server 2017 and C#.

Can anyone point me in the right direction, please?

Paul

  • Usually you combine the strings with a character that does not appear in any of them. A lot of people use "^". So if you have strings a, b, c, use string.Join("^", new string[] { a, b, c }); – jdweng Jul 07 '18 at 19:01
  • Ok not sure how this would get a numeric hash? – Paul Jul 07 '18 at 19:02
  • @jdweng That's a very inefficient way of doing that... – Dai Jul 07 '18 at 23:58
  • To clarify, you mean you have a large set of C# POCO classes that you want to both serialize and hash? To be "correct" you'll want to hash the output of your serialization function. How are you serializing your objects? – Dai Jul 07 '18 at 23:59
  • I understand the best hash-algorithm for speed and uniqueness (you do **not** want to use a cryptographic hash function because those are deliberately not optimized for speed) is something like MurmurHash: https://en.wikipedia.org/wiki/MurmurHash – Dai Jul 08 '18 at 00:00
  • @Dai: The hash needs to be unique 100% of the time. Many of the postings I've seen do not give a unique hash under every condition. If a cryptographic algorithm is used, it cannot be a non-reversible password algorithm that gives a one-to-many mapping. It must be a one-to-one mapping that is reversible. – jdweng Jul 08 '18 at 02:14
  • Use string.Join("^", new string[] { a, b, c}).GetHashCode(); I'm not sure what is efficient about this method. – jdweng Jul 08 '18 at 02:27
  • @jdweng It is impossible to have 100% unique hashes - and a “reversible hash” is an oxymoron. – Dai Jul 08 '18 at 03:41
  • Then how do you guarantee the code will ALWAYS work without unique hashes? Without unique hashes, a comparison between two objects can give wrong results. – jdweng Jul 08 '18 at 10:05
  • This is not about unique hashes as such; I need consistent hashes. If I give the text "LEVEL 1 LEVEL 2 LEVEL 3", I want the same hash code to be generated each time. I am not serializing anything. I just want to give a method a string and have that method give me back a numeric hash value I can use as a partition key in Service Fabric. – Paul Jul 08 '18 at 10:08
  • @Paul As far as I understand, you want to use a numeric value as a key based on the hierarchy path of the item in the database. If I am right, you don't need a hash function; you need to create associations between a hierarchy path and a numeric value, because a hash function cannot guarantee uniqueness. I am not sure where this association can be implemented because I can't see the design, so if you can provide a bit more information I may be able to help. – Oleg Karasik Jul 09 '18 at 15:10
  • I don't really need uniqueness. I need something that will generate keys that are spread nicely across my nodes. – Paul Jul 09 '18 at 19:02

1 Answer

The Service Fabric team's advice is to use the FNV-1 hashing algorithm for this:

Select a hash algorithm

An important part of hashing is selecting your hash algorithm. A consideration is whether the goal is to group similar keys near each other (locality sensitive hashing), or whether activity should be distributed broadly across all partitions (distribution hashing), which is more common.

The characteristics of a good distribution hashing algorithm are that it is easy to compute, it has few collisions, and it distributes the keys evenly. A good example of an efficient hash algorithm is the FNV-1 hash algorithm.

A good resource for general hash code algorithm choices is the Wikipedia page on hash functions.

A C# implementation is given in this example:

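using System.Text;

// 64-bit FNV hash of a string, returned as a signed long.
// Note: XOR-then-multiply is the FNV-1a order; strict FNV-1 multiplies first.
public long HashString(string input)
{
    // Upper-case first so the hash is case-insensitive.
    input = input.ToUpperInvariant();
    var value = Encoding.UTF8.GetBytes(input);
    ulong hash = 14695981039346656037; // 64-bit FNV offset basis
    unchecked
    {
        for (int i = 0; i < value.Length; ++i)
        {
            hash ^= value[i];
            hash *= 1099511628211; // 64-bit FNV prime
        }
        return (long)hash; // reinterpreted as signed, so it may be negative
    }
}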

Remove the ToUpperInvariant call to make the hash case-sensitive.
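To tie this back to the three-level hierarchy in the question, here is a minimal sketch of one way to build the key. The class name PartitionKeys, the method ForHierarchy, and the choice of U+001F as the separator are all assumptions made for illustration; the hash loop itself is the same one shown above.

```csharp
using System;
using System.Text;

public static class PartitionKeys
{
    // Hypothetical separator: U+001F (unit separator), assumed never to
    // appear inside a level name. It keeps ("ab", "c") and ("a", "bc")
    // from producing the same combined string.
    private const char Separator = '\u001F';

    // Same constants and XOR-then-multiply loop as HashString above.
    public static long HashString(string input)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(input.ToUpperInvariant());
        ulong hash = 14695981039346656037UL; // 64-bit FNV offset basis
        unchecked
        {
            foreach (byte b in bytes)
            {
                hash ^= b;
                hash *= 1099511628211UL; // 64-bit FNV prime
            }
            return (long)hash;
        }
    }

    // Joins the three level names into one path string and hashes it.
    public static long ForHierarchy(string level1, string level2, string level3)
    {
        string path = string.Join(Separator.ToString(), level1, level2, level3);
        return HashString(path);
    }
}
```

Calling PartitionKeys.ForHierarchy("Level 1", "Level 2", "Level 3") returns the same long on every run and on every machine, which is the consistency asked for in the comments. The built-in string.GetHashCode() does not promise that across processes or runtime versions, so it is a poor fit for a persisted partition key. The result can be negative, so the sketch assumes the service's Int64 range partitioning covers the whole long range.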

LoekD