Create key via SQL and C# for partition key

Question

I have a set of data which has a hierarchy of 3 levels. Each level has a name.

I am looking at combining all of these names into a single string and then creating a numeric hash that can be used as a partition key for a Service Fabric stateful service.

I have seen lots online about finding data with keys but I am not sure how to actually create them in an efficient way.

Ideally I would like a hash that is quick and easy to generate in SQL Server 2017 and C#.

Can anyone point me in the right direction, please?

Paul

Usually you combine the strings with a character that does not occur in any of them. A lot of people use "^". So if you have strings a, b, c use string.Join("^", new string[] { a, b, c}); — jdweng, Jul 07 '18 at 19:01
To clarify, you mean you have a large set of C# POCO classes that you want to both serialize and hash? To be "correct" you'll want to hash the output of your serialization function. How are you serializing your objects? — Dai, Jul 07 '18 at 23:59
I understand the best hash-algorithm for speed and uniqueness (you do **not** want to use a cryptographic hash function because those are deliberately not optimized for speed) is something like MurmurHash: https://en.wikipedia.org/wiki/MurmurHash — Dai, Jul 08 '18 at 00:00
@Dai: The hash needs to be unique 100% of the time. Many of the postings I've seen do not give a unique hash under every condition. If a cryptographic algorithm is used, it cannot be a non-reversible password algorithm that gives a one-to-many mapping. It must be a one-to-one mapping that is reversible. — jdweng, Jul 08 '18 at 02:14
Use string.Join("^", new string[] { a, b, c}).GetHashCode(); I'm not sure what is efficient about this method. — jdweng, Jul 08 '18 at 02:27
@jdweng It is impossible to have 100% unique hashes - and a “reversible hash” is an oxymoron. — Dai, Jul 08 '18 at 03:41
Then how do you guarantee the code will ALWAYS work without unique hashes? Without unique hashes when a comparison is performed between two objects you can incorrectly get wrong results. — jdweng, Jul 08 '18 at 10:05
This is not about unique hashes as such; I need consistent hashes. If I give the text "LEVEL 1 LEVEL 2 LEVEL 3", I want the same hash code to be generated each time. I am not serializing anything. I just want to give a method a string and have that method give me back a numeric hash value I can use as a partition key in Service Fabric — Paul, Jul 08 '18 at 10:08
@Paul as far as I understand, you want to use a numeric value as a key based on the hierarchy path of the item in the database. If I am right, you don't need a hash function; you need to create associations between each hierarchy path and a numeric value, because a hash function cannot guarantee uniqueness. I am not sure where this association can be implemented because I can't see a design, so if you can provide a bit more information maybe I would be able to help. — Oleg Karasik, Jul 09 '18 at 15:10
I don’t really need uniqueness; I need something that will generate keys that are spread nicely across my nodes — Paul, Jul 09 '18 at 19:02

LoekD · Accepted Answer · 2018-07-30T06:21:02.740

The Service Fabric team's advice is to use the FNV-1 hashing algorithm for this:

Select a hash algorithm

An important part of hashing is selecting your hash algorithm. A consideration is whether the goal is to group similar keys near each other (locality sensitive hashing)--or if activity should be distributed broadly across all partitions (distribution hashing), which is more common.

The characteristics of a good distribution hashing algorithm are that it is easy to compute, it has few collisions, and it distributes the keys evenly. A good example of an efficient hash algorithm is the FNV-1 hash algorithm.

A good resource for general hash code algorithm choices is the Wikipedia page on hash functions.

A C# implementation is shown in this example:

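// Requires: using System.Text;
public long HashString(string input)
{
    // Normalize case so "Level 1" and "LEVEL 1" produce the same hash.
    input = input.ToUpperInvariant();
    var value = Encoding.UTF8.GetBytes(input);

    // 64-bit FNV offset basis.
    ulong hash = 14695981039346656037;

    // Overflow is expected here; the multiplication wraps modulo 2^64.
    unchecked
    {
        for (int i = 0; i < value.Length; ++i)
        {
            // XOR the byte in first, then multiply by the 64-bit FNV prime.
            // This order is the FNV-1a variant; FNV-1 multiplies first.
            hash ^= value[i];
            hash *= 1099511628211;
        }
        return (long)hash;
    }
}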

Remove the ToUpperInvariant call to make the hash case-sensitive.
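
For the three-level hierarchy in the question, a minimal sketch could look like the following. It assumes the level names are joined with "^" (the separator suggested in the comments) and that this character never appears in a name. The class name PartitionKeys and the method ForHierarchy are hypothetical names chosen for illustration; HashString repeats the loop from the answer above.

```csharp
using System;
using System.Text;

public static class PartitionKeys
{
    // Assumption: '^' never occurs inside a level name.
    private const string Separator = "^";

    // Same 64-bit FNV loop as in the answer (XOR first, then multiply).
    public static long HashString(string input)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(input.ToUpperInvariant());
        ulong hash = 14695981039346656037;
        unchecked
        {
            foreach (byte b in bytes)
            {
                hash ^= b;
                hash *= 1099511628211;
            }
            return (long)hash;
        }
    }

    // Hypothetical helper: joins the three level names into one string
    // and hashes the result into a 64-bit partition key.
    public static long ForHierarchy(string level1, string level2, string level3)
    {
        return HashString(string.Join(Separator, level1, level2, level3));
    }
}
```

Calling PartitionKeys.ForHierarchy("LEVEL 1", "LEVEL 2", "LEVEL 3") should return the same value on every run and on every machine, because the result depends only on the UTF-8 bytes of the joined string. That is the main reason to prefer an explicit hash like this over string.GetHashCode(), whose result .NET does not guarantee to be stable across processes or runtime versions.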
