I am working on an existing system that uses NCache. It is a distributed system with large caching requirements, so there is no question that caching is the correct answer, but...

For some reason, in the existing code, all cache keys are hashed before storing in the cache.

My argument is that we should NOT hash the key: the caching library may have some highly optimized way of storing its dictionary, and hashing everything ourselves may actually slow down lookups.

The guy who originally wrote the code has left, and the knowledge of why the keys are hashed has been lost.

Can anyone suggest whether hashing is the correct thing to do, or whether it should be removed?

Neil
  • I think you've seen this: http://www.alachisoft.com/resources/docs/ncache/ncache-programmers-guide.pdf – Mahdi Jan 10 '17 at 12:00
  • Thanks. There seems to be no mention of hashing keys in there. All the examples use a plain, readable key string. – Neil Jan 10 '17 at 13:00
  • Honestly, I am not sure how to interpret the following words: "Data is distributed/partitioned among all server nodes on the basis of the hash code of the cache key." – Mahdi Jan 10 '17 at 13:04
  • The documentation emphasizes this several times. – Mahdi Jan 10 '17 at 13:04
  • Isn't that an internal representation of things though? Also, my hash (MD5) may not be the same as the hash used internally, and even then, the library may hash my hash, which would end up as something entirely different. – Neil Jan 10 '17 at 13:46

2 Answers

Whether you should or shouldn't hash keys depends on your system requirements.

NCache identifies an object by its key, and considers objects with equal keys to be equal. Below is a definition of a hash function from Wikipedia:

A hash function is any function that can be used to map data of arbitrary size to data of fixed size.

If you stop hashing keys, the cache may behave differently. For example, two objects whose keys hash to the same value are currently treated by NCache as the same entry; with unhashed keys NCache would treat them as different, and instead of one cache entry you would get two.
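
To make that concrete, here is a minimal sketch using a plain Python dict as a stand-in for the cache (not the NCache API). The `weak_hash` helper, the `store` function and the `customer:N` keys are all hypothetical, and the hash is deliberately truncated to a single hex digit so that collisions are guaranteed; an accidental collision on a full MD5 digest is extremely unlikely.

```python
import hashlib


def weak_hash(key: str) -> str:
    # Deliberately truncated to one hex digit (16 possible values)
    # so that collisions are easy to demonstrate.
    return hashlib.md5(key.encode("utf-8")).hexdigest()[:1]


def store(cache: dict, key: str, value: str, hash_keys: bool) -> None:
    # The dict stands in for the cache; entries with equal keys overwrite each other.
    cache_key = weak_hash(key) if hash_keys else key
    cache[cache_key] = value


# 17 distinct readable keys: with only 16 possible hash values,
# at least two of them must collide once hashed.
keys = [f"customer:{i}" for i in range(17)]

plain = {}
hashed = {}
for k in keys:
    store(plain, k, "value for " + k, hash_keys=False)
    store(hashed, k, "value for " + k, hash_keys=True)

print(len(plain), len(hashed))
```

With readable keys every object keeps its own entry; with hashed keys some objects share one, so switching between the two schemes changes how many entries the cache holds.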

NCache doesn't require you to hash keys. An NCache key is just a string that is unique for each object. Relevant excerpt from the NCache 4.6 Programmer’s Guide:

NCache uses a “key” and “value” structure for objects. Every object must have a unique string key associated with it. Every key has an atomic occurrence in the cache whether it is local or clustered. Cached keys are case sensitive in nature, and if you try to add another key with same value, an OperationFailedException is thrown by the cache.

Leonid Vasilev
  • In my application, the keys are always unique, but after hashing, may not be. That is sort of the opposite of your answer. – Neil Jan 10 '17 at 13:47
  • That is just another example. My point is that if you change the keys, this may affect how your application behaves. – Leonid Vasilev Jan 10 '17 at 13:52
  • We clear all cached objects during redeployment, so this is not really a problem. – Neil Jan 10 '17 at 14:00
  • If you are concerned about performance, you need to collect and study performance metrics before and after change. You can use projects similar to [Metrics.NET](https://github.com/Recognos/Metrics.NET) to instrument your application code with metrics. – Leonid Vasilev Jan 10 '17 at 14:21
  • No, I'm not concerned about performance, I'm concerned about calling the library correctly. I believe hashing the key is incorrect, but I would like someone to confirm or deny this. – Neil Jan 10 '17 at 16:10
  • NCache does not require you to hash keys. An NCache key is just a string that is unique for each object, so you can use either the identity string or its hash. I probably wouldn't use a hash, because I don't see a reason to do so. – Leonid Vasilev Jan 10 '17 at 16:49

Okay, so your questions are:

  1. Should we hash the keys before storing?
  2. If you do the hashing yourself, will it slow anything down?

Well, the cache API works with strings as keys. In the background, NCache automatically generates hashes from these keys, which help it identify where the object should be stored. And by where, I mean on which node.
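
As a rough illustration of the idea (not NCache's actual algorithm, which isn't shown here), hash-based placement can be sketched like this. The node names, the `node_for_key` helper and the use of CRC32 as the hash are all assumptions made for the example.

```python
import zlib

# Hypothetical server nodes in the cluster.
NODES = ["node-a", "node-b", "node-c"]


def node_for_key(key: str, nodes=NODES) -> str:
    # CRC32 is only a stand-in for whatever hash the cache uses internally.
    hash_code = zlib.crc32(key.encode("utf-8"))
    # Map the integer hash code onto one of the nodes.
    return nodes[hash_code % len(nodes)]


# The caller passes a plain, readable key; the placement is derived from it.
print(node_for_key("customer:42"))
```

The point is that the caller only ever supplies the readable string; turning it into a hash code and a node is the cache's job.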

If your application hashes keys before handing them over to NCache, then that is simply an unnecessary step. The NCache API was meant to take this headache away from you.

BUT if those hashes were generated because of some internal logic within your application, then that's another case. Please check carefully.

Needless to say, doing the same kind of work twice on every call costs some performance. The hash strings that you provide will themselves be hashed again to generate another hash value (an int).
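
A small sketch of that double hashing, under the same caveat that the internal hash here is only a stand-in: `pre_hash` plays the role of the application-side MD5 step, `internal_hash` (CRC32, an assumption) plays the role of whatever the cache does with the string it receives, and `customer:42` is a hypothetical key.

```python
import hashlib
import zlib


def pre_hash(key: str) -> str:
    # Application-side step: MD5 hex digest of the readable key.
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def internal_hash(cache_key: str) -> int:
    # Stand-in for the cache's own hash; CRC32 is an assumption, not NCache's algorithm.
    return zlib.crc32(cache_key.encode("utf-8"))


readable_key = "customer:42"  # hypothetical key

# Passing the readable key: one hash computation per lookup.
direct = internal_hash(readable_key)

# Passing a pre-hashed key: two hash computations for the same lookup,
# and the integer the cache works with is still its own, not the MD5 value.
double = internal_hash(pre_hash(readable_key))

print(direct, double)
```

Either way the cache ends up with its own integer hash, so the MD5 step only adds work unless the application needs it for some other reason.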

Basit Anwer