
Which is the better choice in C#?

  1. I am using a dictionary in C# and I want it to use as little memory as possible. Which is better:

    a dictionary whose key type is UInt64, or one whose key type is string? In both cases the value is the same custom class.

    I have declared the dictionary as follows:

    private static readonly Dictionary<string, List<Node>> HashTable =
        new Dictionary<string, List<Node>>();
    

    The Node class is defined as follows:

    public class Node
    {
        public UInt64 CurrentIndex { get; set; }
        public string NextHashedString { get; set; }
        public int NextHashPos { get; set; }
    }
    

    The dictionary key is actually a hash value computed from a string as follows. The string may be 1 to 20 characters long.

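    // Polynomial hash of `read`; with lowTolerance only every second character is used.
    static UInt64 CalculateHash(string read, bool lowTolerance)
    {
        UInt64 hashedValue = 0;
        int i = 0;
        while (i < read.Length)
        {
            // read[i] is equivalent to read.ElementAt(i) but needs no System.Linq.
            // Note: Math.Pow(31, i) exceeds UInt64.MaxValue once i >= 13, so the
            // result of this cast is not reliable for longer inputs.
            hashedValue += read[i] * (UInt64)Math.Pow(31, i);
            if (lowTolerance) i += 2;
            else i++;
        }
        return hashedValue;
    }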
    

    Now I want to store this hash value as the dictionary key. Which is the better choice: keep it as a UInt64, or convert it to a string and use the string as the key? My primary goals are that the dictionary uses as little space as possible and that key lookups are fast.

  2. I have a file with 3571079 characters. Can I read the whole file into a string, or do I need a more advanced data structure?

P basak
  • You haven't provided nearly enough data about the first situation. It's unusual to be able to choose between a UInt64 key or a string key... – Jon Skeet Mar 03 '12 at 10:40
  • This question is unclear to me... can you provide some examples of what you're trying to achieve ? – digEmAll Mar 03 '12 at 10:40
  • @JonSkeet I modified the question with snippets. – P basak Mar 03 '12 at 10:48
  • @digEmAll Please see the modified question. – P basak Mar 03 '12 at 10:48
  • Well, you could update the question title as well; as it stands it is not very helpful about the situation. – Can Poyrazoğlu Mar 03 '12 at 10:54
  • Hi, the length is 20, not 3571079; I mentioned that just before the code snippet. The size of 3571079 belongs to my second question. – P basak Mar 03 '12 at 10:55
  • @canpoyrazoğlu Sorry, I saved the question before I had finished writing the title. – P basak Mar 03 '12 at 10:57
  • How large (how many entries) is the dictionary going to be? – Andre Loker Mar 03 '12 at 11:01
  • Hmm, it depends. For example, take a long string of 5000000 (five million) characters. I take substrings of up to length 20 starting at each position from 1 to 5000000, which means 5000000*20 substrings. I hash each substring and place it in the dictionary. If no substring is repeated, in the worst case there will be 5000000*20 entries. – P basak Mar 03 '12 at 11:04

1 Answer


Using a UInt64 instead of a string (or any other reference type) as a dictionary key will consume less memory in practice. With a reference type such as string, the dictionary stores a reference to the key in its internal data structure, which keeps the referenced object (the string) alive in memory as well, including its per-object overhead. When the key is a UInt64, the (current implementation of the) dictionary stores the key's value directly instead of a reference (this is simply how generics work with value types), so there are no separate key objects.
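
A rough way to check this on your own machine is sketched below. It is only an approximation: `GC.GetTotalMemory` reports managed heap usage, the numbers depend on the runtime and on whether the process is 32-bit or 64-bit, and the names `KeyMemoryDemo`, `Measure`, `BuildUInt64Keys` and `BuildStringKeys` are made up for this example. The string keys are produced with `ToString` on the numeric value, which is an assumption about how you would derive them.

```csharp
using System;
using System.Collections.Generic;

public static class KeyMemoryDemo
{
    // One shared value object so that only the keys differ between the two runs.
    private static readonly object Dummy = new object();

    // Approximate number of managed bytes retained by whatever `build` returns.
    public static long Measure(Func<int, object> build, int count)
    {
        long before = GC.GetTotalMemory(true);   // force a collection first
        object dictionary = build(count);
        long after = GC.GetTotalMemory(true);
        GC.KeepAlive(dictionary);                // keep it alive until measured
        return after - before;
    }

    public static object BuildUInt64Keys(int count)
    {
        var d = new Dictionary<ulong, object>(count);
        for (int i = 0; i < count; i++) d[(ulong)i] = Dummy;
        return d;
    }

    public static object BuildStringKeys(int count)
    {
        var d = new Dictionary<string, object>(count);
        // Every key is a separate string object on the heap.
        for (int i = 0; i < count; i++) d[((ulong)i).ToString()] = Dummy;
        return d;
    }

    public static void Main()
    {
        const int count = 200000;
        Console.WriteLine("UInt64 keys: ~{0} bytes", Measure(BuildUInt64Keys, count));
        Console.WriteLine("string keys: ~{0} bytes", Measure(BuildStringKeys, count));
    }
}
```

The exact figures will vary, but the string version should come out noticeably larger because of the extra key objects.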

There's only one situation I can think of where a UInt64 key could cause higher memory usage than a string key: if the process is 32-bit (x86), references are 32 bits wide. If the dictionary is large, but almost empty, there will be many empty Dictionary<K,V>.Entry instances. For UInt64 keys the key part of each of those instances will be 64 bits (even if no explicit value is assigned), while for string keys it is only 32 bits. So the total amount of allocated memory will be larger for the dictionary with UInt64 keys. But this is a very theoretical situation.
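
To get a feeling for the per-entry size, here is a small sketch. `EntryModel` is a hypothetical stand-in that merely mirrors the kind of fields the dictionary keeps per entry (hash code, next index, key, value); it is not the real private type, and the actual layout is an implementation detail that may change. It assumes a runtime where `Unsafe.SizeOf<T>()` from `System.Runtime.CompilerServices` is available (for example .NET 5 or later).

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical model of a dictionary entry; field names are illustrative only.
public struct EntryModel<TKey, TValue>
{
    public int HashCode;
    public int Next;
    public TKey Key;
    public TValue Value;
}

public static class EntrySizeDemo
{
    // Size in bytes of one modelled entry for the given key and value types.
    public static int SlotSize<TKey, TValue>()
    {
        return Unsafe.SizeOf<EntryModel<TKey, TValue>>();
    }

    public static void Main()
    {
        Console.WriteLine("{0}-bit process", IntPtr.Size * 8);
        Console.WriteLine("UInt64 key entry: {0} bytes", SlotSize<ulong, object>());
        Console.WriteLine("string key entry: {0} bytes", SlotSize<string, object>());
    }
}
```

In a 64-bit process a reference is as wide as a UInt64, so the two entry sizes should not differ there; only in a 32-bit process can the UInt64 entry be the larger one, and even then the string version still pays for the string objects themselves.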

So, if you can use UInt64 keys instead of strings without sacrificing other qualities of your software design, there's nothing wrong with using them. But don't start optimizing before it's really necessary. In the words of Donald Knuth: "premature optimization is the root of all evil".

Update: as you've updated your post to show how your UInt64 values are calculated:

  1. If you would simply derive the string key by calling ToString on the UInt64 value, you should go for the UInt64 version in the first place. It will be more efficient in every respect.

  2. Using a hash as a key can be somewhat tricky. You need to make sure that the hashes don't collide. Your hash function doesn't look particularly good at first sight, but this of course depends on your use case. That is outside the scope of this question, I suppose.
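
To illustrate the collision concern, the sketch below copies the `CalculateHash` algorithm from the question (indexing with `read[i]` instead of `ElementAt`) and feeds it two pairs of short inputs that I picked by hand. The class name `HashCollisionDemo` is made up for the example.

```csharp
using System;

public static class HashCollisionDemo
{
    // Same algorithm as CalculateHash in the question.
    public static ulong CalculateHash(string read, bool lowTolerance)
    {
        ulong hashedValue = 0;
        int i = 0;
        while (i < read.Length)
        {
            hashedValue += read[i] * (ulong)Math.Pow(31, i);
            i += lowTolerance ? 2 : 1;
        }
        return hashedValue;
    }

    public static void Main()
    {
        // 'a' + 31 * 'A' = 97 + 2015 and 'B' + 31 * 'B' = 66 + 2046 are both 2112.
        Console.WriteLine(CalculateHash("aA", false) == CalculateHash("BB", false)); // True

        // With lowTolerance every second character is skipped, so only 'a' counts.
        Console.WriteLine(CalculateHash("ab", true) == CalculateHash("ac", true));   // True
    }
}
```

Both comparisons print True, so different strings can end up under the same key. If you key the dictionary by the hash alone, you need some way to tell such entries apart, for example by comparing against the original text when you look a value up.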

Andre Loker
  • Hi, what do you mean by "If the dictionary is large, but almost empty, there will be many empty Dictionary.Entry instances"? The dictionary will not be empty, because each time I generate a valid key-value pair I add it to the dictionary. – P basak Mar 03 '12 at 11:00
  • @Pbasak It was more of a theoretical consideration which won't affect you in practice. For all practical applications, UInt64 keys will consume less memory than string keys. – Andre Loker Mar 03 '12 at 11:02
  • Actually, the value is a list of nodes, as you can see from the code. If the same hash comes up again, I add those nodes to the list. I know my hash function is not that good; I asked a separate question about it. – P basak Mar 03 '12 at 11:16