One very popular strategy for resolving hash collisions in a hash table is separate chaining.

I'm aware that in separate chaining, keys that collide at the same index of the backing array (because they hash to the same value) are stored in a linked list.

I wonder whether the backing array is of type LinkedList<E>[] from the moment the hash table is created (in a separate chaining implementation), or whether it is int[] and gets converted to LinkedList<E>[] after the first collision?

I ask because having a linked list as each element of the backing array does not seem like the most efficient solution. It means that each of those linked lists holds elements that are, in turn, entries (key-value pairs), and I reckon this all consumes a lot of memory and other resources.

I have done quite a bit of research in different books and academic articles, yet I still can't find a clear answer on this.

Giorgi Tsiklauri

1 Answer

Yes, separate chaining will cost more memory than probing or re-hashing. But the benefit is that you get more items in the hash table before performance begins to suffer. At some point you still have to re-index: typically when you realize that some bucket is over-represented or when the total number of occupied buckets exceeds some threshold.
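As a rough illustration of the "exceeds some threshold" check, here is a minimal sketch. The `LoadFactorPolicy` class and `needsResize` method are hypothetical names, and the sketch counts stored entries rather than occupied buckets, which is an assumption on my part (it is the more common way to define a load factor).

```java
// Hypothetical helper: decides when a chained table should grow and re-index.
final class LoadFactorPolicy {
    // Assumed threshold. 0.75 happens to be java.util.HashMap's default load
    // factor, but any value could be chosen here.
    static final double MAX_LOAD_FACTOR = 0.75;

    private LoadFactorPolicy() {
    }

    // True when the average number of entries per bucket exceeds the threshold.
    static boolean needsResize(int entryCount, int bucketCount) {
        return entryCount > bucketCount * MAX_LOAD_FACTOR;
    }
}
```

A table would typically call a check like this after each insertion and, when it returns true, allocate a larger array and re-insert every entry.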

Note that the backing array itself isn't a linked list. The backing array for a hash table that uses probing or re-hashing will probably be a dynamically-sized array of entries. Your entry would be something like:

class Entry {
    String key;        // the key that was hashed
    SomeObject value;  // the value stored under that key (SomeObject is a placeholder type)
}

If you're using separate chaining, the Entry object gets an additional field: a reference to the next item that hashed to the same bucket:

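class Entry {
    String key;        // the key that was hashed
    SomeObject value;  // the value stored under that key (SomeObject is a placeholder type)
    Entry next;        // next entry in the same bucket, or null at the end of the chain
}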

The memory difference for the first item really isn't enough to worry about.
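To make the layout concrete, here is a minimal sketch of a chained table whose backing array is an array of entry references from the start, so an empty bucket costs only a null. `ChainedMap` and its methods are hypothetical names chosen for illustration, the capacity is fixed, and resizing is left out. This is one possible implementation, not the only way to do it.

```java
import java.util.Objects;

// Minimal separate-chaining map sketch (hypothetical class, fixed capacity, no resizing).
class ChainedMap<K, V> {
    // One node per key-value pair; `next` links entries that landed in the same bucket.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;

        Entry(K key, V value, Entry<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    // The backing array holds entry references directly; an empty bucket is null.
    private final Entry<K, V>[] buckets;
    private int size;

    @SuppressWarnings("unchecked")
    ChainedMap(int capacity) {
        buckets = (Entry<K, V>[]) new Entry[capacity];
    }

    // Maps a key to a bucket index; floorMod keeps the index non-negative.
    private int indexFor(K key) {
        return Math.floorMod(Objects.hashCode(key), buckets.length);
    }

    // Inserts or updates a mapping and returns the previous value, or null if there was none.
    V put(K key, V value) {
        int i = indexFor(key);
        for (Entry<K, V> e = buckets[i]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        // New key: prepend to the chain, so a collision just links to the old head.
        buckets[i] = new Entry<>(key, value, buckets[i]);
        size++;
        return null;
    }

    // Walks the chain for the key's bucket; returns null when the key is absent.
    V get(K key) {
        for (Entry<K, V> e = buckets[indexFor(key)]; e != null; e = e.next) {
            if (Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    int size() {
        return size;
    }
}
```

Constructing the map with a capacity of 1 forces every key into the same bucket, which is an easy way to see the chain in action: all lookups still succeed, they just walk a longer list.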

It's possible to write the code so that if a bucket has but a single item, it will contain just the key and value, and the bucket is converted to a linked list only on first collision. There is perhaps a small memory win there, and an even smaller performance gain. But the code is more complex and the gains aren't huge unless you know that the majority of your buckets won't have any collisions. Not worth the trouble of implementing, testing, and maintaining two different code paths.

Jim Mischel
  • Well, that's not entirely true when it comes to the `Entry` type. According to what I've researched (across many resources, books, and video tutorials), the type of the backing array really depends on the collision resolution strategy, and hence on how the hash table is implemented. Some implementations have an array of buckets/lists, some have an array of special entries, and so on. However, the concept and logic are explained well, so I'm accepting this answer. – Giorgi Tsiklauri Jun 10 '20 at 13:09