How do I implement an erase function for a hash table?

Question

I have a hash table using linear probing. I've been given the task to write an erase(int key) function with the following guidelines.

 void erase(int key);

 Preconditions:  key >= 0
 Postconditions: If a record with the specified key exists in the table, then
 that record has been removed; otherwise the table is unchanged.

I was also given some hints to accomplish the task:

It is important to realize that the insert function will allow you to add a new entry to the table, or to update an existing entry in the table.
For the linear probing version, notice that the code to insert an item has two searches. The insert() function calls function findIndex() to search the table to see if the item is already in the table. If the item is not in the table, a second search is done to find the position in the table to insert the item. Adding the ability to delete an entry will require that the insertion process be modified. When searching for an existing item, be sure that the search does not stop when it comes to a location that was occupied but is now empty because the item was deleted. When searching for a position to insert a new item, use the first empty position - it does not matter if the position has ever been occupied or not.
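To see why the hint matters, here is a minimal sketch (not the assignment's code) of what goes wrong if erase simply resets a slot to -1. It assumes CAPACITY is 31, hash(key) is key % CAPACITY, and -1 means "never used"; `probeFind` and `collidingKeys` are hypothetical helpers. Keys 1 and 32 both hash to slot 1, so 32 ends up in slot 2. Once slot 1 is reset to -1, a lookup for 32 stops at slot 1 and reports it missing, even though it is still stored in slot 2.

```cpp
#include <array>
#include <cassert>

// Assumptions for illustration: CAPACITY = 31, hash(key) = key % CAPACITY,
// and -1 marks a slot that has never been used.
const int CAPACITY = 31;
const int EMPTY = -1;

// Hypothetical linear-probing lookup that stops at the first -1 slot,
// like the question's findIndex. Returns the slot index, or -1 if absent.
int probeFind(const std::array<int, CAPACITY>& keys, int key)
{
    int i = key % CAPACITY;
    for (int count = 0; count < CAPACITY && keys[i] != EMPTY; ++count)
    {
        if (keys[i] == key)
            return i;
        i = (i + 1) % CAPACITY;
    }
    return -1;
}

// Hypothetical collision scenario: 1 sits in its home slot 1, and 32
// (which also hashes to slot 1) was pushed to slot 2 by linear probing.
std::array<int, CAPACITY> collidingKeys()
{
    std::array<int, CAPACITY> keys;
    keys.fill(EMPTY);
    keys[1] = 1;
    keys[2] = 32;
    return keys;
}
```

Resetting `keys[1]` to `EMPTY` and calling `probeFind(keys, 32)` again returns -1, which is exactly the early stop the hint warns about.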

So I've started writing erase(key), and I seem to have run into the problem the hints are referring to, but I'm not sure what they mean. I'll provide the code below. To test my code, I set up the hash table so that it has a collision, then erase that key and rehash the table, but the remaining key doesn't go into the correct location.

For instance, I add a few elements into my hash table:

The hash table is:
Index  Key    Data
    0   31     3100
    1    1     100
    2    2     200
    3   -1
    4   -1
    5   -1
    6   -1
    7   -1
    8   -1
    9   -1
   10   -1
   11   -1
   12   -1
   13   -1
   14   -1
   15   -1
   16   -1
   17   -1
   18   -1
   19   -1
   20   -1
   21   -1
   22   -1
   23   -1
   24   -1
   25   -1
   26   -1
   27   -1
   28   -1
   29   -1
   30   -1

So all of my values are empty except the first 3 indices. Obviously key 31 should be going into index 1. But since key 1 is already there, it collides and settles for index 0. I then erase key 1 and rehash the table but key 31 stays at index 0.

Here are the functions that may be worth looking at:

void Table::insert( const RecordType& entry )
{
   bool alreadyThere;
   int index;

   assert( entry.key >= 0 );

   findIndex( entry.key, alreadyThere, index );
   if( alreadyThere )
      table[index] = entry;   
   else
   {
      assert( size( ) < CAPACITY );
      index = hash( entry.key );
      while ( table[index].key != -1 )
         index = ( index + 1 ) % CAPACITY;
      table[index] = entry;
      used++;
   }
}

Since insert uses findIndex, I'll include that as well:

void Table::findIndex( int key, bool& found, int& i ) const
{
   int count = 0;

   assert( key >=0 );

   i = hash( key );
   while ( count < CAPACITY && table[i].key != -1 && table[i].key != key )
   {
      count++;
      i = (i + 1) % CAPACITY;
   }   
   found = table[i].key == key;
}

And here is my current start on erase:

void Table::erase(int key) 
{
    assert(key >= 0);

    bool found, rehashFound;
    int index, rehashIndex;

    //check if key is in table
    findIndex(key, found, index);

    //if key is found, remove it
    if(found)
    {
        //remove key at position
        table[index].key = -1;
        table[index].data = NULL;
        cout << "Found key and removed it" << endl;
        //reduce the number of used keys
        used--;
        //rehash the table

        for(int i = 0; i < CAPACITY; i++)
        {
            if(table[i].key != -1)
            {
                cout << "Rehashing key : " << table[i].key << endl;
                findIndex(table[i].key, rehashFound, rehashIndex);
                cout << "Rehashed to index : " << rehashIndex << endl;
                table[rehashIndex].key = table[i].key;
                table[rehashIndex].data = table[i].data;
            }
        }
    }
}

Can someone explain what I need to do to make it rehash properly? I understand the concept of a hash table, but I seem to be doing something wrong here.
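In the rehash loop above, findIndex is called for keys that are still in the table. When the probe reaches the key, findIndex returns the slot the record already occupies, so the record is copied onto itself. When the probe stops early at the freshly emptied slot, the record is copied there but its old slot is never cleared, which leaves a duplicate. Rehashing only works if the records are taken out first. Below is a minimal sketch of that idea (copy out the survivors, clear every slot, re-insert), assuming CAPACITY is 31, hash(key) is key % CAPACITY, int data, and -1 for an empty slot. `RecordType`, `Table`, `rawInsert` and `eraseByRebuild` here are hypothetical stand-ins, not the assignment's classes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: CAPACITY = 31, hash(key) = key % CAPACITY,
// key == -1 marks an empty slot, data is a plain int.
const int CAPACITY = 31;

struct RecordType
{
    int key;
    int data;
    RecordType(int k = -1, int d = 0) : key(k), data(d) {}
};

struct Table
{
    RecordType table[CAPACITY];
    int used = 0;

    int hash(int key) const { return key % CAPACITY; }

    // Linear-probing insert into the first empty slot
    // (no duplicate check and no full-table check, to keep the sketch short).
    void rawInsert(const RecordType& entry)
    {
        int i = hash(entry.key);
        while (table[i].key != -1)
            i = (i + 1) % CAPACITY;
        table[i] = entry;
        ++used;
    }

    // Erase by rebuilding: copy out every other record, clear all slots,
    // then re-insert them so each lands where a fresh probe would put it.
    void eraseByRebuild(int key)
    {
        std::vector<RecordType> keep;
        for (int i = 0; i < CAPACITY; ++i)
            if (table[i].key != -1 && table[i].key != key)
                keep.push_back(table[i]);
        if (static_cast<int>(keep.size()) == used)
            return; // key was not present: table unchanged
        for (int i = 0; i < CAPACITY; ++i)
            table[i] = RecordType();
        used = 0;
        for (const RecordType& r : keep)
            rawInsert(r);
    }
};
```

This is correct but costs O(CAPACITY) on every erase, which is why the answers below suggest cheaper alternatives.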

EDIT

As per user2357112's suggestion:

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if(found) 
    {
        table[index].key = -2;
        table[index].data = NULL;
        used--;

    }

}


//modify insert(const RecordType & entry)

while(table[index].key != -1 || table[index].key != -2)


//modify findIndex

while(count < CAPACITY && table[i].key != -1
      && table[i].key != -2 && table[i].key != key)
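Since no key can equal both -1 and -2, `key != -1 || key != -2` is true for every key, so the modified insert loop would never stop probing. By De Morgan's law, "not (never used or deleted)" is `key != -1 && key != -2`. A tiny sketch with hypothetical helper names makes the difference concrete. Separately, note that the modified findIndex condition above also stops at -2, whereas the accepted answer says lookups should keep probing past deleted markers.

```cpp
#include <cassert>

// Slot markers assumed from the edit: -1 = never used, -2 = deleted.

// The edited insert condition: true for EVERY key, because a value cannot
// equal both -1 and -2, so at least one of the != tests always holds.
bool slotBusyWithOr(int key) { return key != -1 || key != -2; }

// Intended condition: keep probing only while the slot is neither
// never-used nor deleted, i.e. !(key == -1 || key == -2).
bool slotBusyWithAnd(int key) { return key != -1 && key != -2; }
```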

"Obviously key 31 should be going into index 1" - looks to me like index 0 is the proper spot for it. I think your insertion logic has a bug. — user2357112, Dec 14 '13 at 00:30
@user2357112 I was under the impression that I'd hash using 31 % 30 = 1. Is this incorrect? — StartingGroovy, Dec 14 '13 at 00:33
`index = hash( entry.key );` - is `hash` going to handle modding by the capacity? If not, this could do weird things. — user2357112, Dec 14 '13 at 00:36
Ah, I feel dumb haha. I wasn't paying attention to the start index on the print out. Now it seems that I have a logic error in my erase function. *Edit* Yes hash uses capacity. — StartingGroovy, Dec 14 '13 at 00:37
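As the comments work out, the printout shows indices 0 through 30, so CAPACITY appears to be 31 rather than 30. Assuming hash(key) is key % CAPACITY (the asker confirms hash uses the capacity), key 31 hashes straight to index 0, so the table above contains no collision at all. A quick check with a hypothetical `hashIndex`:

```cpp
#include <cassert>

// Assumption: CAPACITY is 31 (the dump shows indices 0..30) and hash()
// reduces keys modulo CAPACITY.
const int CAPACITY = 31;

// Hypothetical stand-in for the question's hash().
int hashIndex(int key)
{
    return key % CAPACITY;
}
```

So key 31 sits in its home bucket; reproducing a real collision with key 1 needs a key such as 32, which also hashes to 1.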

user2357112 · Accepted Answer · 2013-12-14T01:05:14.610


When deleting an item from the table, don't move anything around. Just stick a "deleted" marker there. On an insert, treat deletion markers as empty and available for new items. On a lookup, treat them as occupied and keep probing if you hit one. When resizing the table, ignore the markers.

Note that this can cause problems if the table is never resized: after a while, your table will have no entries marked as never used, and lookup performance will go to hell. Since the hints mention keeping track of whether an empty position was ever used and treating once-used cells differently from never-used ones, I believe this is the intended solution. Presumably, resizing the table will be a later assignment.
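A minimal sketch of this deleted-marker approach, assuming int keys >= 0, int data, hash(key) = key % CAPACITY with CAPACITY = 31, and the markers agreed on in the comments (-1 never used, -2 deleted). `TombstoneTable` and its members are hypothetical, not the assignment's Table class; its find, insert and erase play the roles of findIndex, insert and erase.

```cpp
#include <cassert>

// Assumptions: int keys >= 0, int data, hash(key) = key % CAPACITY,
// -1 = never used, -2 = deleted (as agreed in the comments).
const int CAPACITY = 31;
const int NEVER_USED = -1;
const int DELETED = -2;

struct Record
{
    int key;
    int data;
};

class TombstoneTable
{
public:
    TombstoneTable()
    {
        for (int i = 0; i < CAPACITY; ++i)
            slots[i] = Record{NEVER_USED, 0};
    }

    // Lookup: treat DELETED as occupied and keep probing; stop only at NEVER_USED.
    int find(int key) const
    {
        int i = key % CAPACITY;
        for (int count = 0; count < CAPACITY && slots[i].key != NEVER_USED; ++count)
        {
            if (slots[i].key == key)
                return i;
            i = (i + 1) % CAPACITY;
        }
        return -1;
    }

    // Insert or update: new keys go into the first NEVER_USED or DELETED slot.
    void insert(int key, int data)
    {
        assert(key >= 0);
        int i = find(key);
        if (i >= 0)
        {
            slots[i].data = data;
            return;
        }
        assert(used < CAPACITY);
        i = key % CAPACITY;
        while (slots[i].key != NEVER_USED && slots[i].key != DELETED)
            i = (i + 1) % CAPACITY;
        slots[i] = Record{key, data};
        ++used;
    }

    // Erase: leave a DELETED marker in place; nothing is moved or rehashed.
    void erase(int key)
    {
        assert(key >= 0);
        int i = find(key);
        if (i >= 0)
        {
            slots[i].key = DELETED;
            slots[i].data = 0;
            --used;
        }
    }

    int keyAt(int i) const { return slots[i].key; }
    int size() const { return used; }

private:
    Record slots[CAPACITY];
    int used = 0;
};
```

For example, after inserting 1 and 32 (both hash to slot 1) and erasing 1, `find(32)` still probes past the marker to slot 2, and a later insert of 63 (also hashing to 1) reuses the marked slot.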

edited Dec 14 '13 at 01:05

answered Dec 14 '13 at 00:25

user2357112


Would deleted be the same as the -1 that is already there? This is what I'm currently doing, along with changing the data to null. If this isn't the case, wouldn't that cause issues on the rehash? – StartingGroovy Dec 14 '13 at 00:43
@StartingGroovy: No, deletion markers have to be distinguishable from regular empty cells. Don't rehash on a delete. – user2357112 Dec 14 '13 at 00:48
Anyone want to explain the downvote? This *is* a real technique used in actual hash tables. For example, Python uses this, although its probing algorithm is much more sophisticated. – user2357112 Dec 14 '13 at 00:50
So, let the key be -2 and set the data to null. Then when using insert treat -2 as -1. However when calling findIndex it should treat -2 as taken? (also I didn't downvote) – StartingGroovy Dec 14 '13 at 00:54
Pretty much. Just make sure you don't get -1 or -2 as an actual key. – user2357112 Dec 14 '13 at 00:56
@StartingGroovy: Almost. You have the wrong boolean operator in the modification to `insert`. – user2357112 Dec 14 '13 at 01:08
Really, it should be an AND? – StartingGroovy Dec 14 '13 at 01:15
@StartingGroovy: Yes, it should. Don't think of it as "not this thing or that thing". Think of it as "not this thing and also not that thing". – user2357112 Dec 14 '13 at 01:24

Tony Delroy · Answer 2 · 2013-12-14T03:02:04.430


It's not necessary to rehash the entire table every time a delete is done. If you want to minimise degradation in performance, you can compact the table instead. Consider each element after the deleted one (wrapping from the end of the table to the front) up to the next -1. If an element hashes to a bucket at or before the deleted element's position, it can be moved into the vacated slot, which puts it at, or at least closer to, its hash bucket; then repeat the process using the slot that element just vacated.

Doing this kind of compaction will remove the biggest flaw in your current code, which is that after a little use every bucket will be marked as either in use or having been used, and performance for e.g. find of a non-existent value will degrade to O(CAPACITY).

Off the top of my head with no compiler/testing...

int Table::next(int index) const
{
    return (index + 1) % CAPACITY;
}

// Probe steps from bucket `from` forward to bucket `to`; 0 when they are equal.
int Table::distance(int from, int to) const
{
    return from <= to ? to - from : to + CAPACITY - from;
}

void Table::erase(int key)
{
    assert(key >= 0);
    bool found;
    int index;

    findIndex(key, found, index);

    if (found)
    {
        // If data owns heap memory, release it here (e.g. delete table[index].data;),
        // before compaction overwrites this slot with another record.

        // compaction: `index` is the current hole
        int limit = CAPACITY - 1;
        for (int compact_from = next(index);
             limit-- && table[compact_from].key >= 0;
             compact_from = next(compact_from))
        {
            int ideal = hash(table[compact_from].key);
            // move the record back if the hole lies on its probe path
            if (distance(ideal, index) <
                distance(ideal, compact_from))
            {
                table[index] = table[compact_from];
                index = compact_from;
            }
        }

        // deletion: the final hole becomes a never-used slot
        table[index].key = -1;
        --used;
    }
}
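For experimenting with the compaction idea outside the assignment's Table class, here is a self-contained sketch of the same technique (often called backward-shift deletion), assuming int keys only, hash(key) = key % CAPACITY with CAPACITY = 31, and -1 for empty; `CompactingTable` is a hypothetical name. Note that `distance` returns 0 when from and to are equal: a record whose home bucket is the hole itself must be moved back, otherwise a lookup for it would start at the new -1 and stop immediately.

```cpp
#include <cassert>

// Assumptions: int keys >= 0, hash(key) = key % CAPACITY, -1 = empty.
// No deleted markers are needed with this approach.
const int CAPACITY = 31;
const int EMPTY = -1;

struct CompactingTable
{
    int keys[CAPACITY];
    int used = 0;

    CompactingTable()
    {
        for (int i = 0; i < CAPACITY; ++i)
            keys[i] = EMPTY;
    }

    static int next(int i) { return (i + 1) % CAPACITY; }

    // Probe steps from bucket `from` forward to bucket `to`; 0 when equal.
    static int distance(int from, int to)
    {
        return from <= to ? to - from : to + CAPACITY - from;
    }

    int find(int key) const
    {
        int i = key % CAPACITY;
        for (int count = 0; count < CAPACITY && keys[i] != EMPTY; ++count, i = next(i))
            if (keys[i] == key)
                return i;
        return -1;
    }

    void insert(int key) // assumes key absent and table not full
    {
        int i = key % CAPACITY;
        while (keys[i] != EMPTY)
            i = next(i);
        keys[i] = key;
        ++used;
    }

    void erase(int key)
    {
        int hole = find(key);
        if (hole < 0)
            return;
        // Walk the rest of the cluster; pull back any key whose probe path
        // passes through the hole, then continue with its old slot as the hole.
        int j = next(hole);
        for (int step = 1; step < CAPACITY && keys[j] != EMPTY; ++step, j = next(j))
        {
            int ideal = keys[j] % CAPACITY;
            if (distance(ideal, hole) < distance(ideal, j))
            {
                keys[hole] = keys[j];
                hole = j;
            }
        }
        keys[hole] = EMPTY;
        --used;
    }
};
```

With keys 1, 32 and 2 inserted (slots 1, 2 and 3), erasing 1 moves 32 back to its home slot 1 and then 2 back to its home slot 2, leaving slot 3 empty.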

edited Dec 14 '13 at 03:02

answered Dec 14 '13 at 00:40

Tony Delroy


Okay, so I don't need to rehash the entire table after I delete an element. I only need to rehash the elements from deleted+1 through next-1. I.e., elements 1-4 have values and I delete element 2; elements 3 and 4 should be rehashed? – StartingGroovy Dec 14 '13 at 00:59
Yes, you should rehash 3 to see if moving it to 2 would make it closer to its direct-hash-to-bucket, if so then you'd check whether moving 4 to 3 helps too, otherwise you'd check moving 4 to 2. Finish at 5/-1. – Tony Delroy Dec 14 '13 at 01:02
"after a little use every bucket will be marked as either in use or having been used" - true, but the better solution is to resize the table when load factor hits a threshold. – user2357112 Dec 14 '13 at 01:05
@user2357112 no actually... what happens if in an application a hash table's typical usage is to initially populate N entries, then as each further entry is inserted another is deleted? The load factor doesn't change, there's no resizing, and performance degrades. – Tony Delroy Dec 14 '13 at 01:11
@user2357112: that wouldn't indicate need for a "resize" but could be a usable trigger for rehashing/compaction, but why put it off (and take large unpredictable performance hits) and have to grope blindly for compaction opportunities when you can do targeted compaction as you go? Further - and particularly with a weak hash and linear probing - you could easily have bad performance in part of the table while the overall load's under the threshold. – Tony Delroy Dec 14 '13 at 01:20
@TonyD: Sure, but that's only an argument against linear probing. You could easily have horrible performance compacting on every delete with a contiguous range of occupied cells. – user2357112 Dec 14 '13 at 01:22
"Large unpredictable performance hits" aren't a problem. The resize cost is amortized constant, and the lookup performance is no worse than with constant compaction. – user2357112 Dec 14 '13 at 01:26
@user2357112: it wasn't only an argument against linear probing - I just said "particularly with". Compacting per delete might not always be fast in absolute terms, but each time you do it makes future operations faster - again, better not to put it off. Re unpredictable performance hits not being a problem - who's to say that? Depends on the application. It's a very common criticism of garbage collection. As I've said, with delayed compaction lookup *is* worse when not found, which means inserts of new elements are worse too. – Tony Delroy Dec 14 '13 at 01:28
@TonyD I was attempting to write an erase function based on your suggestion, but I'm having trouble with it. Since I'm on a time crunch, I'll probably go with user2357112's suggestion, but I'm curious as to how your erase function would look. If you're up to writing it, I would appreciate it for learning purposes. If not, no big deal; I will try and come back to it after I finish the task. – StartingGroovy Dec 14 '13 at 01:35
@TonyD Thank you for the great example. I tried to do `delete table[index].data` prior to simply setting it to `NULL` but it was giving me a syntax error. Also, I upvoted a few of your answers elsewhere :) – StartingGroovy Dec 14 '13 at 02:56
@StartingGroovy: you're welcome - it was fun to write. Maybe a typo with your delete before? - it should work. Thanks for the upvotes! :-) – Tony Delroy Dec 14 '13 at 03:03
Any chance you could invite me to a chat? – StartingGroovy Dec 14 '13 at 03:17
@StartingGroovy - took me a while to work out how to do it - catch you in http://chat.stackoverflow.com/rooms/43053/temporary ...? – Tony Delroy Dec 14 '13 at 03:31
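One way to combine the two positions in this thread is to track deleted markers and rebuild the slot array when they pile up: grow when live entries pass half the capacity (the load-factor resize user2357112 describes), and rebuild at the same size when live plus deleted slots pass three quarters (the insert/delete churn case Tony Delroy raises). The thresholds, the class name `RebuildingTable`, and the int-key-only layout are illustrative assumptions, not taken from either answer.

```cpp
#include <cassert>
#include <vector>

// Assumptions: int keys >= 0, key % capacity hashing,
// -1 = never used, -2 = deleted; thresholds are illustrative.
const int NEVER_USED = -1;
const int DELETED = -2;

class RebuildingTable
{
public:
    explicit RebuildingTable(int capacity) : keys(capacity, NEVER_USED) {}

    // Lookup probes past DELETED markers and stops at NEVER_USED.
    int find(int key) const
    {
        int cap = capacity();
        int i = key % cap;
        for (int n = 0; n < cap && keys[i] != NEVER_USED; ++n, i = (i + 1) % cap)
            if (keys[i] == key)
                return i;
        return -1;
    }

    void insert(int key)
    {
        if (find(key) >= 0)
            return;
        int cap = capacity();
        if (used + 1 > cap / 2)
            rebuild(cap * 2);                  // too many live keys: grow
        else if (used + deleted + 1 > cap * 3 / 4)
            rebuild(cap);                      // too many markers: same size, drop them
        cap = capacity();
        int i = key % cap;
        while (keys[i] != NEVER_USED && keys[i] != DELETED)
            i = (i + 1) % cap;
        if (keys[i] == DELETED)
            --deleted;                         // reusing a marked slot
        keys[i] = key;
        ++used;
    }

    void erase(int key)
    {
        int i = find(key);
        if (i >= 0)
        {
            keys[i] = DELETED;
            --used;
            ++deleted;
        }
    }

    int capacity() const { return static_cast<int>(keys.size()); }
    int deletedCount() const { return deleted; }
    int size() const { return used; }

private:
    // Re-insert every live key into a fresh array; all markers are discarded.
    void rebuild(int newCapacity)
    {
        std::vector<int> live;
        for (int k : keys)
            if (k >= 0)
                live.push_back(k);
        keys.assign(newCapacity, NEVER_USED);
        deleted = 0;
        for (int k : live)
        {
            int i = k % newCapacity;
            while (keys[i] != NEVER_USED)
                i = (i + 1) % newCapacity;
            keys[i] = k;
        }
    }

    std::vector<int> keys;
    int used = 0;
    int deleted = 0;
};
```

Under steady churn (erase one key, insert another) the capacity stays fixed while the marker count stays bounded, because same-size rebuilds clear the markers.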
