I've been writing a C implementation of a dynamically sized hash table from scratch. I made a crucial error: my hashing function is based on the capacity of the hash table, and since the capacity changes over time, this doesn't work. What are the recommendations for developing a hashing function for a dynamically allocated hash table?

Additionally, I am using quadratic probing, so my resizing is based on that. For example, if my hash table capacity was 8 and a new key originally hashed to index 0, the new indexes calculated would be 1 (0 + 1^2), 5 (1 + 2^2), 14 (5 + 3^2), etc. using quadratic probing, and I would stop at 14 since that's larger than 8. So, I'd create a new hash table of capacity 15. I would like to keep this implementation, but if there's a better way I'm open to changing it. Regardless, I'm still looking for how to develop a hash function for a dynamic array, not a static one.
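
For comparison, here is a minimal sketch of the more conventional form of quadratic probing, where the i-th probe is taken relative to the home index and wrapped with the modulo operator, so a probe never runs past the end of the table. The name probe_index is hypothetical and does not appear in the code posted below.

```c
/* Conventional quadratic probing (a sketch, not the scheme used in the
   question): the i-th probe is home + i*i, wrapped into the table.
   probe_index is a hypothetical helper name. Assumes capacity > 0. */
unsigned probe_index(unsigned home, unsigned i, unsigned capacity)
{
    return (home + i * i) % capacity;
}
```

With a capacity of 8 and a home index of 0, probes 0 through 3 land on 0, 1, 4, 1, so the sequence can revisit a slot. For that reason this form is commonly paired with a prime capacity and a limit on how full the table may get.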

EDITS: What I mean is because my hashing function is based on the capacity of the hash table, when I go to retrieve an element AFTER the table size has changed, it doesn't work. For example, in my main program I go to remove the element with a key of "A", and print out the table again but A is still there. This is because I used my hashing function to find where "A" existed in order to remove it, but the hashing function is different when I go to remove "A" because when I inserted "A", the capacity was different than when I tried to remove it. So, the hashing function didn't lead me to the right place.

I've read something about when I resize the hash table, I just have to rehash all of the elements currently in the hash table with the size of the new hash table. I was just wondering if there's another way to do it other than that.

status.h

#ifndef STATUS_H
#define STATUS_H

typedef enum status { FAILURE, SUCCESS } Status;
typedef enum boolean { FALSE, TRUE } Boolean;

#endif

HashTableElement.h

#ifndef KEY_AND_DATA_H
#define KEY_AND_DATA_H

#include "status.h"

typedef void* HASH_TABLE_ELEMENT;

/*Precondition: none
  Postcondition: returns a handle to a new hash table element. Else returns NULL */
HASH_TABLE_ELEMENT hash_table_element_create(char* key, int data);

/*Precondition: hHash_table_element is a handle to a valid hash table element, data is the
  new data value.
  Postcondition: the data inside the hash table has been updated. */
void hash_table_element_update(HASH_TABLE_ELEMENT hHash_table_element, int data);

/*Precondition: hHash_table_element is a handle to a valid hash table element.
  Postcondition: returns the data value. */
int hash_table_element_get_data(HASH_TABLE_ELEMENT hHash_table_element);

/*Precondition: hHash_table_element is a handle to a valid hash table element.
  Postcondition: returns the key */
const char* hash_table_element_get_key(HASH_TABLE_ELEMENT hHash_table_element);

/*Precondition: hHash_table_element1 and 2 are handles to valid hash table elements. 
  Postcondition: returns true or false if the keys match or not*/
Boolean hash_table_element_keys_match(HASH_TABLE_ELEMENT hHash_table_element1,
    HASH_TABLE_ELEMENT hHash_table_element2);

Status hash_table_element_get_character_by_index(HASH_TABLE_ELEMENT hHash_table_element, int index, char* ch);

void hash_table_element_destroy(HASH_TABLE_ELEMENT* phHash_table_element);

#endif

HashTable.h

#ifndef HASH_TABLE_H
#define HASH_TABLE_H

#include "status.h"
typedef void* HASH_TABLE;

/* Precondition: none
   Postcondition: returns a handle to an empty hash table or NULL on Failure */
HASH_TABLE hash_table_init_default(unsigned initial_capacity);

/* Precondition: capacity is the capacity of the hash table.
   key is the key to be hashed.
   Postcondition: returns an index in the hash table that comes from
   hashing the key with the hash table capacity */
unsigned hash_table_hash(unsigned capacity, char* key);

/* Precondition: hHash_table is a handle to a valid hash_table
   Postcondition: returns the capacity */
unsigned hash_table_get_capacity(HASH_TABLE hHash_table);

/* Precondition: hHash_table is a handle to a valid hash table. Key and data
   are the info to be put into the hash_table
   Postcondition: a new element has been created and inserted in the hash table
   Returns FAILURE for any memory allocation failure */
Status hash_table_insert(HASH_TABLE hHash_table, char* key, int data);

/* Precondition: hHash_table is a handle to a valid hash table object. Key is the
   key to search for.
   Postcondition: if the key exists, stores it in data and returns SUCCESS. Else,
   returns FAILURE and stores a 0 in data */
Status hash_table_get_data_by_key(HASH_TABLE hHash_table, char* key, int* data);

/* Precondition: hHash_table is a handle to a hash table. key is the key to be looked for.
   Postcondition: if the key exists, stores the index in indexOfKey and returns true. If it
   doesn't, returns false and stores a 0 in indexOfKey */
Boolean hash_table_get_key_index(HASH_TABLE hHash_table, char* key, unsigned* indexOfKey);

/* Precondition: hHash_table is a handle to a hash table. Index is the index to search.
   Data stores the data at the index.
   Postcondition: returns SUCCESS and stores the data value at that index in data. If the index
   caused overflow, or the index was NULL, returns FAILURE and data is set to 0 */
Status hash_table_get_data_by_index(HASH_TABLE hHash_table, int index, int* data);

/* Precondition: hHash_table is a handle to a hash table. Index is the index to search.
   Key receives the key at the index.
   Postcondition: returns SUCCESS and stores the key at that index in key. If the index
   caused overflow, or the index was NULL, returns FAILURE and key is set as the empty string */
Status hash_table_get_key_by_index(HASH_TABLE hHash_table, int index, char* key);

/* Precondition: hHash_table is a handle to a valid hash table object. Key is the
   key to be searched for
   Postcondition: if the element corresponding to the key exists, it is removed and
   SUCCESS is returned. Else, FAILURE is returned */
Status hash_table_remove_element(HASH_TABLE hHash_table, char* key);

/* Precondition: phHash_table is a pointer to a handle to a hash table
   Postcondition: all memory associated with the hash table has been freed.
   and the hash table handle is set to NULL */
void hash_table_destroy(HASH_TABLE* phHash_table);

void debug(HASH_TABLE hHash_table);

#endif

HashTableElement.c

#include "HashTableElement.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


typedef struct hash_table_element {
    char* key;      
    int data;       
    unsigned capacity; // capacity of hash table during creation
} Hash_table_element;



HASH_TABLE_ELEMENT hash_table_element_create(char* key, int data) {
    Hash_table_element* pHash_table_element = (Hash_table_element*)malloc(sizeof(Hash_table_element));
    if (pHash_table_element != NULL) {
        pHash_table_element->key = (char*)malloc(sizeof(char) * (strlen(key) + 1));
        if (pHash_table_element->key == NULL) {
            free(pHash_table_element);
            return NULL;
        }
        for (unsigned i = 0; i < strlen(key); i++)
            pHash_table_element->key[i] = key[i];
        pHash_table_element->key[strlen(key)] = '\0';
        pHash_table_element->data = data;
    }
    return (HASH_TABLE_ELEMENT)pHash_table_element;
}


void hash_table_element_update(HASH_TABLE_ELEMENT hHash_table_element, int data) {
    Hash_table_element* pHash_table_element = (Hash_table_element*)hHash_table_element;
    pHash_table_element->data = data;
}


int hash_table_element_get_data(HASH_TABLE_ELEMENT hHash_table_element) {
    Hash_table_element* pHash_table_element = (Hash_table_element*)hHash_table_element;
    return pHash_table_element->data;
}


const char* hash_table_element_get_key(HASH_TABLE_ELEMENT hHash_table_element) {
    Hash_table_element* pHash_table_element = (Hash_table_element*)hHash_table_element;
    return (const char*)pHash_table_element->key;
}


Boolean hash_table_element_keys_match(HASH_TABLE_ELEMENT hHash_table_element1,
    HASH_TABLE_ELEMENT hHash_table_element2) {

    Hash_table_element* pHash_table_element1 = (Hash_table_element*)hHash_table_element1;
    Hash_table_element* pHash_table_element2 = (Hash_table_element*)hHash_table_element2;

    if (!strcmp(pHash_table_element1->key, pHash_table_element2->key))
        return TRUE;
    return FALSE;

}


Status hash_table_element_get_character_by_index(HASH_TABLE_ELEMENT hHash_table_element, int index, char* ch) {
    Hash_table_element* pHash_table_element = (Hash_table_element*)hHash_table_element;
    
    if (index > strlen(pHash_table_element->key)) {
        *ch = '\0';
        return FAILURE;
    }
    *ch = pHash_table_element->key[index];
    return SUCCESS;
}


void hash_table_element_destroy(HASH_TABLE_ELEMENT* phHash_table_element) {
    if (*phHash_table_element != NULL) {
        Hash_table_element* pHash_table_element = (Hash_table_element*)*phHash_table_element;
        free(pHash_table_element->key);
        free(pHash_table_element);
        *phHash_table_element = NULL;
    }
}

HashTable.c

#include "HashTable.h"
#include "HashTableElement.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>


typedef struct hash_table {
    HASH_TABLE_ELEMENT* table;
    unsigned capacity;
} Hash_table;


HASH_TABLE hash_table_init_default(unsigned initial_capacity) {
    Hash_table* pHash_table = (Hash_table*)malloc(sizeof(Hash_table));
    if (pHash_table != NULL) {
        pHash_table->table = (HASH_TABLE_ELEMENT*)malloc(sizeof(HASH_TABLE_ELEMENT) * initial_capacity);
        if (pHash_table->table == NULL) {
            free(pHash_table);
            return NULL;
        }
        for (unsigned i = 0; i < initial_capacity; i++) {
            pHash_table->table[i] = NULL;
        }
        pHash_table->capacity = initial_capacity;
    }
    return (HASH_TABLE)pHash_table;
}


unsigned hash_table_hash(unsigned capacity, char* key) {
    unsigned sum = 0;
    for (unsigned i = 0; i < strlen(key); i++)
        sum += key[i];
    return sum % capacity;
}

unsigned hash_table_get_capacity(HASH_TABLE hHash_table) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    return pHash_table->capacity;
}

Status hash_table_insert(HASH_TABLE hHash_table, char* key, int data) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    unsigned index = hash_table_hash(pHash_table->capacity, key);

    unsigned quadraticNum = 1;
    Boolean overflow = (Boolean)(index >= pHash_table->capacity);
    while (!overflow && pHash_table->table[index] != NULL) {
        if (!strcmp(hash_table_element_get_key(pHash_table->table[index]), key)) {
            hash_table_element_update(pHash_table->table[index], data);
            return SUCCESS;
        }
        else {
            index += quadraticNum * quadraticNum;
            quadraticNum++;
            if (index >= pHash_table->capacity) {
                overflow = TRUE;
            }
        }
    }

    if (overflow) {
        unsigned newCapacity = index + 1;
        HASH_TABLE_ELEMENT* newTable = (HASH_TABLE_ELEMENT*)malloc(sizeof(HASH_TABLE_ELEMENT) * newCapacity);
        if (newTable == NULL)
            return FAILURE;
        for (unsigned i = 0; i < pHash_table->capacity; i++) {
            if (pHash_table->table[i] == NULL)
                newTable[i] = NULL;
            else {
                newTable[i] =
                    hash_table_element_create(hash_table_element_get_key(pHash_table->table[i]),
                        hash_table_element_get_data(pHash_table->table[i]));
                if (newTable[i] == NULL) {
                    for (int j = i - 1; j >= 0; j--)
                        hash_table_element_destroy(&(newTable[j]));
                    free(newTable);
                    return FAILURE;
                }
            }
        }

        for (unsigned i = pHash_table->capacity; i < newCapacity - 1; i++)
            newTable[i] = NULL;

        newTable[newCapacity - 1] = hash_table_element_create(key, data, pHash_table->capacity);
        if (newTable[newCapacity - 1] == NULL) {
            for (int i = newCapacity - 2; i >= 0; i--)
                hash_table_element_destroy(&(newTable[i]));
            free(newTable);
            return FAILURE;
        }

        for (unsigned i = 0; i < pHash_table->capacity; i++)
            hash_table_element_destroy(&(pHash_table->table[i]));
        free(pHash_table->table);
        pHash_table->table = newTable;
        pHash_table->capacity = newCapacity;
        return SUCCESS;
    }
    else {
        pHash_table->table[index] = hash_table_element_create(key, data, pHash_table->capacity);
        if (pHash_table->table[index] == NULL)
            return FAILURE;
        return SUCCESS;
    }
}

Boolean hash_table_get_key_index(HASH_TABLE hHash_table, char* key, unsigned* indexOfKey) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    unsigned index = hash_table_hash(pHash_table->capacity, key);
    unsigned quadraticNum = 1;
    while (index < pHash_table->capacity) {
        if (pHash_table->table[index] != NULL) {
            if (!strcmp(key, hash_table_element_get_key(pHash_table->table[index]))) {
                *indexOfKey = index;
                return TRUE;
            }
        }
        index += quadraticNum * quadraticNum;
        quadraticNum++;
    }
    *indexOfKey = 0;
    return FALSE;
}

Status hash_table_get_data_by_key(HASH_TABLE hHash_table, char* key, int* data) {
    unsigned indexOfKey = 0;
    if (hash_table_get_key_index(hHash_table, key, &indexOfKey)) {
        Hash_table* pHash_table = (Hash_table*)hHash_table;
        *data = hash_table_element_get_data(pHash_table->table[indexOfKey]);
        return SUCCESS;
    }
    *data = 0;
    return FAILURE;
}

Status hash_table_get_data_by_index(HASH_TABLE hHash_table, int index, int* data) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    if (index >= pHash_table->capacity || pHash_table->table[index] == NULL) {
        *data = 0;
        return FAILURE;
    }
    *data = hash_table_element_get_data(pHash_table->table[index]);
    return SUCCESS;
}


Status hash_table_get_key_by_index(HASH_TABLE hHash_table, int index, char* key) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    if (index >= pHash_table->capacity || pHash_table->table[index] == NULL) {
        key[0] = '\0';
        return FAILURE;
    }

    char ch;
    for (unsigned i = 0; i < strlen(hash_table_element_get_key(pHash_table->table[index])); i++) {
        hash_table_element_get_character_by_index(pHash_table->table[index], i, &key[i]);
    }
    key[strlen(hash_table_element_get_key(pHash_table->table[index]))] = '\0';
    return SUCCESS;
}


Status hash_table_remove_element(HASH_TABLE hHash_table, char* key) {
    unsigned indexOfKey = 0;
    if (hash_table_get_key_index(hHash_table, key, &indexOfKey)) {
        Hash_table* pHash_table = (Hash_table*)hHash_table;
        hash_table_element_destroy(&(pHash_table->table[indexOfKey]));
        return SUCCESS;
    }
    return FAILURE;
}


void hash_table_destroy(HASH_TABLE* phHash_table) {
    Hash_table* pHash_table = (Hash_table*)*phHash_table;
    for (unsigned i = 0; i < pHash_table->capacity; i++)
        hash_table_element_destroy(&(pHash_table->table[i]));
    free(pHash_table->table);
    free(pHash_table);
    *phHash_table = NULL;
}

void debug(HASH_TABLE hHash_table) {
    Hash_table* pHash_table = (Hash_table*)hHash_table;
    int data;
    char key[100];
    char DNE[4] = "DNE";
    for (unsigned i = 0; i < pHash_table->capacity; i++) {
        printf("Index: %-10d", i);
        Status keyStatus = hash_table_get_key_by_index(hHash_table, i, key);
        Status dataStatus = hash_table_get_data_by_index(hHash_table, i, &data);
        if (keyStatus == FAILURE && dataStatus == FAILURE) {
            printf("Key: %-10sData: %-10s\n", DNE, DNE);
        }
        else {
            printf("Key: %-10sData: %-10d\n", key, data);
        }
    }
}

main.c

#include <stdio.h>
#include "HashTable.h"
#include <string.h>
#include <vld.h>

int main(int argc, char** argv) {

    HASH_TABLE hHash_table = hash_table_init_default(5);
    char key[3] = "A";
    unsigned num = 1;

    
    for (unsigned i = 0; i < 26; i++) {
        hash_table_insert(hHash_table, key, num);
        key[0] = key[0] + 1;
        num++;
    }

    debug(hHash_table);
    printf("\n\n\n");

    hash_table_remove_element(hHash_table, "A");
    debug(hHash_table);
    

    hash_table_destroy(&hHash_table);
    return 0;
}

Output

Visual Leak Detector read settings from: C:\Program Files (x86)\Visual Leak Detector\vld.ini
Visual Leak Detector Version 2.5.1 installed.
Index: 0         Key: A         Data: 1
Index: 1         Key: B         Data: 2
Index: 2         Key: C         Data: 3
Index: 3         Key: D         Data: 4
Index: 4         Key: E         Data: 5
Index: 5         Key: F         Data: 6
Index: 6         Key: G         Data: 7
Index: 7         Key: H         Data: 8
Index: 8         Key: S         Data: 19
Index: 9         Key: Q         Data: 17
Index: 10        Key: J         Data: 10
Index: 11        Key: K         Data: 11
Index: 12        Key: L         Data: 12
Index: 13        Key: M         Data: 13
Index: 14        Key: N         Data: 14
Index: 15        Key: I         Data: 9
Index: 16        Key: O         Data: 15
Index: 17        Key: P         Data: 16
Index: 18        Key: V         Data: 22
Index: 19        Key: W         Data: 23
Index: 20        Key: X         Data: 24
Index: 21        Key: Y         Data: 25
Index: 22        Key: Z         Data: 26
Index: 23        Key: T         Data: 20
Index: 24        Key: R         Data: 18
Index: 25        Key: DNE       Data: DNE
Index: 26        Key: DNE       Data: DNE
Index: 27        Key: DNE       Data: DNE
Index: 28        Key: DNE       Data: DNE
Index: 29        Key: DNE       Data: DNE
Index: 30        Key: DNE       Data: DNE
Index: 31        Key: DNE       Data: DNE
Index: 32        Key: DNE       Data: DNE
Index: 33        Key: DNE       Data: DNE
Index: 34        Key: DNE       Data: DNE
Index: 35        Key: DNE       Data: DNE
Index: 36        Key: DNE       Data: DNE
Index: 37        Key: DNE       Data: DNE
Index: 38        Key: DNE       Data: DNE
Index: 39        Key: DNE       Data: DNE
Index: 40        Key: U         Data: 21



Index: 0         Key: A         Data: 1
Index: 1         Key: B         Data: 2
Index: 2         Key: C         Data: 3
Index: 3         Key: D         Data: 4
Index: 4         Key: E         Data: 5
Index: 5         Key: F         Data: 6
Index: 6         Key: G         Data: 7
Index: 7         Key: H         Data: 8
Index: 8         Key: S         Data: 19
Index: 9         Key: Q         Data: 17
Index: 10        Key: J         Data: 10
Index: 11        Key: K         Data: 11
Index: 12        Key: L         Data: 12
Index: 13        Key: M         Data: 13
Index: 14        Key: N         Data: 14
Index: 15        Key: I         Data: 9
Index: 16        Key: O         Data: 15
Index: 17        Key: P         Data: 16
Index: 18        Key: V         Data: 22
Index: 19        Key: W         Data: 23
Index: 20        Key: X         Data: 24
Index: 21        Key: Y         Data: 25
Index: 22        Key: Z         Data: 26
Index: 23        Key: T         Data: 20
Index: 24        Key: R         Data: 18
Index: 25        Key: DNE       Data: DNE
Index: 26        Key: DNE       Data: DNE
Index: 27        Key: DNE       Data: DNE
Index: 28        Key: DNE       Data: DNE
Index: 29        Key: DNE       Data: DNE
Index: 30        Key: DNE       Data: DNE
Index: 31        Key: DNE       Data: DNE
Index: 32        Key: DNE       Data: DNE
Index: 33        Key: DNE       Data: DNE
Index: 34        Key: DNE       Data: DNE
Index: 35        Key: DNE       Data: DNE
Index: 36        Key: DNE       Data: DNE
Index: 37        Key: DNE       Data: DNE
Index: 38        Key: DNE       Data: DNE
Index: 39        Key: DNE       Data: DNE
Index: 40        Key: U         Data: 21
No memory leaks detected.
Visual Leak Detector is now exiting.

C:\UML\Computer Science\COMP.1020 Computing II\Interfaces\Hash Table ADT\No Duplicates\Hash Table ADT\Debug\Hash Table ADT.exe (process 24304) exited with code 0.
Press any key to close this window . . .
Ben
  • Please _edit_ your question and post the code you've written. I'm not sure I understand what you mean: _I made a crucial error in that my hashing function was based on the capacity of the hash table._ Normally, the hash function just takes a buffer pointer and length. It doesn't care how many hash buckets there are or whether the number of hash buckets is dynamically changeable. – Craig Estey Jan 20 '21 at 20:48
  • I edited my comments and added the code. I hope I've made it clear what I'm asking, if not let me know. – Ben Jan 20 '21 at 21:02
  • You have `unsigned capacity;` and you `return sum % capacity;` from your hash function, so regardless of the table size, as long as you have updated `capacity` upon expanding the table size -- all will work seamlessly. – David C. Rankin Jan 20 '21 at 21:07
  • Yes, but right now it's not working. For example, when I originally inserted "A", the capacity was 5 and A is 65 in the ASCII table, so 65 % 5 = index 0. But when I removed A, the capacity had changed to 41, and 65 % 41 is 24. So when I go to remove A, it starts looking at index 24 and quadratically probes until it's gone past the end of the table and never finds A. You'll see how this plays out in my hash_table_remove_element function, which calls hash_table_get_key_index to find the index. It's this index-finding function that goes to the wrong index. The only thing I can think of is to rehash – Ben Jan 20 '21 at 21:17
  • if you compare my main.c program to the output, I expected to see "A" removed on the second round of output but it's still there because of this. – Ben Jan 20 '21 at 21:18
  • What is [missing] `vld.h`? – Craig Estey Jan 20 '21 at 21:41
  • If you keep the raw hash value (the `sum` in `hash_table_hash`) with the element, then it's quick to compute the new hash value. I'm not aware of any way to avoid rehashing when you resize. btw, I suggest looking at [Pearson hashing](https://en.wikipedia.org/wiki/Pearson_hashing) for a simple, but effective hash. – user3386109 Jan 21 '21 at 00:18
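
The suggestion in the last comment (keep the raw hash with each element and recompute the index on resize) can be sketched as follows. This is a minimal illustration under stated assumptions, not the asker's implementation: the names slot_t, table_t, raw_hash, find_slot, table_resize and table_insert are hypothetical, and linear probing is used only to keep the sketch short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for this sketch; they are not the question's structs. */
typedef struct {
    char *key;      /* NULL marks an empty slot */
    int data;
    unsigned hash;  /* raw hash, computed once when the key is first inserted */
} slot_t;

typedef struct {
    slot_t *slots;
    unsigned capacity;
} table_t;

/* Raw hash: depends only on the key, never on the capacity. */
unsigned raw_hash(const char *key)
{
    unsigned sum = 0;
    for (; *key != '\0'; ++key)
        sum += (unsigned char) *key;
    return sum;
}

/* Return the slot holding key, or the first empty slot on its probe path.
   Returns t->capacity if the table is full and the key is absent.
   Linear probing is an assumption made only to keep the sketch short. */
unsigned find_slot(const table_t *t, const char *key, unsigned hash)
{
    if (t->capacity == 0)
        return 0;   /* an empty table counts as full */
    unsigned idx = hash % t->capacity;
    for (unsigned n = 0; n < t->capacity; ++n) {
        const slot_t *s = &t->slots[idx];
        if (s->key == NULL || strcmp(s->key, key) == 0)
            return idx;
        idx = (idx + 1) % t->capacity;
    }
    return t->capacity;
}

/* Allocate a larger array and re-place every element from its stored hash.
   Assumes new_capacity exceeds the number of stored elements. */
int table_resize(table_t *t, unsigned new_capacity)
{
    slot_t *fresh = malloc(sizeof(*fresh) * new_capacity);
    if (fresh == NULL)
        return 0;
    for (unsigned i = 0; i < new_capacity; ++i)
        fresh[i].key = NULL;
    table_t grown = { fresh, new_capacity };
    for (unsigned i = 0; i < t->capacity; ++i) {
        const slot_t *old = &t->slots[i];
        if (old->key != NULL)
            fresh[find_slot(&grown, old->key, old->hash)] = *old;
    }
    free(t->slots);
    *t = grown;
    return 1;
}

/* Insert or update; grows the table when no slot is available. */
int table_insert(table_t *t, const char *key, int data)
{
    unsigned hash = raw_hash(key);
    unsigned idx = find_slot(t, key, hash);
    if (idx == t->capacity) {
        if (!table_resize(t, t->capacity * 2 + 1))
            return 0;
        idx = find_slot(t, key, hash);
    }
    slot_t *s = &t->slots[idx];
    if (s->key == NULL) {
        size_t keylen = strlen(key);
        s->key = malloc(keylen + 1);
        if (s->key == NULL)
            return 0;
        memcpy(s->key, key, keylen + 1);
        s->hash = hash;
    }
    s->data = data;
    return 1;
}
```

Because lookup and resize both derive the index from the same raw hash and the current capacity, a key inserted before a resize is still found afterwards. Removal is left out on purpose: with open addressing, simply emptying a slot can break the probe path of other keys, so a deletion marker or a re-insert of the following entries is normally needed.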

2 Answers


Caveat: This is less an algorithm change than a set of style changes. The style is verbose enough that it obscures much of the algorithm.

Ordinarily, this question would be a better fit for Code Review. But you also felt you had bugs, and a style like this can easily paper over them.

I refactored your code. And, I ran it. There were no memory leaks, so I'm not sure what the issue is.


You defined (e.g.):

typedef void *HASH_TABLE_ELEMENT;

And, you use it everywhere.

Then, in a given function you cast it to:

Hash_table_element *pHash_table_element = (Hash_table_element *) hHash_table_element;

That is massively type unsafe. It can cover up a host of subtle bugs.

This is, partly, because you put the actual struct definition in the .c for the code.

Just put the real struct definition in the .h and get rid of all the casting.

Also, using a typedef for a pointer type is considered a "code smell" by some developers.


You do not need an opaque "handle". And, even if you did, this wouldn't be the way to do it. [Do not do this, but ...], to create a typesafe handle, you'd want (e.g.):

typedef struct {
    void *hte_handle;
} *HTE_HANDLE;
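
If hiding the struct layout is still wanted, one widely used C idiom is an incomplete struct type: the header declares the type without defining it, so callers get a distinct, type-checked pointer but cannot look inside. The following is a minimal sketch under that assumption. The names elem_t, elem_create, elem_get_data, elem_get_key and elem_destroy are hypothetical, and this is not what the refactoring further down does (it exposes the struct in the header).

```c
#include <stdlib.h>
#include <string.h>

/* --- what a header would expose (hypothetical) --- */
typedef struct elem elem_t;          /* incomplete type: layout is hidden */

elem_t *elem_create(const char *key, int data);
int elem_get_data(const elem_t *elem);
const char *elem_get_key(const elem_t *elem);
void elem_destroy(elem_t *elem);

/* --- what only the implementation file would see (hypothetical) --- */
struct elem {
    char *key;
    int data;
};

elem_t *elem_create(const char *key, int data)
{
    elem_t *elem = malloc(sizeof(*elem));
    if (elem == NULL)
        return NULL;
    size_t keylen = strlen(key);
    elem->key = malloc(keylen + 1);
    if (elem->key == NULL) {
        free(elem);
        return NULL;
    }
    memcpy(elem->key, key, keylen + 1);   /* copies the terminator too */
    elem->data = data;
    return elem;
}

int elem_get_data(const elem_t *elem) { return elem->data; }
const char *elem_get_key(const elem_t *elem) { return elem->key; }

void elem_destroy(elem_t *elem)
{
    if (elem != NULL) {
        free(elem->key);
        free(elem);
    }
}
```

Passing some other pointer type where an elem_t * is expected now draws a compiler diagnostic, which a void * handle would silently accept.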

A good style rule is to use short names for function arguments and function scope variables. And, the variable name doesn't have to replicate the type in its name. Replace (e.g.):

Hash_table_element *pHash_table_element;

With:

Hash_table_element *hte;

And, the typedef names are a bit long. Consider replacing (e.g.):

typedef struct { ... } Hash_table_element;

With:

typedef struct { ... } hte_t;

Likewise for long function names. Consider replacing (e.g.):

hash_table_element_create(char *key, int data)

With:

hte_create(char *key, int data)

You were doing:

for (i = 0;  i < strlen(key); ++i)

This is very slow: strlen rescans the whole string on every iteration, which increases the run time from O(n) to O(n^2). Better to do:

size_t keylen = strlen(key);
for (i = 0; i < keylen; ++i)
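
Applied to the key copy in the element constructor, the same idea might look like the sketch below. dup_key is a hypothetical helper name; it measures the key once and copies it, terminator included, with a single memcpy.

```c
#include <stdlib.h>
#include <string.h>

/* dup_key is a hypothetical helper: it returns a heap copy of key,
   or NULL if the allocation fails. The caller frees the result. */
char *dup_key(const char *key)
{
    size_t keylen = strlen(key);          /* measured exactly once */
    char *copy = malloc(keylen + 1);
    if (copy != NULL)
        memcpy(copy, key, keylen + 1);    /* +1 copies the '\0' as well */
    return copy;
}
```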

Here's a concatenated refactoring of your code. Because of the files involved, I created a single concatenated file. It has perl code at the front to automatically extract the files. Or, after the __DATA__ line, each file is prefixed by: % filename

#!/usr/bin/perl
# tbin/ovrcat.pm -- archive extractor

ovrcat(@ARGV);
exit(0);

sub ovrcat
{
    my($xfsrc,$bf);
    my($file,$xfcur);

    $pgmtail = "ovrcat";

    $xfsrc = "ovrcat::DATA";
    $xfsrc = \*$xfsrc;

    while ($bf = <$xfsrc>) {
        chomp($bf);

        if ($bf =~ /^%\s+(.+)$/) {
            setofile($1);
            next;
        }

        print($xfdst $bf,"\n")
            if (ref($xfdst));
    }

    while (($file,$xfcur) = each(%lookup)) {
        close($xfcur);
    }
}

sub setofile
{
    my($ofile) = @_;
    my($xfcur);

    {
        $xfdst = $lookup{$ofile};
        last if (ref($xfdst));

        printf("$pgmtail: extracting %s ...\n",$ofile);

        open($xfcur,">$ofile") or
            die("ovrcat: unable to open '$ofile' -- $!\n");

        $lookup{$ofile} = $xfcur;
        $xfdst = $xfcur;
    }
}

package ovrcat;
1;
__DATA__
% htable.h
#ifndef HASH_TABLE_H
#define HASH_TABLE_H

#include <status.h>
#include <hte.h>

typedef struct hash_table {
    hte_t **table;
    unsigned capacity;
} hashtable_t;

#if 0
/* Precondition: none
   Postcondition: returns a handle to an empty hash table or NULL on Failure */
hashtable_t *hash_table_init_default(unsigned initial_capacity);

/* Precondition: capacity is the capacity of the hash table.
   key is the key to be hashed.
   Postcondition: returns an index in the hash table that comes from
   hashing the key with the hash table capacity */
unsigned hash_table_hash(unsigned capacity, char *key);

/* Precondition: hHash_table is a handle to a valid hashtable_t
   Postcondition: returns the capacity */
unsigned hash_table_get_capacity(hashtable_t *table);

/* Precondition: hHash_table is a handle to a valid hash table. Key and data
   are the info to be put into the hash_table
   Postcondition: a new element has been created and inserted in the hash table
   Returns FAILURE for any memory allocation failure */
Status hash_table_insert(hashtable_t *hHash_table, char *key, int data);

/* Precondition: hHash_table is a handle to a valid hash table object. Key is the
   key to search for.
   Postcondition: if the key exists, stores it in data and returns SUCCESS. Else,
   returns FAILURE and stores a 0 in data */
Status hash_table_get_data_by_key(hashtable_t *hHash_table, char *key, int *data);

/* Precondition: hHash_table is a handle to a hash table. key is the key to be looked for.
   Postcondition: if the key exists, stores the index in indexOfKey and returns true. If it
   doesn't, returns false and stores a 0 in indexOfKey */
Boolean hash_table_get_key_index(hashtable_t *hHash_table, char *key, unsigned *indexOfKey);

/* Precondition: hHash_table is a handle to a hash table. Index is the index to search.
   Data stores the data at the index.
   Postcondition: returns SUCCESS and stores the data value at that index in data. If the index
   caused overflow, or the index was NULL, returns FAILURE and data is set to 0 */
Status hash_table_get_data_by_index(hashtable_t *hHash_table, int index, int *data);

/* Precondition: hHash_table is a handle to a hash table. Index is the index to search.
   Key receives the key at the index.
   Postcondition: returns SUCCESS and stores the key at that index in key. If the index
   caused overflow, or the index was NULL, returns FAILURE and key is set as the empty string */
Status hash_table_get_key_by_index(hashtable_t *hHash_table, int index, char *key);

/* Precondition: hHash_table is a handle to a valid hash table object. Key is the
   key to be searched for
   Postcondition: if the element corresponding to the key exists, it is removed and
   SUCCESS is returned. Else, FAILURE is returned */
Status hash_table_remove_element(hashtable_t *hHash_table, char *key);

/* Precondition: phHash_table is a pointer to a handle to a hash table
   Postcondition: all memory associated with the hash table has been freed.
   and the hash table handle is set to NULL */
void hash_table_destroy(hashtable_t ** phHash_table);

void debug(hashtable_t *hHash_table);
#endif

#include <htable.proto>

#endif
% hte.h
#ifndef HTE_H
#define HTE_H

#include <status.h>

typedef struct hte {
    char *key;
    int data;
    unsigned capacity;              // capacity of hash table during creation
} hte_t;

#if 0
typedef void *HASH_TABLE_ELEMENT;

/*Precondition: none
  Postcondition: returns a handle to a new hash table element. Else returns NULL */
HASH_TABLE_ELEMENT
hte_create(char *key, int data);

/*Precondition: hHash_table_element is a handle to a valid hash table element, data is the
  new data value.
  Postcondition: the data inside the hash table has been updated. */
void
hte_update(HASH_TABLE_ELEMENT hHash_table_element, int data);

/*Precondition: hHash_table_element is a handle to a valid hash table element.
  Postcondition: returns the data value. */
#if 0
int
hte_get_data(HASH_TABLE_ELEMENT hHash_table_element);
#else
int
hte_get_data(const HASH_TABLE_ELEMENT hHash_table_element);
#endif

/*Precondition: hHash_table_element is a handle to a valid hash table element.
  Postcondition: returns the key */
const char *
hte_get_key(const hte_t *hte);

/*Precondition: hHash_table_element1 and 2 are handles to valid hash table elements.
  Postcondition: returns true or false if the keys match or not*/
Boolean
hte_keys_match(HASH_TABLE_ELEMENT hHash_table_element1,
    HASH_TABLE_ELEMENT hHash_table_element2);

Status
hte_get_character_by_index(HASH_TABLE_ELEMENT hHash_table_element, int index, char *ch);

void hte_destroy(HASH_TABLE_ELEMENT * phHash_table_element);
#endif

#include <hte.proto>

#endif
% status.h
#ifndef STATUS_H
#define STATUS_H

typedef enum status { FAILURE, SUCCESS } Status;
typedef enum boolean { FALSE, TRUE } Boolean;

#endif
% htable.c
#include <htable.h>
#include <hte.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

hashtable_t *
htable_init_default(unsigned initial_capacity)
{
    hashtable_t *htab = malloc(sizeof(*htab));

    if (htab != NULL) {
        htab->table = malloc(sizeof(*htab->table) * initial_capacity);
        if (htab->table == NULL) {
            free(htab);
            return NULL;
        }
        for (unsigned i = 0; i < initial_capacity; i++) {
            htab->table[i] = NULL;
        }
        htab->capacity = initial_capacity;
    }
    return htab;
}

unsigned
htable_hash(unsigned capacity, const char *key)
{
    unsigned sum = 0;

    size_t keylen = strlen(key);

    for (unsigned i = 0; i < keylen; i++)
        sum += key[i];

    return sum % capacity;
}

unsigned
htable_get_capacity(const hashtable_t *htab)
{

    return htab->capacity;
}

Status
htable_insert(hashtable_t *htab, char *key, int data)
{
    unsigned index = htable_hash(htab->capacity, key);

    unsigned quadraticNum = 1;
    Boolean overflow = (Boolean) (index >= htab->capacity);

    while (!overflow && htab->table[index] != NULL) {
        if (strcmp(hte_get_key(htab->table[index]),key) == 0) {
            hte_update(htab->table[index], data);
            return SUCCESS;
        }
        else {
            index += quadraticNum * quadraticNum;
            quadraticNum++;
            if (index >= htab->capacity) {
                overflow = TRUE;
            }
        }
    }

    if (overflow) {
        unsigned newCapacity = index + 1;

        hte_t **newTable = malloc(sizeof(*newTable) * newCapacity);
        if (newTable == NULL)
            return FAILURE;

        for (unsigned i = 0; i < htab->capacity; i++) {
            hte_t *htefrom = htab->table[i];
            if (htefrom == NULL) {
                newTable[i] = NULL;
                continue;
            }

            newTable[i] = hte_create(hte_get_key(htefrom),
                hte_get_data(htefrom));

            if (newTable[i] == NULL) {
                for (int j = i - 1; j >= 0; j--)
                    hte_destroy(&newTable[j]);
                free(newTable);
                return FAILURE;
            }
        }

        for (unsigned i = htab->capacity; i < newCapacity - 1; i++)
            newTable[i] = NULL;

#if 0
        newTable[newCapacity - 1] = hte_create(key, data, htab->capacity);
#else
        newTable[newCapacity - 1] = hte_create(key, data);
#endif
        if (newTable[newCapacity - 1] == NULL) {
            for (int i = newCapacity - 2; i >= 0; i--)
                hte_destroy(&newTable[i]);
            free(newTable);
            return FAILURE;
        }

        for (unsigned i = 0; i < htab->capacity; i++)
            hte_destroy(&htab->table[i]);
        free(htab->table);

        htab->table = newTable;
        htab->capacity = newCapacity;

        return SUCCESS;
    }
    else {
#if 0
        htab->table[index] = hte_create(key, data, htab->capacity);
#else
        htab->table[index] = hte_create(key, data);
#endif
        if (htab->table[index] == NULL)
            return FAILURE;
        return SUCCESS;
    }
}

Boolean
htable_get_key_index(hashtable_t *htab, const char *key, unsigned *indexOfKey)
{
    unsigned index = htable_hash(htab->capacity, key);
    unsigned quadraticNum = 1;

    while (index < htab->capacity) {
        if (htab->table[index] != NULL) {
            if (! strcmp(key, hte_get_key(htab->table[index]))) {
                *indexOfKey = index;
                return TRUE;
            }
        }
        index += quadraticNum * quadraticNum;
        quadraticNum++;
    }

    *indexOfKey = 0;

    return FALSE;
}

Status
htable_get_data_by_key(hashtable_t *htab, char *key, int *data)
{
    unsigned indexOfKey = 0;

    if (htable_get_key_index(htab, key, &indexOfKey)) {
        *data = hte_get_data(htab->table[indexOfKey]);
        return SUCCESS;
    }

    *data = 0;

    return FAILURE;
}

Status
htable_get_data_by_index(hashtable_t *htab, int index, int *data)
{

    if (index >= htab->capacity || htab->table[index] == NULL) {
        *data = 0;
        return FAILURE;
    }

    *data = hte_get_data(htab->table[index]);

    return SUCCESS;
}

Status
htable_get_key_by_index(hashtable_t *htab, int index, char *key)
{

    if (index >= htab->capacity || htab->table[index] == NULL) {
        key[0] = '\0';
        return FAILURE;
    }

    //char ch;

    size_t keylen = strlen(hte_get_key(htab->table[index]));
    for (unsigned i = 0; i < keylen; i++) {
        hte_get_character_by_index(htab->table[index], i, &key[i]);
    }

    key[keylen] = 0;

    return SUCCESS;
}

Status
htable_remove_element(hashtable_t *htab, const char *key)
{
    unsigned indexOfKey = 0;

    if (htable_get_key_index(htab, key, &indexOfKey)) {
        hte_destroy(&htab->table[indexOfKey]);
        return SUCCESS;
    }

    return FAILURE;
}

void
htable_destroy(hashtable_t **phtab)
{
    hashtable_t *htab = *phtab;

    for (unsigned i = 0; i < htab->capacity; i++)
        hte_destroy(&htab->table[i]);

    free(htab->table);
    free(htab);

    *phtab = NULL;
}

void
debug(hashtable_t *htab)
{
    int data;
    char key[100];
    char DNE[4] = "DNE";

    for (unsigned i = 0; i < htab->capacity; i++) {
        printf("Index: %-10u", i);
        Status keyStatus = htable_get_key_by_index(htab, i, key);
        Status dataStatus = htable_get_data_by_index(htab, i, &data);

        if (keyStatus == FAILURE && dataStatus == FAILURE) {
            printf("Key: %-10sData: %-10s\n", DNE, DNE);
        }
        else {
            printf("Key: %-10sData: %-10d\n", key, data);
        }
    }
}
% hte.c
#include <hte.h>

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

hte_t *
hte_create(const char *key, int data)
{
    hte_t *hte = malloc(sizeof(*hte));
    //size_t keylen = strlen(key);

    if (hte != NULL) {
        hte->key = strdup(key);

        if (hte->key == NULL) {
            free(hte);
            return NULL;
        }

        hte->data = data;
    }

    return hte;
}

void
hte_update(hte_t *hte, int data)
{

    hte->data = data;
}

int
hte_get_data(const hte_t *hte)
{

    return hte->data;
}

const char *
hte_get_key(const hte_t *hte)
{

    return (const char *) hte->key;
}

Boolean
hte_keys_match(const hte_t *hte1, const hte_t *hte2)
{

    if (! strcmp(hte1->key, hte2->key))
        return TRUE;

    return FALSE;

}

Status
hte_get_character_by_index(hte_t *hte, int index, char *ch)
{

    if (index > strlen(hte->key)) {
        *ch = '\0';
        return FAILURE;
    }

    *ch = hte->key[index];

    return SUCCESS;
}

void
hte_destroy(hte_t **phte)
{
    hte_t *hte = *phte;

    if (hte != NULL) {

        free(hte->key);
        free(hte);

        *phte = NULL;
    }
}
% quadhash.c
#include <stdio.h>
#include <htable.h>
#include <string.h>
#if 0
#include <vld.h>
#endif

int
main(int argc, char **argv)
{

    hashtable_t *htab = htable_init_default(5);
    char key[3] = "A";
    unsigned num = 1;

    for (unsigned i = 0; i < 26; i++) {
        htable_insert(htab, key, num);
        key[0] = key[0] + 1;
        num++;
    }

    debug(htab);
    printf("\n\n\n");

    htable_remove_element(htab, "A");
    debug(htab);

    htable_destroy(&htab);

    return 0;
}
% htable.proto
// htable.proto -- prototypes

    hashtable_t *
    htable_init_default(unsigned initial_capacity);

    unsigned
    htable_hash(unsigned capacity, const char *key);

    unsigned
    htable_get_capacity(const hashtable_t *htab);

    Status
    htable_insert(hashtable_t *htab, char *key, int data);

    Boolean
    htable_get_key_index(hashtable_t *htab, const char *key, unsigned *indexOfKey);

    Status
    htable_get_data_by_key(hashtable_t *htab, char *key, int *data);

    Status
    htable_get_data_by_index(hashtable_t *htab, int index, int *data);

    Status
    htable_get_key_by_index(hashtable_t *htab, int index, char *key);

    Status
    htable_remove_element(hashtable_t *htab, const char *key);

    void
    htable_destroy(hashtable_t **phtab);

    void
    debug(hashtable_t *htab);
% hte.proto
// hte.proto -- prototypes

    hte_t *
    hte_create(const char *key, int data);

    void
    hte_update(hte_t *hte, int data);

    int
    hte_get_data(const hte_t *hte);

    const char *
    hte_get_key(const hte_t *hte);

    Boolean
    hte_keys_match(const hte_t *hte1, const hte_t *hte2);

    Status
    hte_get_character_by_index(hte_t *hte, int index, char *ch);

    void
    hte_destroy(hte_t **phte);
Craig Estey
  • Okay, the huge effort is worth the nod, even though the perl extraction (which is cool) borked all the header include quotes, e.g. `#include <htable.h>` instead of `#include "htable.h"` - good job. Other note, POSIX is needed for `strdup()`, but that should be available for just about all. Tiny nit on `-Wsign-compare`, but that won't impact the code here. – David C. Rankin Jan 21 '21 at 00:59
  • @DavidC.Rankin I have a script that normally does the archive, but it's geared for pastebin.com. I did a slimmed down quick version [partly because there have been a few other SO questions where there were large numbers of separate files]. To do the refactor, I had to plug the code into _my_ development IDE(?) and then "unplug" after the changes. For the `#include`, I've never liked the quoted version--I just add `-I.` when compiling. For example, the `*.proto` files are auto-generated by my IDE. – Craig Estey Jan 21 '21 at 01:05
  • I thought it was cool either way. The clever use of perl as an impromptu `tar` to bundle up a set of source files so they can be easily extracted was a nice touch! – David C. Rankin Jan 21 '21 at 01:39
  • Thanks for all your suggestions Craig. In regards to your style recommendations, I agree with a lot of them (especially that the names are too long). In regards to the opaque object design pattern I used, that's just the way I was taught in class which is why I did it that way. I was taught to put the struct definition in the .c file. That way, main.c only knows about the "typedef void*" part and it can never directly access data in the object using the -> operator which could lead to unwanted behavior. Based on what you said, it seems like that's not the industry standard though. – Ben Jan 21 '21 at 20:15
  • I hate to say it, but the opaque pointer thing is _rarely_ used (i.e. you were taught _wrong_ ;-) Just curious, _which_ school/class???). Look at all the extra stuff you had to do to _make_ it work. If you must, look at GTK or GMP libraries. – Craig Estey Jan 21 '21 at 22:11
  • But, _never_ pass around `void *` pointers. You could pass a `foo **` to a function that wants `foo *` [or vice versa] and the compiler wouldn't be able to check. You could spend hours/days/weeks debugging it (vs. the compiler flagging it in a heartbeat). You could have a pgm run for weeks w/o error until it hits the bad [little used] code and it could have UB that just tweaks the wrong variable and continues running. The program dies a day later ... [I've had to find such bugs in _real_ production code]. – Craig Estey Jan 21 '21 at 22:14
  • I was taught this method in my data structures class. I was told that the opaque handle method prevents unwanted behavior. Using this method, main can only manipulate objects with the functions written for it, and can never directly access the data in the objects with the -> operator. So, let's say I had a Date object (for day, month, year). In main, if I had "DATE hDate" where DATE is void*, I couldn't do something like "hDate->day = -5" which obviously I don't want. So, by preventing main from being able to directly access data, and by having main only be able to use functions to... – Ben Jan 22 '21 at 05:54
  • ...manipulate an object (and the functions are written to prevent data from getting set incorrectly i.e. if I had a setDate() function and I passed in -5 to set the day, I would flag an error and not change the object), this prevents data from ever getting set improperly. I always thought it was a very cool way of programming in C, but it's interesting to hear you say it's rarely used. My only guess is maybe they teach it this way since the class right after that we learn object oriented programming and this method kind of mimics object oriented programming in C to a certain extent. – Ben Jan 22 '21 at 05:56
  • So, just like in an object oriented language like C++ for example where data can be marked private and main can only call the public functions for the class, in this C opaque object method the structure is defined in the .c file so it's kept private from main and main can only call those functions declared in the .h file. – Ben Jan 22 '21 at 06:02
0

> I've read something about when I resize the hash table, I just have to rehash all of the elements currently in the hash table with the size of the new hash table.

Yes, that's commonly done.
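
As a minimal sketch of that approach (not the code from the question): the hash function below depends only on the key, and the capacity is applied with `%` at the point of use, so a resize just re-inserts every element against the new capacity. The names `slot_t`, `table_t`, `hash_key`, `find_slot`, `table_resize`, `table_insert` and `table_get` are hypothetical. The sketch also assumes linear probing, a djb2-style hash, doubling growth, and keys that the caller keeps alive.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal table: open addressing with linear probing. */
typedef struct {
    const char *key;    /* borrowed pointer; NULL marks an empty slot */
    int data;
} slot_t;

typedef struct {
    slot_t *slots;
    unsigned capacity;
    unsigned count;
} table_t;

/* The hash depends only on the key; callers reduce it modulo the
   current capacity, so it stays valid across resizes. */
static unsigned
hash_key(const char *key)
{
    unsigned h = 5381;          /* djb2-style string hash */

    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char) *key;
    return h;
}

/* Return the slot holding key, or the first empty slot on its probe path.
   Requires capacity > 0 and at least one empty slot. */
static slot_t *
find_slot(slot_t *slots, unsigned capacity, const char *key)
{
    unsigned i = hash_key(key) % capacity;

    while (slots[i].key != NULL && strcmp(slots[i].key, key) != 0)
        i = (i + 1) % capacity;
    return &slots[i];
}

/* Allocate a larger array and rehash every element against new_capacity. */
static int
table_resize(table_t *t, unsigned new_capacity)
{
    slot_t *fresh;

    assert(new_capacity > t->count);
    fresh = malloc(sizeof(*fresh) * new_capacity);
    if (fresh == NULL)
        return 0;
    for (unsigned i = 0; i < new_capacity; i++)
        fresh[i].key = NULL;
    for (unsigned i = 0; i < t->capacity; i++) {
        if (t->slots[i].key != NULL)
            *find_slot(fresh, new_capacity, t->slots[i].key) = t->slots[i];
    }
    free(t->slots);
    t->slots = fresh;
    t->capacity = new_capacity;
    return 1;
}

/* Insert or update; returns 1 on success, 0 on allocation failure. */
int
table_insert(table_t *t, const char *key, int data)
{
    slot_t *s;

    /* Grow before the table becomes more than half full. */
    if ((t->count + 1) * 2 > t->capacity &&
        !table_resize(t, t->capacity ? t->capacity * 2 : 8))
        return 0;
    s = find_slot(t->slots, t->capacity, key);
    if (s->key == NULL) {
        s->key = key;
        t->count++;
    }
    s->data = data;
    return 1;
}

/* Look up key; returns 1 and stores the value in *data if present. */
int
table_get(const table_t *t, const char *key, int *data)
{
    const slot_t *s;

    if (t->capacity == 0)
        return 0;
    s = find_slot(t->slots, t->capacity, key);
    if (s->key == NULL)
        return 0;
    *data = s->data;
    return 1;
}
```

Because every element is re-placed with `hash_key(key) % new_capacity`, a lookup after the resize follows the same probe path that the re-insertion used.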

> I was just wondering if there's another way to do it other than that.

Yes, there are other ways. One is to keep both the old and new tables around for a while. Searches and erases then have to check both tables, but you spread out the resizing cost (for more predictable operational latency) by "migrating" an element from the old table to the new table whenever it's accessed (you've already done the work of hashing the key anyway), or perhaps by migrating one old-table element each time an insertion is done in the new table.
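
A rough sketch of the second variant (migrate one old-table element per insertion) might look like the following. It is an illustration under assumptions rather than a reference implementation: it uses separate chaining instead of the question's quadratic probing, because moving a node out of a chain does not disturb other entries. The names `node_t`, `inc_table_t`, `find_in`, `migrate_one`, `inc_find` and `inc_insert` are all hypothetical, and keys are borrowed from the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical chained table that drains an old bucket array gradually. */
typedef struct node {
    struct node *next;
    const char *key;            /* borrowed pointer */
    int data;
} node_t;

typedef struct {
    node_t **cur;               /* buckets receiving all new insertions */
    unsigned cur_cap;
    node_t **old;               /* buckets being drained, or NULL */
    unsigned old_cap;
    unsigned cursor;            /* next old bucket to drain */
    unsigned count;
} inc_table_t;

static unsigned
hash_key(const char *key)
{
    unsigned h = 5381;          /* djb2-style string hash */

    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char) *key;
    return h;
}

/* Search one bucket array; returns the node or NULL. */
static node_t *
find_in(node_t **buckets, unsigned cap, const char *key)
{
    if (buckets == NULL)
        return NULL;
    for (node_t *n = buckets[hash_key(key) % cap]; n != NULL; n = n->next)
        if (strcmp(n->key, key) == 0)
            return n;
    return NULL;
}

/* Move at most one node from old to cur; free old once it is drained. */
static void
migrate_one(inc_table_t *t)
{
    while (t->old != NULL) {
        node_t *n;
        node_t **head;

        if (t->cursor == t->old_cap) {
            free(t->old);
            t->old = NULL;
            return;
        }
        n = t->old[t->cursor];
        if (n == NULL) {
            t->cursor++;
            continue;
        }
        t->old[t->cursor] = n->next;
        head = &t->cur[hash_key(n->key) % t->cur_cap];
        n->next = *head;
        *head = n;
        return;
    }
}

/* Lookups must consult both tables while a migration is in progress. */
node_t *
inc_find(inc_table_t *t, const char *key)
{
    node_t *n = find_in(t->cur, t->cur_cap, key);

    return n != NULL ? n : find_in(t->old, t->old_cap, key);
}

/* Insert or update; returns 1 on success, 0 on allocation failure. */
int
inc_insert(inc_table_t *t, const char *key, int data)
{
    node_t *n;

    /* Start a new resize only when no migration is pending. */
    if (t->old == NULL && t->count >= t->cur_cap) {
        unsigned cap = t->cur_cap ? t->cur_cap * 2 : 4;
        /* calloc's all-zero bytes are assumed to read back as NULL. */
        node_t **fresh = calloc(cap, sizeof(*fresh));

        if (fresh == NULL)
            return 0;
        t->old = t->cur;
        t->old_cap = t->cur_cap;
        t->cursor = 0;
        t->cur = fresh;
        t->cur_cap = cap;
    }
    migrate_one(t);
    n = inc_find(t, key);
    if (n == NULL) {
        node_t **head = &t->cur[hash_key(key) % t->cur_cap];

        n = malloc(sizeof(*n));
        if (n == NULL)
            return 0;
        n->key = key;
        n->next = *head;
        *head = n;
        t->count++;
    }
    n->data = data;
    return 1;
}
```

No single insertion pays for a full rehash. The trade-off is that every lookup may have to search two bucket arrays until the old one is drained.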

Another, similar idea is to realloc to a larger array of buckets without repositioning the elements. Then, when doing lookups/erases/inserts, you check the buckets obtained by modding with both the new and old bucket counts (in whatever order), migrating elements on the fly to their new optimal bucket based on the current bucket_count.

Another approach is to store the hash values alongside the key[/value] entries in the table, so you can resize and reposition the keys in their new bucket positions all at once, but don't need to recalculate the hash values for all keys.
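
A small sketch of caching the hash, under the same kind of assumptions as before: linear probing, a djb2-style hash, borrowed keys, and hypothetical names (`cslot_t`, `ctable_t`, `probe`, `ctable_resize`, `ctable_insert`, `ctable_get`). The `hash_calls` counter exists only to make it visible that a resize never recomputes a hash.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *key;    /* borrowed pointer; NULL marks an empty slot */
    unsigned hash;      /* full hash value, cached at insertion */
    int data;
} cslot_t;

typedef struct {
    cslot_t *slots;
    unsigned capacity;
    unsigned count;
} ctable_t;

unsigned hash_calls;    /* instrumentation for this sketch only */

static unsigned
hash_key(const char *key)
{
    unsigned h = 5381;          /* djb2-style string hash */

    hash_calls++;
    for (; *key != '\0'; key++)
        h = h * 33 + (unsigned char) *key;
    return h;
}

/* Linear probe from hash % capacity to the slot holding key,
   or to the first empty slot. Comparing hashes first skips most strcmp calls. */
static cslot_t *
probe(cslot_t *slots, unsigned capacity, unsigned hash, const char *key)
{
    unsigned i = hash % capacity;

    while (slots[i].key != NULL &&
           (slots[i].hash != hash || strcmp(slots[i].key, key) != 0))
        i = (i + 1) % capacity;
    return &slots[i];
}

/* Reposition every entry using its cached hash; hash_key() is not called. */
static int
ctable_resize(ctable_t *t, unsigned new_capacity)
{
    cslot_t *fresh = malloc(sizeof(*fresh) * new_capacity);

    if (fresh == NULL)
        return 0;
    for (unsigned i = 0; i < new_capacity; i++)
        fresh[i].key = NULL;
    for (unsigned i = 0; i < t->capacity; i++) {
        const cslot_t *s = &t->slots[i];

        if (s->key != NULL)
            *probe(fresh, new_capacity, s->hash, s->key) = *s;
    }
    free(t->slots);
    t->slots = fresh;
    t->capacity = new_capacity;
    return 1;
}

/* Insert or update; returns 1 on success, 0 on allocation failure. */
int
ctable_insert(ctable_t *t, const char *key, int data)
{
    unsigned hash = hash_key(key);      /* the only hash computation */
    cslot_t *s;

    /* Grow before the table becomes more than half full. */
    if ((t->count + 1) * 2 > t->capacity &&
        !ctable_resize(t, t->capacity ? t->capacity * 2 : 8))
        return 0;
    s = probe(t->slots, t->capacity, hash, key);
    if (s->key == NULL) {
        s->key = key;
        s->hash = hash;
        t->count++;
    }
    s->data = data;
    return 1;
}

/* Look up key; returns 1 and stores the value in *data if present. */
int
ctable_get(const ctable_t *t, const char *key, int *data)
{
    const cslot_t *s;

    if (t->capacity == 0)
        return 0;
    s = probe(t->slots, t->capacity, hash_key(key), key);
    if (s->key == NULL)
        return 0;
    *data = s->data;
    return 1;
}
```

The cost is one extra `unsigned` per slot. In return, a resize only does modulo arithmetic and slot copies, which matters most when keys are long strings.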

Tony Delroy