2

So I know it is shameless to ask someone on the internet to debug my code, but I have drained my pea-sized brain to the limit and I still can't solve this (my rubber duck has fled my desk in order to get some peace). All joking aside, I am having trouble with the pset5: speller, from cs50. Thing is, that after finishing all the tedious coding process and finally being able to compile my code free of errors I am, of course, getting the annoying segfault.

Now for the "interesting and fun" part: when executing the check50 function that is provided by the cs50 teaching staff I get all the green ticks, as if my code is working... which is very confusing.

Here is my code:

// Implements a dictionary's functionality

#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>

#include "dictionary.h"

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
} node;

// Choose number of buckets in hash table
const unsigned int N = 150001; //Bigger than word count for enough buckets (Not concerned about memory space)

// Hash table
node *table[N];

// Variable with the amount of words in the dictionary
int count = 0;

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    // hash word for value in table
    int hash_value = hash(word);

    // access list at hash value and compare words (strcompare)
    node *p = table[hash_value];

    // loop to end of linked list
    while (p != NULL)
    {
        if (strcasecmp(word, p->word) == 0)
        {
            return true;
        }
        p = p->next;
    }
    return false;
}

// Hashes word to a number
/*CREDIT: JR(joseph28robinson) from website medium.com for helping with HASH TABLE theory*/
unsigned int hash(const char *word)
{
    long x = 0;

    // Improve this hash function
    for (int i = 0, n = strlen(word); i < n; i++)
    {
        // I am unsure if the subtraction of 'A' is needed
        x += toupper(word[i]);
    }

    return x % N;
}

// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
    // Open file
    FILE *dict_file = fopen(dictionary, "r");
    if (dictionary == NULL)
    {
        // Could not open dictionary
        return false;
    }

    // Buffer for reading file
    char new_word[LENGTH + 1];

    // Scan file for every word and create a new node for each one
    // (NOT SURE WHY ==1 instead of != EOF /*CREDIT: Creig Estey comment from stackoverflow.com*/)
    while (fscanf(dict_file, "%s", new_word) == 1)
    {
        // Get word's hash value
        int hash_value = hash(new_word);

        // Malloc space for node
        node *p = malloc(sizeof(node));
        if (p == NULL)
        {
            return false;
        }

        // Fill in new node's variables
        strcpy(p->word, new_word);

        // TODO: Check for available space in hash's table hash_value node or create linked list
        if (table[hash_value] == NULL)
        {
            // First item in bucket so pointer to NULL
            p->next = NULL;
        }
        else
        {
            // Not first item in bucket so pointer to first item in list (LINKED LIST THEORY)
            p->next = table[hash_value];
        }
        // Point bucket to new node
        table[hash_value] = p;

        // Update size of dict
        count++;
    }

    // Close file
    fclose(dict_file);

    return true;
}

// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
    // No need to insert if function to check if dict loaded since count is already set to 0 (it will return 0 if not loaded)
    return count;
}

// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
    // Iterate through all buckets of hash table
    for (int i = 0; i < N; i++)
    {
        // Access hash's table bucket
        node *p = table[i];

        // Loop through all items (linked list) in bucket
        while (p != NULL)
        {
            // Use trav pointer not to orphan list
            node *trav = p;
            // Point to next element in list
            p = p->next;
            // Free trav
            free(trav);
        }

        // End of loop condition to return true
        if (p == NULL && i == N - 1)
        {
            return true;
        }
    }
    return false;
}

I have tried using the debugger and I have checked every possible (to my knowledge) NULL return when working with memory. I am assuming that therein lies the problem, although at this point I am not so sure.

Gianluca
  • 33
  • 3
  • 2
    Run your code through valgrind. If you're mismanaging memory it will tell you where. – dbush Jun 01 '23 at 12:39
  • 2
    You should run your program in a debugger. It should immediately show you *where* the segfault happens. Then you can inspect the involved variables and start searching *why* it happens. – Gerhardh Jun 01 '23 at 12:46
  • 1
    post a small dict file you used for test so we can just reproduce your scenario – arfneto Jun 01 '23 at 12:52
  • 2
    `hash` function returns `unsigned int`, but you assign it to `int` (in `check` function) what may cause problems, since you're using it as and index of an array – Nierusek Jun 01 '23 at 12:56
  • where are `strings.h` and `dictionary.h`? – arfneto Jun 01 '23 at 13:02
  • 1
    Unrelated: That check `if (table[hash_value] == NULL)` in `load` function is useless. You can just do `p->next = table[hash_value];` in any case. – Gerhardh Jun 01 '23 at 13:37
  • 1
    That check `if (p == NULL && i == N - 1)` in `unload` function is also not required. That condition is identical with just leaving the outer loop. You can just move that `return` after the loop. – Gerhardh Jun 01 '23 at 13:38
  • Thank you everyone for the suggestions! I actually kept running it through the debugger, applied the few suggested changes in the code, and it ended up working. Still unsure of why I ran into seg-fault problem... but, tbh, I am glad it's over. – Gianluca Jun 02 '23 at 10:58
  • " Still unsure of why I ran into seg-fault problem... but, tbh, I am glad it's over." -- It's (likely) _not_ over. This is "programming by coincidence", and you should _not_ do that if you want to succeed in CS field. https://daedtech.com/high-stakes-programming-coincidence – Employed Russian Jun 03 '23 at 18:30

1 Answers1

0

There ois a potential problem in the code when reading the dictionary file:

 while (fscanf(dict_file, "%s", new_word) == 1)

If there is a word in the dictionary that is longer than LENGTH characters, fscanf() will cause a buffer overflow trying to store it into new_word.

You should pass the maximum number of characters to fscanf() or use a custom function to read a word:

#include <ctype.h>

char *read_word(char *buf, size_t size, FILE *fp) {
    size_t i;
    int c;
    while ((c = getc(fp)) != EOF) {
        if (isspace(c)) {
            if (i == 0)
                continue;
            else
                break;
        } else {
            if (i + 1 < size)
                buf[i++] = (char)c;
        }
    }
    if (i == 0)
        return NULL;
    buf[i] = '\0';
    return buf;
}

Then you would replace the while loop with:

while (get_word(new_word, sizeof new_word, dict_file))

The hash function can be improved: char values must be cast as (unsigned char) when passed to toupper() because this function is only defined for positive values and EOF and char may be signed on the target platform. Furthermore, hash should be unsigned to avoid overflow issues and negative result from the modulo operation:

unsigned int hash(const char *word)
{
    unsigned long x = 0;

    // Improve this hash function
    for (size_t i = 0; word[i] != '\0'; i++) {
        // subtraction of 'A' is not needed
        x += toupper((unsigned char)word[i]);
    }
    return x % N;
}
chqrlie
  • 131,814
  • 10
  • 121
  • 189