3

I'm very new to programming and I've been trying to finish the cs50 course without just copying other people's code and trying to understand why mine won't work. I'm currently stuck in pset5 (I have been for a couple of days now), the code compiles ok, but not works and valgrind returns:

Process terminating with default action of signal 11 (SIGSEGV)
==1697==  Access not within mapped region at address 0x0
==1697==    at 0x401A86: hash (dictionary.c:53)
==1697==    by 0x401B71: load (dictionary.c:78)
==1697==    by 0x4012BE: main (speller.c:40)

here's my code:

// Implements a dictionary's functionality

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <strings.h>
#include "dictionary.h"

/*
** Global variable that indicates the number of words loaded in the dictionary
*/
unsigned int loadedWords = 0;

// Represents a node in a hash table
typedef struct node
{
    char word[LENGTH + 1];
    struct node *next;
}
node;

// Number of buckets in hash table
const unsigned int N = 54;

// Hash table
node *table[N];

// Returns true if word is in dictionary, else false
bool check(const char *word)
{
    if (table[hash(word)]->next == NULL)
    {
        return false;
    }
    else if (strcasecmp(word, table[hash(word)]->word) == 0)
    {
        return true;
    }
    else
    {
        return check(table[hash(word)]->next->word);
    }
}

// Hashes word to a number
unsigned int hash(const char *word)
{
    char lowerCaseWord[LENGTH + 1];
    for (int i = 0; word[i]; i++)
    {
        lowerCaseWord[i] = tolower(word[i]);
    }
    unsigned int hashNumber = ((int)word[0] + (int)word[1]) % 53;
    return hashNumber;
}

// Loads dictionary into memory, returning true if successful, else false
bool load(const char *dictionary)
{
    FILE *file = fopen(dictionary, "r");
    if (file == NULL)
    {
        printf("Could not open\n");
        return false;
    }
    char word[LENGTH + 1];
    while (fscanf(file, "%s", word) != EOF)
    {
        node *buffer = malloc(sizeof(node));
        if (buffer == NULL)
        {
            return false;
        }
        strcpy(buffer->word, word);
        buffer->next = table[hash(word)]->next;
        table[hash(word)]->next = buffer;
        loadedWords ++;
    }
    fclose(file);
    return true;
}

// Returns number of words in dictionary if loaded, else 0 if not yet loaded
unsigned int size(void)
{
    return loadedWords;
}

// Unloads dictionary from memory, returning true if successful, else false
bool unload(void)
{
    for (int i = 0; i < N - 1; i++)
    {
        while (table[i] != NULL)
        {
            node *tmp = table[i]->next;
            free(table[i]);
            table[i] = tmp;
        }
    }
    return true;
}

Could someone please point out why this won't work and/or how to fix this? I'm not looking for the complete solution to the problem, I'd just like to know what my error is or how my logic is wrong. Thanks.

dani1995ar
  • 39
  • 2
  • 2
    I think you have to define a `main` function – Noman Gul Feb 05 '21 at 02:37
  • 1
    *I'd just like to know what my error is*. The way to find that out is to debug the code. Use a debugger to step through the code line by line as well as examine the state at the point the program crashes. – kaylum Feb 05 '21 at 02:37
  • 1
    On a side note, if `malloc` fails in `load`, you'll leak `file`. – Daniel Walker Feb 05 '21 at 02:43
  • Also, in the line `unsigned int hashNumber = ((int)word[0] + (int)word[1]) % 53;`, that should be `lowerCaseWord`. – Daniel Walker Feb 05 '21 at 02:45
  • Anyway, the error is informing you that you're dereferencing a `NULL` pointer which is causing you to segfault. However, the only pointer you're dereferencing in `hash` is `word` and I don't see how it could be `NULL` based on how `load` is calling it. – Daniel Walker Feb 05 '21 at 02:50
  • I tried to debug it, it segs fault in this line: `table[hash(word)]->next = buffer;` still not sure why, I've checked the debugger a couple of times and I am not sure why it segfaults. – dani1995ar Feb 05 '21 at 03:01
  • `table[hash(word)]->next` that assumes `table[hash(word)]` is non-NULL. But consider what that value is the first time that hash bucket is used. What value does the debugger tell you it is? – kaylum Feb 05 '21 at 03:14
  • Follow what _kaylum_ said. But, I think there may be more to it. I think you're trying to do a "push front". Off the top of my head, you may want: `unsigned int hidx = hash(word); node *head = table[hidx]; if (head != NULL) buffer->next = head->next; else buffer->next = NULL; table[hidx] = buffer;` – Craig Estey Feb 05 '21 at 03:36
  • Actually, I think it's simpler than that: `unsigned int hidx = hash(word); buffer->next = table[hidx]; table[hidx] = buffer;` – Craig Estey Feb 05 '21 at 03:46
  • Your `hash` is suspect; probably meant to seed `hashNumber` with the initial value, then, for each character, `tolower` and modify `hashNumber` based on the `char`. Probably not the issue, though, unless you are hashing an empty string. – Neil Feb 05 '21 at 05:09
  • 1
    Consider your bucket size of `54` against the `holmes.txt` file you will use as input that has `1137706` words. Ideally, you want to keep your hash-table load factor less than .7 (number of buckets filled / number of buckets). Multiply your buckets by a minimum of `1000` (better `2000`) which will not preserve a load-factor of less than `.7`, but will dramatically reduce the number of list iterations from each bucket. – David C. Rankin Feb 05 '21 at 05:09
  • Also, you need to take care in how you are handling the case of each word. Ideally, you will want to convert each word to lowercase BEFORE you hash and store the word in your hash table. (you don't want to store multiple copies of `"Can"`, `"can"`, etc..). This means when you go to compare words, you need to again convert to lowercase BEFORE you hash to check if the word exists. This also eliminate using `strcasecmp()` in your `check()` function. – David C. Rankin Feb 05 '21 at 05:17
  • Also, the OpenSSL [lh_strhash() - line 443](https://docs.huihoo.com/doxygen/openssl/1.0.1c/lhash_8c_source.html) is a very good string hashing function to minimize collisions in your table and much more efficient than a `md5sum`. – David C. Rankin Feb 05 '21 at 05:34
  • per the output from valgrind: *==1697== by 0x401B71: load (dictionary.c:78)* where lines 77...79 are : `loadedWords ++; } fclose(file);` is a clear indication that the code the OP posted is NOT what is actually being run – user3629249 Feb 06 '21 at 22:43

2 Answers2

0

regarding the OPs statement: the code compiles ok,

here is the output from the compiler. Which clearly shows that the posted code DOES NOT compile,

gcc -ggdb3 -Wall -Wextra -Wconversion -pedantic -std=gnu11 -c "untitled1.c" -o "untitled1.o" 

untitled1.c:26:7: error: variably modified ‘table’ at file scope
   26 | node *table[N];
      |       ^~~~~

untitled1.c: In function ‘check’:

untitled1.c:31:15: warning: implicit declaration of function ‘hash’ [-Wimplicit-function-declaration]
   31 |     if (table[hash(word)]->next == NULL)
      |               ^~~~

untitled1.c: At top level:

untitled1.c:46:14: error: conflicting types for ‘hash’
   46 | unsigned int hash(const char *word)
      |              ^~~~

untitled1.c:31:15: note: previous implicit declaration of ‘hash’ was here
   31 |     if (table[hash(word)]->next == NULL)
      |               ^~~~

untitled1.c: In function ‘hash’:

untitled1.c:51:28: warning: conversion from ‘int’ to ‘char’ may change value [-Wconversion]
   51 |         lowerCaseWord[i] = tolower(word[i]);
      |                            ^~~~~~~

untitled1.c:53:31: warning: conversion to ‘unsigned int’ from ‘int’ may change the sign of the result [-Wsign-conversion]
   53 |     unsigned int hashNumber = ((int)word[0] + (int)word[1]) % 53;
      |                               ^

untitled1.c:48:10: warning: variable ‘lowerCaseWord’ set but not used [-Wunused-but-set-variable]
   48 |     char lowerCaseWord[LENGTH + 1];
      |          ^~~~~~~~~~~~~

untitled1.c: In function ‘unload’:

untitled1.c:92:23: warning: comparison of integer expressions of different signedness: ‘int’ and ‘unsigned int’ [-Wsign-compare]
   92 |     for (int i = 0; i < N - 1; i++)
      |                       ^

Compilation failed.

regarding:

char lowerCaseWord[LENGTH + 1];
for (int i = 0; word[i]; i++)
{
    lowerCaseWord[i] = tolower(word[i]);
}

the array lowerCaseWord[] is not used anywhere, so this block of code can be removed

user3629249
  • 16,402
  • 1
  • 16
  • 17
0

You keep accessing the next pointer in the first node in a list, before you know if that list is empty. Let me explain. table is an array of pointers. node *table[N]; This definition creates an array of N pointers to node. When you index into into table, you get back a pointer. So table[2] is a pointer, and it may or may not be NULL depending on whether anything has been added to that list. It is certainly NULL before load has added anything to it.

In check you have this line:

if (table[hash(word)]->next == NULL) // possible seg fault

What if the index hash returns leads to an empty list? Then this becomes NULL->next

Using the arrow operator dereferences the pointer, and dereferencing NULL is illegal. You want to check whether the current node-pointer is NULL, the next is irrelevant for now.

Using recursion here is also a bad idea. It would be faster, and more memory efficient to just loop over each node in the list.

You make the same mistake in your load function:

buffer->next = table[hash(word)]->next; // why next?
table[hash(word)]->next = buffer; // why next?

writing to buffer->next is correct here. You just called malloc to allocate space for it, so you write to the next pointer in that node. But then you take the value of table[hash(word)]->next instead of just table[hash(word)]

You appear to be confused about where the list starts. Trying to read next is never going to the first node in the list.