0

So I have an assignment to create a program in c that reads a couple of sentences(a 140mb file), and based on the 2nd input, which is a number, I need to return the Nth most common word. My idea was to build a hash table with linear probing, every time I get a new element I hash it accordingly based its position and based on djb2, else if there is a collision I rehash. After that, I apply Quicksort based on the occurrence and then I finally access by index. I am having issues finishing up a hash table with linear probing in c. I am pretty sure I have finished it but every time I run I am getting a heap buffer overflow on lldb. I tried to spot the issue but I still cannot figure it out.

Am I getting out of memory on stack? The file is relatively small to consume so much memory. I used address sanitiser and I got a heap-buffer-overflow on inserting.

I don't think I am touching the memory outside the allocate region but I am not 100% sure.

Any idea what has gone wrong? This is the table.c implementation and below that you can see the form of the struct.

Here is a more detailed message from address sanitiser:

thread #1: tid = 0x148b44, 0x0000000100166b20 libclang_rt.asan_osx_dynamic.dylib`__asan::AsanDie(), queue = 'com.apple.main-thread', stop reason = Heap buffer overflow

{
 "access_size": 1,
 "access_type": 1,
 "address": 105690555220216,
 "description": "heap-buffer-overflow",
 "instrumentation_class": "AddressSanitizer",
 "pc": 4294981434,
 "stop_type": "fatal_error"
}

table.c :

#include "table.h"
#include "entities.h"

static inline entry_t* entryInit(const char* const value){

    unsigned int len   = strlen(value);
    entry_t* entry     = malloc(sizeof(entry));
    entry->value       = malloc(sizeof(char*) * len);
    strncpy(entry->value, value, strlen(value));
    entry->exists      = 1;
    entry->occurence   = 1;

    return entry;
}

table_t* tableInit(const unsigned int size){

    table_t* table     = malloc(sizeof(table_t));
    table->entries     = malloc(size*sizeof(entry_t));
    table->seed        = getPrime();
    table->size        = size;
    table->usedEntries = 0U;

    return table;
}

//okay, there is definitely an issue here
table_t* tableResize(table_t* table, const unsigned int newSize){

    //most likely wont happen but if there is an overflow then we have a problem
    if(table->size > newSize) return NULL;

    //create a temp array of the realloced array, then do changes there
    entry_t* temp = calloc(newSize,sizeof(entry_t));

    table->size = newSize;

    //temp pointer to an entry
    entry_t *tptr = NULL;
    unsigned int pos = 0;
    unsigned int index = 0;

    while(pos != table->size){

        tptr = &table->entries[pos];

        if(tptr->exists == 1){

            index = hashString(table->seed, tptr->value, table->size, pos);

            temp[index] = *entryInit(tptr->value);

            temp[index].occurence = tptr->occurence;

            break;
        }

        else pos++;
   }

   table->entries = temp;
   //TODO: change table destroy to free the previous array from the table
   free(temp);

   return table;
}

//insert works fine, it is efficient enough to add something in the table
unsigned int tableInsert(table_t* table,const char* const value){

    //decide when to resize, might create a large enough array to bloat the memory?
    if(table->usedEntries >(unsigned int)(2*(table->size/3))) table = tableResize(table, table->size*2);

    entry_t* entry = NULL;
    unsigned int index;
    auto int position = 0;

    while(position != table->size){

        //calculate the hash of our string as a function of the current position on the table
        index = hashString(table->seed,value,table->size, position);
        entry = &table->entries[index];

        if(entry->exists == 0){

            *entry = *entryInit(value);
            table->usedEntries++;
            return index;

        } else if (entry->exists == 1 && strcmp(entry->value, value) == 0){

            entry->occurence++;
            return index;

        } else{
            position++;
    }
  }
}

//there might be an issue here
static inline void tableDestroy(const table_t* const table){

    entry_t* entry = NULL;

    for (auto int i = 0; i < table->size; ++i){

        entry =&table->entries[i];

 //printf("Value: %s  Occurence: %d  Exists: %d \n",entry->value, entry->occurence, entry->exists );

       if(&table->entries[i] !=NULL)free(&table->entries[i]);
   }
   free(table);
}

entities.h :

#pragma once

typedef struct __attribute__((packed)) __entry {

    char *value;
    unsigned int exists : 1;
    unsigned int occurence;

} entry_t;

typedef struct __table {

    int size;
    int usedEntries;
    entry_t *entries;
    unsigned int seed;

} table_t;

here is how I read from a file and process the text:

void readFromFile(const char* const fileName, table_t* table){

    FILE *fp = fopen(fileName, "r");

    if(!fp) fprintf(stderr,"error reading file. \n");

    char word[15];//long enough to hold the biggest word in the text?
    int position = 0;
    char ch;

    while((ch = fgetc(fp))!= EOF){

        //discard all the ascii chars that are not letters
        if(!(ch  >= 65 && ch <= 90) && !(ch >= 97 && ch <= 122)){

        word[position]= '\0';

        if(word[0] == NULL)continue;

        tableInsert(table, word);

        position = 0;

        continue;

        }
        else word[position++] = ch;
  }
}

Any suggestions what is wrong with my code? I believe resize might have an issue and I am not properly deleting yet because I have had a lot of problems with the memory management.

Thanks in advance!

  • `table->entries = temp; free(temp);` What is the intent of that? As soon as free is called the `temp` memory is invalid so the `table->entries` pointer is invalid. – kaylum Nov 09 '20 at 10:56
  • @kaylum yes, I noticed that that is why I have the TODO tag there, I want to change that so it will free the table->entries array. But do you think this is the issue with the memory? –  Nov 09 '20 at 11:09
  • I think there might be an issue with how I resize and how I add, but I believe I don't see it ... –  Nov 09 '20 at 11:10
  • I don't know if it is the only issue but it certainly is a very big issue. You can't free memory and then access it and not expect to get all sorts of wrong behavour. – kaylum Nov 09 '20 at 11:13
  • @kaylum I think i fixed that issue but adding this block of code where delete for(auto int i =0; i< table->size/2; i++){ if(previousArray[i].exists == 1) free(&previousArray[i]); } table->entries = temp; –  Nov 09 '20 at 11:38
  • but I am still getting the issue, btw entry_t* previousAray = table->entries; is located below the size reassign, pretty much on the top of the function. Also, I see at the debugger that it never reaches this stage, it gets stuck at the first word of the text –  Nov 09 '20 at 11:40
  • @kaylum I resolved another issue when I was allocating memory for the entry struct, I replaced the first sizeof with sizeof(entry_t) and the value allocation with malloc(strlen(value) +1). (I added +1 for the null terminating character, strlen doesn't count that and strcmp wants to see a null terminating character). –  Nov 09 '20 at 11:47
  • That caught my eye: `entry->value = malloc(sizeof(char*) * len);` shouldn't it be `sizeof(char)` instead of `sizeof(char *)`? – axelduch Nov 09 '20 at 13:15
  • @axelduch hi, yes, I fixed that issue since I saw that(I mentioned that before in the comments I think) but my guess is that the issue should be around there. The thing is that, the issue is around entryInit and as I think there is something around the resize. I am guessing I should try to free the char* when free the entry? –  Nov 09 '20 at 13:28

0 Answers0