I'm using the tsearch/tfind/twalk functions from <search.h> in a program that sorts and counts the unique words in a document. Since I don't know in advance how many words a document contains, I use malloc to allocate an initial amount of memory for the array holding all the words, then use realloc to grow it once it fills up. However, realloc appears to corrupt the tree maintained by tsearch: afterwards twalk returns junk nodes, or the contents of nodes are corrupted.

The struct definitions:

struct word {
    char word[MAX_MTEXT];
    int occur;
};

struct mymsg {
    long mtype;
    char mtext[MAX_MTEXT];
    int occur;
};

The code below is the whole child process code, and has some other stuff dealing with getting the words from a message queue:

f = 1;
i = 0;
words_entered = 0;
entry = (struct word *) malloc(words_allocated * sizeof(struct word));
while(f) {
    if (msgrcv(m_key, &mymsg, sizeof(struct mymsg), (long) getpid(), 0) == -1) {
        perror("recieve");
        exit(EXIT_FAILURE);
    }
    //printf("%s recieved\n",mymsg.mtext);
    if (mymsg.mtext[0] == '\0') {
        //printf("term recv\n");
        f = 0;
    }
    else {
            //printf("mtext = %s\n",mymsg.mtext);
        memcpy(&entry[i].word,&mymsg.mtext,MAX_MTEXT);
        //printf("entry = %s\n",entry[i].word);
        entry[i].occur = 1;
        //printf("%s entered\n",entry[i].word);
        words_entered++;
        if (words_entered == words_allocated) {
            printf("About to realloc\n\n");
            twalk (root, action);
            words_allocated = words_allocated *2;
            entry = (struct word *) realloc(entry,(size_t) words_allocated * sizeof(struct word));
            printf("After realloc\n\n");
            twalk (root, action);
        }
        ptr = tfind(&entry[i],&root,compare);
        if (ptr == NULL) {
            //printf("null\n");
            ptr = tsearch(&entry[i],&root,compare);
            //printf("%s added to tree\n",(*ptr)->word);
        }
        else {
            (*ptr)->occur++;
        }
        i++;
        //printf("check\n");
    }
}
twalk (root, action);
mymsg.mtype = ret_id;
mymsg.mtext[0] = '\0';
mymsg.occur = 0;
if (msgsnd(m_key, &mymsg, sizeof(mymsg)-sizeof(long), 0) == -1) {
    perror("send");
    exit(EXIT_FAILURE);
}
exit(EXIT_SUCCESS);

This is the code for action, which is called by twalk:

void action(const void *nodep, VISIT value, int level) {
    struct word *w = *((struct word **) nodep);
    struct mymsg mymsg;
    switch (value) {
    case leaf:
    case postorder:
        printf("%s: %i, level %i\n",w->word, w->occur, level);
        mymsg.mtype = ret_id;
        strcpy(mymsg.mtext,w->word);
        //printf("%s vs %s\n",w->word,mymsg.mtext);
        mymsg.occur = w->occur;
        if (msgsnd(m_key, &mymsg, sizeof(mymsg)-sizeof(long), 0) == -1) {
            perror("send");
            exit(EXIT_FAILURE);
        }
        break;
    default:
        break;
    }
    return;
}

Here's the output of a run with an initial allocation of 5:

About to realloc

each: 1, level 1
is: 1, level 0
therefore: 1, level 2
translator: 1, level 1

After realloc

Ð3³: 1, level 1
is: 1, level 0
therefore: 1, level 2
translator: 1, level 1

About to realloc

for: 1, level 2
his: 1, level 1
$p  : 158343352, level 2
is: 1, level 0
own: 1, level 3
portion;: 1, level 2
responsible: 1, level 3
therefore: 1, level 1
p p rlator: 1, level 2

After realloc

for: 1, level 2
his: 1, level 1
$p  : 158343352, level 2
is: 1, level 0
own: 1, level 3
portion;: 1, level 2
responsible: 1, level 3
therefore: 1, level 1
p p rlator: 1, level 2
FRob
HamHamJ

1 Answer

Disclaimer: I have never worked with GNU C's tree search functions before.

Now, if I look at the corresponding documentation:

— Function: void * tsearch (const void *key, void **rootp, comparison_fn_t compar)

If the tree does not contain a matching entry the key value will be added to the tree. tsearch does not make a copy of the object pointed to by key (how could it since the size is unknown). Instead it adds a reference to this object which means the object must be available as long as the tree data structure is used.

The tree stores the addresses &entry[i] that you pass in, not copies of the entries, so you invalidate your tree node pointers every time realloc needs to move the memory. Also, you don't anticipate tsearch returning NULL.
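
To make the point about references concrete, here is a minimal sketch that inserts an entry and returns the pointer the tree actually stored. The value of MAX_MTEXT, the body of compare, and the helper name insert_word are assumptions for illustration, since the question does not show them.

```c
#include <assert.h>
#include <search.h>
#include <stdio.h>
#include <string.h>

#define MAX_MTEXT 64 /* assumed value; the question does not show it */

struct word {
    char word[MAX_MTEXT];
    int occur;
};

/* Assumed comparison: order entries by their text. */
static int compare(const void *a, const void *b)
{
    const struct word *wa = a;
    const struct word *wb = b;
    return strcmp(wa->word, wb->word);
}

/*
 * Hypothetical helper: insert *w into the tree and return the pointer
 * the tree holds for that word, or NULL if tsearch ran out of memory.
 */
static struct word *insert_word(void **rootp, struct word *w)
{
    assert(rootp != NULL && w != NULL);

    void *node = tsearch(w, rootp, compare);
    if (node == NULL) {
        fprintf(stderr, "tsearch: out of memory\n");
        return NULL;
    }
    /* The node holds a pointer to the caller's object, not a copy. */
    return *(struct word **)node;
}
```

For a new word the returned pointer is the very address that was passed in, which is why an address into a block that realloc later moves becomes dangling.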

The easiest solution would be to allocate each word element individually with malloc instead of buffering them all in one array. This might impose some speed penalty.
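
A rough sketch of per-word allocation, assuming the struct word from the question: each word gets its own malloc'd struct, so its address never changes while the tree is in use. The helper name count_word, the value of MAX_MTEXT, and the body of compare are assumptions for illustration.

```c
#include <assert.h>
#include <search.h>
#include <stdlib.h>
#include <string.h>

#define MAX_MTEXT 64 /* assumed value; the question does not show it */

struct word {
    char word[MAX_MTEXT];
    int occur;
};

/* Assumed comparison: order entries by their text. */
static int compare(const void *a, const void *b)
{
    const struct word *wa = a;
    const struct word *wb = b;
    return strcmp(wa->word, wb->word);
}

/*
 * Hypothetical helper: count one occurrence of text.
 * Returns the entry stored in the tree, or NULL if memory ran out.
 */
static struct word *count_word(void **rootp, const char *text)
{
    assert(rootp != NULL && text != NULL);

    /* One heap object per word: no later realloc can move it. */
    struct word *candidate = malloc(sizeof *candidate);
    if (candidate == NULL)
        return NULL;
    strncpy(candidate->word, text, MAX_MTEXT - 1);
    candidate->word[MAX_MTEXT - 1] = '\0';
    candidate->occur = 0;

    /* tsearch returns the existing entry if the word is already there. */
    void *node = tsearch(candidate, rootp, compare);
    if (node == NULL) {
        free(candidate);
        return NULL;
    }

    struct word *stored = *(struct word **)node;
    if (stored != candidate)
        free(candidate); /* duplicate word: the tree kept the older entry */
    stored->occur++;
    return stored;
}
```

There is no array to index here; the variable for each word is simply the pointer returned by malloc, and the tree keeps track of it. The receive loop would call count_word(&root, mymsg.mtext) once per message. A single tsearch call also replaces the tfind/tsearch pair, at the cost of one short-lived allocation per duplicate word.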

If you do need to have word entries arranged in blocks, then why not just twalk the tree and update all element pointers whenever realloc(entry, ...) != entry? EDIT: You might run into undefined behavior there, as per the description in the documentation. However, it's not 100% clear whether it refers to the general case or only to the multithreaded one.
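
A different way to keep entries in blocks without rewriting any pointers is to never move a block once it has been handed out. The sketch below is not the twalk-and-update idea above but a swapped-in alternative: a pool of fixed-size chunks linked together, where a full chunk is left in place and a fresh one is allocated. The names word_pool, pool_alloc, pool_free, and CHUNK_WORDS, as well as the value of MAX_MTEXT, are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_MTEXT 64    /* assumed value; the question does not show it */
#define CHUNK_WORDS 256 /* hypothetical number of entries per chunk */

struct word {
    char word[MAX_MTEXT];
    int occur;
};

/* One fixed-size block of entries; it is never resized or moved. */
struct chunk {
    struct chunk *next;
    size_t used;
    struct word slots[CHUNK_WORDS];
};

struct word_pool {
    struct chunk *head; /* most recently allocated chunk */
};

/*
 * Hand out the next free slot, allocating a new chunk when the current
 * one is full. Returns NULL if memory ran out. Slots handed out earlier
 * keep their addresses, so pointers stored in the tree stay valid.
 */
static struct word *pool_alloc(struct word_pool *pool)
{
    assert(pool != NULL);

    if (pool->head == NULL || pool->head->used == CHUNK_WORDS) {
        struct chunk *c = malloc(sizeof *c);
        if (c == NULL)
            return NULL;
        c->next = pool->head;
        c->used = 0;
        pool->head = c;
    }
    return &pool->head->slots[pool->head->used++];
}

/* Release every chunk; call this only once the tree is no longer used. */
static void pool_free(struct word_pool *pool)
{
    assert(pool != NULL);

    while (pool->head != NULL) {
        struct chunk *next = pool->head->next;
        free(pool->head);
        pool->head = next;
    }
}
```

In the question's loop, entry[i] would become a slot obtained from pool_alloc, and the realloc branch would disappear entirely. This keeps allocation calls rare, as with the original array, while giving the tree stable addresses.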

FRob
  • I don't think tsearch can return NULL. It always either returns a matching entry or creates one. To allocate single word elements, I'm not really clear on how to code that, specifically how to create and refer to the variable? If they are in array I can index to them, but I don't see how I can dynamically create a new variable for each word. As for the last one... so I would malloc a new array then walk through the old one adding each element to the new one and then free the old one? – HamHamJ Feb 19 '14 at 23:14
  • "If an entry had to be created and the program ran out of space NULL is returned." -- directly from the documentation – FRob Feb 19 '14 at 23:15