11

I have a question about dynamic memory allocation.

Context: I'm writing a program that reads a text file of words and counts the frequency with which each word occurs (one word per line).

This particular function reads the file, counts the lines and characters, then dynamically allocates memory to the array of string pointers, an array storing the count of characters for each line and the strings themselves. (The other parts are less directly relevant to my question).

Question: How often should I reallocate memory if I run out of space? I set a constant ("memstart") for setting the initial memory allocation value. In the below code snippet I realloc for every line over the value of "memstart". Would the program process faster if a reallocated a larger block of memory instead of increasing the memory space by 1 "variable type" each time?

What would be best practice for something like this?

Code Snip:

int read_alloc(FILE* fin, FILE *tmp, char **wdp, int *sz){
    int line_cnt= 0, chr, let=1;
    do{
        chr=getc(fin);
        let++;          
        //count characters

        if(chr!=EOF){
            chr=tolower(chr);
            fputc(chr, tmp);
        }               
        //convert to lcase and write to temp file

        if ('\n' == chr || chr==EOF){
            sz[(line_cnt)]=((let)*sizeof(char));            //save size needed to store string in array
            *(wdp+(line_cnt))=malloc((let)*sizeof(char));   //allocate space for the string
            if ((line_cnt-1) >= memstart){
                realloc(wdp, (sizeof(wdp)*(memstart+line_cnt)));    //if more space needed increase size
                realloc(sz, (sizeof(sz)*(memstart+line_cnt)));
            }
            line_cnt++;         
            let=1;
        }
    } while (EOF != chr);

    return (line_cnt);
}
Donald Duck
  • 8,409
  • 22
  • 75
  • 99
LeAnn
  • 121
  • 1
  • 8
  • 3
    Usually the containers grow by some constant factor. Say 1.5. That said, every time you need to resize it - you make its capacity `1.5x` of current. – zerkms Dec 04 '17 at 02:10
  • 1
    When growing an allocation a good general policy is to start with a reasonably sized allocation, and to _double_ as needed. When finished with growing the allocation can be trimmed to final size with a final call to `realloc()`. – ad absurdum Dec 04 '17 at 02:11
  • You should be reallocating `log` number of times. – Ajay Brahmakshatriya Dec 04 '17 at 03:18
  • 1
    `realloc` (with a greater size than before) should be used at least as many times as the number of times you intend to trample past the last byte of the previous size. Any fewer and you're in trouble. – Kaz Dec 04 '17 at 06:28

2 Answers2

8

While the question is about how often realloc should be called, looking at OP's code, I think it's better to start with how safely it should be done.

The C11 standard states (n1570 draft, § 7.22.3.5, The realloc function, emphasis mine):

Synopsis

#include <stdlib.h>
void *realloc(void *ptr, size_t size);

Description
The realloc function deallocates the old object pointed to by ptr and returns a pointer to a new object that has the size specified by size. The contents of the new object shall be the same as that of the old object prior to deallocation, up to the lesser of the new and old sizes. Any bytes in the new object beyond the size of the old object have indeterminate values.
If ptr is a null pointer, the realloc function behaves like the malloc function for the specified size. (...). If memory for the new object cannot be allocated, the old object is not deallocated and its value is unchanged.
Returns
The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.

Now let's consider this snippet from the question, where sz is declared as int* sz;

realloc(sz, (sizeof(sz)*(memstart+line_cnt)));

The return value is lost, so we can't know if the call succeeded and if it did, sz is invalidated. Moreover, sizeof(sz) is the size of the pointer, not of the pointed type (int).

A more safe (and correct) pattern would be:

size_t new_size = /* Whatever, let's say */ size + SOME_COSTANT + size / 2;
void *tmp = realloc(ptr, new_size * sizeof *ptr);
if ( tmp == NULL ) {
    /* Deal with the error, e.g. log a message with perror, return NULL
       (if this is in a function) or just give up, but remeber that
       realloc doesn't invalidate nor free 'ptr' on failure */
    exit(EXIT_FAILURE);
}
ptr = tmp; // <- on success, realloc invalidated ptr
size = new_size;   

Now, to answer the question, realloc should be called only when needed, because it involves potentially expansive system calls. So either allocate a big chunk ahead or choose a growing stratey like doubling (or 1.5 times) the size every time.

It's worth noting that if possible, the OS could perform the reallocation without copying any element of the original array.

Bob__
  • 12,361
  • 3
  • 28
  • 42
  • 2
    Just a couple of nits. `realloc` returns `void *`, so `void *tmp = realloc (..` is all that is needed. `if ( tmp == NULL )` **don't free the pointer** all of your previous data is still pointed to by `ptr`, just don't assign `ptr = tmp;`. Last little peeve, the `'*'` goes with the *variable*, not the *type*. (e.g. `int* a, b, c;` certainly does not make `b` & `c` pointers) – David C. Rankin Dec 04 '17 at 15:35
  • @DavidC.Rankin 1) Good point, I've missed that. 2) I freed exactly because the data is still pointed, because in the snippet I'm exiting, usually I do it in a function and return (and check, to deal with other resources to free) NULL, but I get your point. 3) I know ;) it's a style I use (and often seen) because usually I try not to declare more then one variable in a statement, but again, it can be misleadng, and I get your point. – Bob__ Dec 04 '17 at 15:48
  • 1
    Thank you. This was extremely beneficial. – LeAnn Dec 05 '17 at 18:47
4

The classic answer is to double each time, but a factor of 1.5 might be better. The important bit is that you multiply your array size by some factor each time, rather than adding additional space each time.

Each re-allocation might need to copy the previous array into a new one. We'd like to minimize these copies. If we will be adding n items, and we start with an array of size a, increase by a factor of r each re-allocation, to end with a value of n, the sequence of (re-)allocations will be a, ar, ar^2, ar^3, ..., n. The sum of that sequence is (nr-a)/(r-1). Thus the total space is of order O(n).

Suppose instead we start with a, and this time add r each time. The sequence is a, a+r, a+2r, a+3r, ..., n. The sum of that sequence will be 0.5*((n^2-a^2)/r + a + n). In this case the total space of order O(n^2). Much worse!

With a constant factor of 2, the array will be in the worse case 1/2 empty. That's probably ok. You can always shrink the allocation when you're done and know the final size.

As pointed out in another answer, there are several bugs in the manner in which you call realloc(), but that wasn't the question.

TrentP
  • 4,240
  • 24
  • 35