5

I am mainly interested in the viability of shrinking such an array.

I'm working on a project where I have used single malloc() calls to each create individual moderately large 2D arrays. (Each only few tens of MiB, at the largest.) The thing is that over the life of one of the arrays, its content dramatically shrinks in size (by more than half). Obviously, I could just leave the array size alone for the life of the program. (It's only a x MiB on a system with GiB of RAM available.) But, we are talking about more than half of the allocated space falling into disuse well before the program terminates, and, due to the nature of how I am using the array, all the surviving data is kept in a contiguous set of rows (at the beginning of the block). It seems like a waste to hold on to all that RAM if I really don't need it.

While I know realloc() can be used to shrink dynamically created arrays, a 2D array is more complex. I think I understand the memory layout of it (as I implemented the function that structures it), but this is pushing the limits of my understanding of the language and the workings of its compilers. Obviously, I would have to work with rows (and deal with the row pointers), not merely bytes, but I don't know how predictable the outcome of all this would be.

And, yes, I need to create the array with a single malloc(). The object in question has several million rows. I tried using a loop to malloc() each row separately, but the program always froze at around 100,000 malloc()s.

For background, the source I'm using to construct these array is as follows:

char ** alloc_2d_arr(int cnum, int rnum) {
        /* ((bytes for row pointers + (bytes for data)) */
        char **mtx = malloc(rnum * sizeof (char *) + rnum * cnum * sizeof (char));

        /* Initialize each row pointer to the first cell of its row */
        char *p = (char *) (mtx + rnum);
        for (int i = 0; i < rnum; i++) {
                mtx[i] = p + i * cnum;
        }

        return mtx;
}
Simon Bosley
  • 1,114
  • 3
  • 18
  • 41
CircleSquared
  • 379
  • 2
  • 11
  • 2
    `realloc` is not a good idea for such big tables, take a look to "avl" and "red-black" trees. – David Ranieri May 26 '16 at 08:27
  • 1
    "It seems like a waste to hold on to all that RAM if I really don't need it." - First *profile*. Second, a realloc runs the high risk of triggering copying all your still-interior data to different pages, the non-trivial expense of which you incur solely for the reason of trying to save ram you yourself claim isn't really a problem. The only win-scenario here is `realloc` keeping the same region-head as your memory base, and the tail-pages being returned for other usage; something `realloc` has no guarantees about.... – WhozCraig May 26 '16 at 08:44
  • ... so have you instead considered just making 2 (or 3 or 4, whatever) allocations, remember which ones you're eventually keeping, and `free()`-ing the ones you no longer need once that event transpires? I.e. the "kept" half of your matrix is in the first allocation, the second half is in another allocation, and eventually you just free the second half. – WhozCraig May 26 '16 at 08:45
  • 2
    In case you are unaware, the leading block of pointers is unnecessary if you are willing to restructure your code slightly. You could save a lot of space by getting rid of that. – M.M May 26 '16 at 09:09

1 Answers1

2

Using multidimensional arrays, this can be done with or without pointers to variable length arrays. Since you probably don't want to allocate any additional memory, this will be done in place.

First allocate a 20 by 10 array:

int ( *array )[10] = malloc( sizeof(int ) * 20 * 10 );
for( size_t i = 0 ; i < 20 ; i++ )
    for( size_t j = 0 ; j < 10 ; j++ )
          array[i][j] = i * 100 + j;

If you want to change the number of rows, no elements have to be moved, only a realloc is needed. Changing the row count to 15 is trivial:

array = realloc( array , sizeof( int ) * 15 * 10 );

If you want to change the column count, then the elements will have to be moved. Since we don't need to copy the first column, the copying starts at the second one. Function memmove is used to avoid memory overlap, which cannot happen in this case, but it could if the new column count were larger. Also it avoids aliasing problems. Note that this code is defined only because we are using allocated memory. Let's change the column count to 3:

int (*newarray)[3] = ( int(*)[3] )array;
for( size_t j = 1 ; j < 15 ; j++ )
    memmove( newarray[j] , array[j] , sizeof( int ) * 3 );
newarray = realloc( array , sizeof( int ) * 15 * 3 );

Working example: https://ideone.com/JMdJO0

If the new column count happens to be larger than the old one, then the memory will have to be reallocated first (to simply get more space), and then the column copying will take place, instead starting at the last column.

2501
  • 25,460
  • 4
  • 47
  • 87
  • It shames me to admit it, but I'm having trouble understanding `int ( *array )[10] = malloc(...);`. Based on my apparently deficient grasp of C, that looks like initializing a newly created variable with a dereferenced pointer as its identifier. It would be one thing to dereference a triple pointer (twice) and give the malloc() output to that, but putting a type in front makes it look like a RAM address is being used as the symbol (which doesn't make sense). I feel like I saw whatever this is when I was learning C, but Googling the relevant terms obviously gives polluted results. – CircleSquared May 27 '16 at 01:43
  • @CircleSquared It is a pointer to a multidimensional array. Two dimensions in my example. Like this: `int a[7][9]; int(*pa)[9]=a;`, except in my example I don't point that pointer to an automatic array, but to allocated memory. – 2501 May 27 '16 at 06:54
  • Thank you for showing me a more efficient way to dynamically allocate a multidimensional array **in addition** to simply answering my question. *So, it's safe enough, but this is a better way to set up the array.* – CircleSquared May 27 '16 at 19:08