1

My original task is: Given a file of numbers, I have to find all pairs (a pair is just 2 numbers; they cannot be consecutive) that fulfil a certain condition.

To do that, I've decided to create an array where I will store all the numbers. I have a ~11000 Kbyte file of numbers (the number of numbers is ~1,5*10^9), and I need to read all of them from the file and store them in an array.

Numbers in the file are arranged like this; 0 < number <= 100000:

10
20
30
40
50
60

The first thing I decided to do was to create just a regular array int arr[150000000]. But this won't do due to stack overflow.

Then I decided to use calloc.

#include<stdio.h>
#include<stdlib.h>

int main()
{
    FILE *file = fopen("nums.txt", "r");
    
    int str_size = 8;
    char str[str_size]; // 0 < number <= 100000, so using the length 8 (6 digits + '\n' + null-terminating byte) is enough.
    
    int size = 2000; //Initial capacity
    int *a = calloc(size, sizeof(int)); //The array

    int i = 0;
    while( ((fgets(str, str_size, file) ) != NULL)) { //read a line from the file (a line = 1 number)
        if(atoi(str)!=0) // '\n' is on the end of each line, so fgets() function will read an empty line and return 0, we don't want that.
        {
            a[i] = atoi(str); //Convert a string to int and storing the value in the array
            i++;
        }
        a = realloc(a, sizeof(int)*(size*2) ); //And then I realloc my array, multiplying the size by 2.
        size = size * 2;
    }

    a[i] = -1; //When the flow of numbers ended, I mark the end with -1.

    fclose(file);
    free(a);
    return 0;
}

This code works well for smaller files (100 Kbyte files), but fails for larger files, returning the following:

realloc(): invalid next size
Aborted (core dumped)

As far as I understand, this means that my computer cannot allocate more memory. So what can I do?

EDIT

I understand my first error. I reallocated the array's size, but I didn't increase the size variable itself. But it didn't help me, now I get the Segmentation fault (core dumped) instead.

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
Rodion Iskhakov
  • 139
  • 1
  • 9
  • 4
    `realloc(): invalid next size` indicates you have corrupted memory, likely by writing outside the allocated array. The code you show never updates `size`, so it is not updating the allocation for the size you need as `i` increases. – Eric Postpischil Jan 12 '23 at 18:59
  • 2
    `size` never changes; you just keep reallocating an array of the same (slightly larger) size. – Scott Hunter Jan 12 '23 at 18:59
  • 4
    As a string, `100000` requires seven bytes. Six digits and the null-terminating byte. – Oka Jan 12 '23 at 18:59
  • 2
    Tangent: calling `realloc()` every time round the loop is hideous. Better to double the size of the array each time you fill it. – pmacfarlane Jan 12 '23 at 19:01
  • 1
    Just as a side note: You are going to overwrite the values in the array anyway, so I'd prefer `malloc` over `calloc` to spare the unnecessary costs for zero-initialisation of the array. – Aconcagua Jan 12 '23 at 19:01
  • 2
    The latest edit shows why you shouldn't hardcode sizes in multiple places. Do `fgets(str, sizeof str, file)` instead. You now have `fgets(str, 6, file)` but `str` has size `7` - and it _should_ be `char str[8];` to fit the newline too – Ted Lyngmo Jan 12 '23 at 19:03
  • 1
    `atoi` has a slight disadvantage: On invalid data it returns 0 – and unless 0 is not valid input anyway you are not able to distinguish from a regularly read 0. I'd rather prefer the `strto[...]` conversions instead. – Aconcagua Jan 12 '23 at 19:06
  • You're converting the string twice, once for testing and once for assignment. Better convert just once and store the result in a temporary – or store directly in `a[i]` and use this one to check. If you don't increment `i` it would get overwritten on reading next line anyway. – Aconcagua Jan 12 '23 at 19:09
  • _"`int str_size = 7;`"_ - As I mentioned, even that is too small. `{'1', '2', '3', '4', '5', '6', '\n', '\0'}` - count them – Ted Lyngmo Jan 12 '23 at 19:12
  • Now that you've changed the buffer size to 8 you should be able to change `if(atoi(str)!=0) { a[i] = atoi(str); i++ }` to `if((a[i] = atoi(str)) == 0) { fputs("erroneous input\n", stderr); exit(1); } i++;` since `0` should now never appear anymore. – Ted Lyngmo Jan 12 '23 at 19:27
  • Btw: Rounding up to `1.6 * 10^9 * sizeof(uint32_t) ≈ 6 GiB`. Try allocating that with _one_ `malloc` at start. Does it succeed? – Ted Lyngmo Jan 12 '23 at 19:39

1 Answers1

4

fopen, calloc, and realloc can all fail, returning NULL.

In the case of realloc, the original allocation will remain untouched, so immediately overwriting the original pointer with the newly returned value will leak memory and throw away potentially useful data.

Reallocating every iteration is rather costly. A better strategy is to reallocate when the buffer is full, increasing the memory by some factor. Doubling the current size is a simple approach.

Strictly sized buffers like char [6] are prone to filling up too quickly. Be more generous with your input buffer sizes.

The primary issue is that size is never changed during the loop, so the allocated memory does not keep up with the data read. The buffer overflows and you invoke Undefined Behaviour, likely overwriting some data you should not have touched.

Run this program as ./a.out nums.txt, or omit the argument to read from stdin.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    char line[128];
    FILE *file = argc > 1 ? fopen(argv[1], "r") : stdin;

    if (!file) {
        perror(argv[1]);
        return EXIT_FAILURE;
    }

    size_t size = 1024;
    size_t length = 0;
    int *list = malloc(sizeof *list * size);

    if (!list) {
        perror("malloc");
        return EXIT_FAILURE;
    }

    while (fgets(line, sizeof line, file)) {
        int value = atoi(line);

        if (!value)
            continue;

        if (length == size) {
            size *= 2;
            void *memory = realloc(list, sizeof *list * size);

            if (!memory) {
                perror("realloc");
                break;
            }

            list = memory;
        }

        list[length++] = value;
    }

    if (argc > 1)
        fclose(file);

    /* arbitrary use of data */
    size_t odds = 0;

    for (size_t i = 0; i < length; i++)
        if (list[i] & 1)
            odds++;

    printf("Number of odd values: %zu\n", odds);

    free(list);
}
Oka
  • 23,367
  • 6
  • 42
  • 53
  • 1
    Pretty good answer and code. But, if it were me, I'd do `exit(1)` instead of `break` if the `realloc` fails. On most [non-embedded] systems, there's little meaningful work that can be done if we run out of memory except abort. Also, there's no indication (e.g. flag) in the code that the entire file contents could not be loaded. – Craig Estey Jan 12 '23 at 19:59