3

So yeah, saw many similar questions to this one, but thought to try solving it my way. Getting huge amount of text blocks after running it (it compiles fine).

Im trying to get an unknown size of string from a file. Thought about allocating pts at size of 2 (1 char and null terminator) and then use malloc to increase the size of the char array for every char that exceeds the size of the array.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

int main()
{
    char *pts = NULL;
    int temp = 0;

    pts = malloc(2 * sizeof(char));
    FILE *fp = fopen("txtfile", "r");
    while (fgetc(fp) != EOF) {
        if (strlen(pts) == temp) {
            pts = realloc(pts, sizeof(char));
        }
        pts[temp] = fgetc(fp);
        temp++;
    }

    printf("the full string is a s follows : %s\n", pts);
    free(pts);
    fclose(fp);

    return 0;
}
Sean Bright
  • 118,630
  • 17
  • 138
  • 146
Hacktivator
  • 63
  • 3
  • 7
  • Interesting question, but what exactly is the issue? If you're trying to get a string of unknown size and the result is a huge block of text, is that not a success? Sounds like you read the entire file to me. – Nick Reed Dec 17 '19 at 15:43
  • At this point - `strlen(pts)` you don't know what is inside `pts` and you invoke `strlen()` on it thus resulting in UB. Maybe `calloc()` would be a better choice? – babon Dec 17 '19 at 15:44
  • 1
    Worst case scenario, string is same size as file; can you just allocate this much memory? – Fiddling Bits Dec 17 '19 at 15:46
  • 1
    `pts = realloc(pts, sizeof(char))` does not _expand_ the buffer, but always allocates 1 byte. You have to specify the _total_ length – Ctx Dec 17 '19 at 15:46
  • What happens each time this is executed: `while (fgetc(fp) != EOF) {`? Something is read from the file, but where does it go? – Adrian Mole Dec 17 '19 at 15:46
  • Moreover, you most probably don't want to call `fgetc()` twice in the same iteration when you are assigning to `pts[temp]` just once. – babon Dec 17 '19 at 15:55
  • `malloc`, `realloc`, and `fopen` can all fail. You should read the documentation for these functions (especially `realloc`'s) and handle their errors appropriately. – Sean Bright Dec 17 '19 at 16:01

3 Answers3

3

You probably want something like this:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define CHUNK_SIZE 1000               // initial buffer size

int main()
{
  int ch;                             // you need int, not char for EOF
  int size = CHUNK_SIZE;

  char *pts = malloc(CHUNK_SIZE);
  FILE* fp = fopen("txtfile", "r");

  int i = 0;
  while ((ch = fgetc(fp)) != EOF)     // read one char until EOF 
  {
    pts[i++] = ch;                    // add char into buffer

    if (i == size + CHUNK_SIZE)       // if buffer full ...
    {
      size += CHUNK_SIZE;             // increase buffer size
      pts = realloc(pts, size);       // reallocate new size
    }
  }

  pts[i] = 0;                        // add NUL terminator

  printf("the full string is a s follows : %s\n", pts);
  free(pts);
  fclose(fp);

  return 0;
}

Disclaimers:

  1. this is untested code, it may not work, but it shows the idea
  2. there is absolutely no error checking for brevity, you should add this.
  3. there is room for other improvements, it can probably be done even more elegantly
Jabberwocky
  • 48,281
  • 17
  • 65
  • 115
  • 1
    Why not just calculate the file length, allocate memory for this amount, and write it then? `realloc` seems to add unnecessary complexity. – Fiddling Bits Dec 17 '19 at 16:09
  • @FiddlingBits how do you know the size of stdin? – Jabberwocky Dec 17 '19 at 16:10
  • That's a good point, but OP is specifically asking about a file. – Fiddling Bits Dec 17 '19 at 16:11
  • The general solution is more flexible, but you're correct that you could also check the file size (with `stat()` for example) and allocate once. – Sean Bright Dec 17 '19 at 16:13
  • 1
    @FiddlingBits the point is also to show how to do the realloc stuff correctly, but yes, if it's only about a reading a file entirely into memory, then getting the file length, allocating the right memory size and reading the file with a single fread is definitely simpler and more efficient. – Jabberwocky Dec 17 '19 at 16:14
1

Leaving aside for now the question of if you should do this at all:

You're pretty close on this solution but there are a few mistakes

while (fgetc(fp) != EOF) {

This line is going to read one char from the file and then discard it after comparing it against EOF. You'll need to save that byte to add to your buffer. A type of syntax like while ((tmp=fgetc(fp)) != EOF) should work.

pts = realloc(pts, sizeof(char));

Check the documentation for realloc, you'll need to pass in the new size in the second parameter.

pts = malloc(2 * sizeof(char));

You'll need to zero this memory after acquiring it. You probably also want to zero any memory given to you by realloc, or you may lose the null off the end of your string and strlen will be incorrect.


But as I alluded to earlier, using realloc in a loop like this when you've got a fair idea of the size of the buffer already is generally going to be non-idiomatic C design. Get the size of the file ahead of time and allocate enough space for all the data in your buffer. You can still realloc if you go over the size of the buffer, but do so using chunks of memory instead of one byte at a time.

Segfault
  • 8,036
  • 3
  • 35
  • 54
  • Shouldn't you check for `'\0'` instead of `EOF`? – Fiddling Bits Dec 17 '19 at 16:01
  • The null is the end of string marker for in-memory buffers in C. The file itself is not required to be null terminated, or to have any nulls in it at all. Add'l reading: https://en.cppreference.com/w/c/io/fgetc – Segfault Dec 17 '19 at 16:06
  • _"You'll need to zero this memory after acquiring it."_ - This is true for OP's broken implementation, but if you add the 0 terminator correctly at the end this would be unnecessary. – Sean Bright Dec 17 '19 at 16:15
  • @Segfault Heya, first of all thnx for your input :D. I did not get the third thing (zero-ing thing) at all.. could you elaborate more on this topic?. And yea I know it would be smarter to get the size of the file itself but I wanted to advance with the technique so it would be "transferable" to other info sources. It might be a file or a console command or w.e so I tried to get an universal code to work with w.e input source i'd get. – Hacktivator Dec 17 '19 at 17:55
  • The point about zero-ing is about making sure that your buffers (strings) are null terminated correctly. Sean Bright is correct when he says if you add the null terminator correctly it's not an issue. The problem is that malloc and realloc are returning uninitialized memory to you so it will have some garbage data in it. Malloc doesn't initialize the memory to zero before it gives it to you. When you start writing data into the buffer you've malloc'd, it won't be null terminated unless you do it yourself. Setting all the bytes to zero after allocating them is an easy way to do that. – Segfault Dec 17 '19 at 21:47
0

Probably the most efficient way is (as mentioned in the comment by Fiddling Bits) is to read the whole file in one go (after first getting the file's size):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <sys/stat.h>

int main()
{
    size_t nchars = 0; // Declare here and set to zero...
    // ... so we can optionally try using the "stat" function, if the O/S supports it...
    struct stat st;
    if (stat("txtfile", &st) == 0) nchars = st.st_size;

    FILE* fp = fopen("txtfile", "rb"); // Make sure we open in BINARY mode!
    if (nchars == 0) // This code will be used if the "stat" function is unavailable or failed ...
    {
        fseek(fp, 0, SEEK_END); // Go to end of file (NOTE: SEEK_END may not be implemented - but PROBABLY is!)
    //  while (fgetc(fp) != EOF) {} // If your system doesn't implement SEEK_END, you can do this instead:
        nchars = (size_t)(ftell(fp)); // Add one for NUL terminator
    }
    char* pts = calloc(nchars + 1, sizeof(char));

    if (pts != NULL)
    {
        fseek(fp, 0, SEEK_SET); // Return to start of file...
        fread(pts, sizeof(char), nchars, fp); // ... and read one great big chunk!
        printf("the full string is a s follows : %s\n", pts);
        free(pts);
    }
    else
    {
        printf("the file is too big for me to handle (%zu bytes)!", nchars);
    }
    fclose(fp);
    return 0;
}

On the issue of the use of SEEK_END, see this cppreference page, where it states:

  • Library implementations are allowed to not meaningfully support SEEK_END (therefore, code using it has no real standard portability).

On whether or not you will be able to use the stat function, see this Wikipedia page. (But it is now available in MSVC on Windows!)

Adrian Mole
  • 49,934
  • 160
  • 51
  • 83