I'm trying to read data from a file into a buffer. The file is 900K bytes (determined by seeking to the end of the file and calling ftell()). I allocated a buffer of 900K + 1 bytes (the extra byte is for the null terminator). fread() returns 900K, but strlen(buffer) shows a smaller value, and at the end of the buffer I can see something like ".....(truncated)". Why does this happen? Is there a limit beyond which fread() cannot read into a buffer, so that it truncates the data? And why does fread() return 900K even though it has apparently read less?
-
`strlen` cannot evaluate the size of a binary array. If there are zeroes in your data it stops there. If there aren't, then ... boom (undefined behaviour) – Jean-François Fabre Jun 22 '17 at 19:36
-
Are there zero bytes in the file data? The string stops at the first of them. – Jonathan Leffler Jun 22 '17 at 19:37
-
Files are not null-terminated, and they're not necessarily text. Stop trying to treat them as such. fread() has read more bytes than what you're seeing; there was clearly binary data in the file and a null character in that data is making strlen() think it's reached the end. Binary != text, and binary data can't be handled using character functions like strlen(). – Ken White Jun 22 '17 at 19:39
-
`fread` can read less than you requested. It will return how much it actually read. `read` can do that even if there's no error. I don't know about `fread`. – ikegami Jun 22 '17 at 19:41
-
Are you saying that the string `(truncated)` was placed in the buffer in lieu of the end of the file? That's not `fread` that did that. – ikegami Jun 22 '17 at 19:43
-
`fread` is limited to reading the maximum value that can be expressed by a `size_t`. But that's not your problem, as already pointed out. – Carey Gregory Jun 22 '17 at 19:48
-
OK. I got the point about not comparing with strlen(), as the file may contain binary data. But this question of mine has not been answered: why does the data populated in the buffer look like this at the end: 06/21/17 21:41:21 paper..." (truncated)? Can someone please help me understand this? – user7375520 Jun 22 '17 at 20:28
-
@JonathanLeffler There are no zero bytes in the file. The return value of fread() is 900K, exactly the size of the file being read. But the data at the end of the buffer appears as ".....06/21/17 21:41:21 paper..." (truncated). Why is this "truncated" showing up? – user7375520 Jun 22 '17 at 20:31
-
The word "truncated" is probably a part of the data file. Are you on Windows or Unix? If you're on Windows, could there be control-Z characters in the file? These would wreak havoc on your expectations. Unfortunately, this is going to be hard to debug; we can't handle 900 KiB files. So, make sure you preserve your original file. Then think about working out how much data is read, and remove all but say 100 bytes of that from a copy of the file. Then rerun your code on the new file. If the length returned by `strlen()` is now about 100 bytes, you know that the problem isn't at the start. – Jonathan Leffler Jun 22 '17 at 20:35
-
Then you can start analyzing the content of the reduced file to see what's in the location where the reading stops. Use some byte-dump program like `xxd` or `od -c` or anything faintly similar to see what bytes are actually in the file where the trouble occurs. The other possibility is that the reduced file reads in its entirety now. That suggests that the problem is in the section you removed. Try truncating a copy of the original file (you did keep one, didn't you?) and run your program on that. Does it read everything? Basically, keep splitting the file trying to find what causes trouble. – Jonathan Leffler Jun 22 '17 at 20:38
-
If the reduced files never cause trouble, only the full size file, then we need to see your code. There's more likely to be a problem in the code then. Without knowing anything more about the data file, it is hard to say which is more likely: data file or program trouble. But this is why we like to see an MCVE (Minimal, Complete, and Verifiable Example), even if in this case you can't show us the input data because it is too big. – Jonathan Leffler Jun 22 '17 at 20:40
-
@ikegami: fread can only return less than the request on error or eof conditions. An implementation where it returns less for any other reason is non-conforming. BTW this is a well-known bug in some (all?) versions of MSVC. – R.. GitHub STOP HELPING ICE Jun 22 '17 at 21:12
-
@JonathanLeffler Thanks very much! I am on Linux. At the spot in the data file where the problem occurs, I can see ^M^M DOS characters. My data file is text only, so that could be the problem, I guess. But when I copied that portion of the data file, including the DOS characters at the line ends, into another file, parsing succeeded and no truncated data was seen. – user7375520 Jun 22 '17 at 21:30
-
Unix (Linux) doesn't care about control-M characters; they're not needed, but they don't cause odd behaviours. MCVE time, I think. Even then, we may not be able to crack the problem from here. You'd have to produce some pretty convincing evidence that there's something weird with your file, and the weirdness would likely be "disk drive failing" type weirdness rather than 'unexpected content'. I'm mildly curious how you determined that there were no null (zero) bytes in the data — how did you establish that? – Jonathan Leffler Jun 22 '17 at 21:37
-
Have you done: `size_t len = strlen(buffer); printf("len = %zu; c = %d (%c)\n", len, buffer[len], buffer[len]);` at the point where you find the problem? What information is printed from that? The answer should include `c = 0 ()` with nothing visible in the parentheses, unless your terminal displays null bytes specially. The actual length should be the value less than 900 KiB that you've been seeing all along, of course. If the length reported by `fread()` is longer than that, then your file contains null bytes, or your code is trampling in the middle of your buffer. – Jonathan Leffler Jun 22 '17 at 21:37
2 Answers
Your main question has already been answered, though it's worth noting that strlen is not designed to measure the size of an array but the length of a null-terminated string. It probably reports a lower value because strlen returns the number of characters that appear before the first null character, so if you have null characters ('\0') within your data, strlen will stop as soon as it finds one of them.
You should trust fread's return value.
EDIT: as a note, fread MAY read fewer bytes than requested, which can be caused by an error or by reaching the end of the file. You can check for these with ferror and feof, respectively.
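To illustrate the ferror/feof check mentioned in the EDIT, here is a minimal sketch. The helper name `read_checked` and the `read_status` enum are hypothetical, made up for this example; only `fread`, `ferror`, and `feof` are standard library calls.

```c
#include <stdio.h>

/* Hypothetical status codes for this sketch (not part of any library). */
enum read_status { READ_OK = 0, READ_EOF = 1, READ_ERROR = 2 };

/* Hypothetical helper: reads up to `want` bytes from `fp` into `buf`.
 * Returns the number of bytes actually read, and sets *status to say
 * why the read came up short, if it did. */
size_t read_checked(FILE *fp, char *buf, size_t want, int *status)
{
    /* Element size 1, so the return value is a byte count. */
    size_t got = fread(buf, 1, want, fp);

    *status = READ_OK;
    if (got < want) {
        if (ferror(fp))
            *status = READ_ERROR;   /* I/O error on the stream */
        else if (feof(fp))
            *status = READ_EOF;     /* file ended before `want` bytes */
    }
    return got;
}
```

If this returns the full count with `READ_OK`, the bytes are in the buffer, regardless of what strlen reports afterwards.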

-
OK. I got the point about not comparing with strlen(), as the file may contain binary data. But this question of mine has not been answered: why does the data populated in the buffer look like this at the end: 06/21/17 21:41:21 paper..." (truncated) – user7375520 Jun 22 '17 at 20:33
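`strlen` does something along these lines:

    #include <stddef.h>

    /* Simplified stand-in for strlen, renamed to avoid clashing with the library function. */
    size_t my_strlen(const char *str)
    {
        size_t len = 0;
        while (*str++)      /* stop at the first zero byte */
            len++;
        return len;
    }
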
If your file contains binary data (or if it's a text file in an encoding such as UTF-16, where ASCII characters carry unused zero upper bytes), `strlen` is going to stop at the first 0x00 byte it encounters and return how many bytes into the data that byte was found. If you read a text file in a single-byte encoding like ANSI, the data contains no null terminator, so calling `strlen` on the buffer invokes undefined behavior unless you append a terminator yourself.
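As a rough sketch of how to find where `strlen` would stop without relying on a terminator, `memchr` can search a buffer of known length for the first zero byte. The helper name `first_nul_offset` is hypothetical; `n` is assumed to be the byte count returned by fread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first zero byte in
 * buf[0..n), or n if the buffer contains no zero byte at all. */
size_t first_nul_offset(const char *buf, size_t n)
{
    const char *p = memchr(buf, '\0', n);   /* bounded search, never reads past n */
    return p ? (size_t)(p - buf) : n;
}
```

If the returned offset is smaller than the number of bytes fread reported, the data has an embedded zero byte at that offset, and that is the value `strlen` would report.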
If you want to determine how many bytes `fread` successfully read out of the file, check its return value. [1]
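If you want to determine the file size before reading a file, do this:

    long len;                  /* ftell returns long, not size_t */
    fseek(fp, 0, SEEK_END);
    len = ftell(fp);           /* -1L on failure */
    rewind(fp);

On a stream opened in binary mode, `len` will contain the file's size in bytes; a negative value means `ftell` failed.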
[1]: Assuming you called `fread` with parameter 2 (the element size) set to 1 byte per element and didn't try to read more bytes than are actually in the file.
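Putting the pieces of this answer together, a whole-file read might look roughly like the sketch below. The function name `slurp` is hypothetical, and it assumes the stream was opened in binary mode (for example with "rb") so that the ftell value is a byte count.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads the whole stream into a malloc'd buffer,
 * appends a null terminator, and stores the byte count in *out_len.
 * Returns NULL on failure. The caller frees the buffer. */
char *slurp(FILE *fp, size_t *out_len)
{
    if (fseek(fp, 0, SEEK_END) != 0)
        return NULL;
    long size = ftell(fp);              /* file size in bytes, or -1L */
    if (size < 0)
        return NULL;
    rewind(fp);

    char *buf = malloc((size_t)size + 1);   /* +1 for the terminator */
    if (buf == NULL)
        return NULL;

    /* Element size 1, so the return value counts bytes. */
    size_t got = fread(buf, 1, (size_t)size, fp);
    if (got != (size_t)size && ferror(fp)) {
        free(buf);
        return NULL;
    }

    buf[got] = '\0';    /* terminator goes after the bytes actually read */
    *out_len = got;
    return buf;
}
```

The length to trust is `*out_len`; calling strlen on the result only agrees with it when the data contains no zero bytes.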

-
I did exactly the same to find the file size. I also got the point about not comparing with strlen(), as the file may contain binary data. But this question of mine has not been answered: why does the data populated in the buffer look like this at the end: 06/21/17 21:41:21 paper..." (truncated) – user7375520 Jun 22 '17 at 20:27