I have been working on a strange PHP problem for the last few days: the feof() function returns true before the end of a file. Below is a skeleton of my code:
    $this->fh = fopen("bigfile.txt", "r");
    while (!feof($this->fh))
    {
        $dataString = fgets($this->fh);
        if ($dataString === false && !feof($this->fh))
        {
            echo "Error reading file besides EOF";
        }
        elseif ($dataString === false && feof($this->fh))
        {
            echo "We are at the end of the file.\n";
            // check status of the stream
            $meta = stream_get_meta_data($this->fh);
            var_dump($meta);
        }
        else
        {
            // else all is good, process line read in
        }
    }
Through lots of testing I have found that the program works fine on everything except one file:
- The file is stored on the local drive.
- This file is around 8 million lines long, averaging somewhere around 200-500 characters per line.
- It has already been cleaned, and a close examination with a hex editor found no abnormal characters.
- The program consistently fails on line 7172714 when it believes it has reached the end of the file (even though it has ~800K lines left).
- I have tested the program on files that had fewer characters per line but were between 20-30 million lines with no problems.
- To check whether something in my own code was causing the issue, I ran the code from a comment on http://php.net/manual/en/function.fgets.php, and that third-party code failed on the same line. EDIT: it is also worth mentioning that the third-party code used fread() instead of fgets().
- I tried specifying several buffer sizes in the fgets function and none of them made any difference.
The output from the var_dump($meta) is as follows:
    array(9) {
      ["wrapper_type"]=>
      string(9) "plainfile"
      ["stream_type"]=>
      string(5) "STDIO"
      ["mode"]=>
      string(1) "r"
      ["unread_bytes"]=>
      int(0)
      ["seekable"]=>
      bool(true)
      ["uri"]=>
      string(65) "full path of file being read"
      ["timed_out"]=>
      bool(false)
      ["blocked"]=>
      bool(true)
      ["eof"]=>
      bool(true)
    }
In trying to find out what is causing feof() to return true before the end of the file, my best guesses are that either:
A) Something is causing the stream opened by fopen() to fail, after which nothing can be read (causing feof() to return true)
B) Some buffer somewhere is filling up and causing havoc
C) The PHP gods are angry
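One way to tell guess A apart from a genuine end of file might be to compare the stream position reported by ftell() with the size reported by filesize() once fgets() stops returning data. The following is only a minimal sketch: the function name readAllLinesAndReport() and the report keys are made up for illustration and are not part of the original script. On a 32-bit PHP build both numbers may be unreliable for files larger than 2 GB, so they should be read with that caveat in mind.

```php
<?php
// Hypothetical diagnostic helper (not from the original script).
// Reads the file line by line and reports where the stream stopped.
function readAllLinesAndReport(string $path): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    // fgets() returns false on error or when there is nothing left to read.
    while (($line = fgets($fh)) !== false) {
        $lines++;
    }

    // Make sure filesize() does not return a cached value.
    clearstatcache(true, $path);

    $report = [
        'lines'    => $lines,          // lines returned by fgets()
        'eof'      => feof($fh),       // did the stream report EOF?
        'position' => ftell($fh),      // byte offset where reading stopped
        'size'     => filesize($path), // size according to the filesystem
    ];

    fclose($fh);
    return $report;
}
```

If 'position' equals 'size', the stream really did consume every byte and the missing lines would have to be explained some other way. If 'position' is smaller than 'size', reading stopped early.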
I have searched far and wide for anyone else with this issue and found no instances, except one in C++ where the cause was that the file was being read in text mode instead of binary mode.
UPDATE: I had my script output a running count of how many times the read function had iterated, with the unique ID of the user associated with each entry beside it. The script still stops after line 7172713 out of 7175502, but the unique ID of the last user in the file shows up on line 7172713. So the real problem seems to be that, for some reason, some lines are being skipped and never read. All line breaks are present.
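Since the line count from the read loop and the expected line count disagree, one check that might help is to count the raw line-break bytes in the file and compare them with the number of lines fgets() returns. With default settings, fgets() splits on "\n" only, so a line terminated by a lone "\r" would be merged into the next one. The sketch below assumes that explanation is worth testing. It does not claim to be the cause, and the function names countLineBreakBytes() and countFgetsLines() are invented for this example.

```php
<?php
// Hypothetical helper: counts "\n" and "\r" bytes by reading raw chunks,
// so the result does not depend on how fgets() splits lines.
function countLineBreakBytes(string $path, int $chunkSize = 1048576): array
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $counts = ['lf' => 0, 'cr' => 0];
    while (!feof($fh)) {
        $chunk = fread($fh, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        // Single-byte counts are safe across chunk boundaries.
        $counts['lf'] += substr_count($chunk, "\n");
        $counts['cr'] += substr_count($chunk, "\r");
    }

    fclose($fh);
    return $counts;
}

// Hypothetical helper: counts the lines that fgets() actually returns.
function countFgetsLines(string $path): int
{
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $lines = 0;
    while (fgets($fh) !== false) {
        $lines++;
    }

    fclose($fh);
    return $lines;
}
```

If the 'cr' count is nonzero and differs from the 'lf' count, the file contains mixed line endings, and a tool that treats "\r" as a line break would report more lines than fgets() does.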