13

I am reading a file containing around 50k lines using the file() function in PHP. However, it gives an out-of-memory error, since the contents of the file are stored in memory as an array. Is there any other way?

Also, the lines are of variable length.

Here's the code. Also, the file is 700 kB, not MB.

private static function readScoreFile($scoreFile)
{
    $file = file($scoreFile);
    $relations = array();

    for($i = 1; $i < count($file); $i++) // start at 1 to skip the header line
    {
        $relation = explode("\t",trim($file[$i]));
        $relation = array(
                        'pwId_1' => $relation[0],
                        'pwId_2' => $relation[1],
                        'score' => $relation[2],
                        );
        if($relation['score'] > 0)
        {
            $relations[] = $relation;
        }
    }

    unset($file);
    return $relations;
}
tshepang
Chetan
  • I know this question is old, but two things here: 1. Read the file line by line. 2. The out-of-memory error could also be because you are storing everything in an array, which is typically not a good idea without some control over, and knowledge of, the memory you have – Atherion Mar 25 '13 at 17:31

5 Answers

13

Use fopen, fread and fclose to read a file sequentially:

$handle = fopen($filename, 'r');
if ($handle) {
    while (!feof($handle)) {
        echo fread($handle, 8192);
    }
    fclose($handle);
}
Gumbo
  • this doesn't work, I want to read line by line. It's returning multiple lines on each fread (I guess 8192 bytes) – Chetan Jul 03 '10 at 11:07
  • 7
    replace fread with "fgets": fgets — Gets line from file pointer – Killer_X Jul 03 '10 at 11:12
  • You can use an intermediate variable $line to store the bytes of each line, and then echo $line. fread is probably one of the most efficient ways to stream the file, so read the results of fread (and append to $line) until you find a line break. Then do whatever you want with that line, then set $line = "", and resume appending the results of fread to $line. – luiscubal Jul 03 '10 at 11:15
  • The issue is, the lengths of the lines are variable. So, in some places I get like half a line – Chetan Jul 03 '10 at 11:23
  • IMO it's not a good approach to use fread, due to the different lengths of the lines in the file – Roman Podlinov Jan 13 '14 at 19:06
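
As the comments suggest, fgets() returns exactly one line per call, which sidesteps the variable line lengths; a minimal sketch along those lines:

$handle = fopen($filename, 'r');
if ($handle) {
    // fgets reads up to the next line break, however long the line is
    while (($line = fgets($handle)) !== false) {
        echo $line;
    }
    fclose($handle);
}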
10

EDIT after the update of the question and the comments on fabjoa's answer:

There is definitely something fishy if a 700 kB file eats up 140 MB of memory with the code you gave (you could unset $relation at the end of each iteration, though). Consider using a debugger to step through it and see what happens. You might also want to consider rewriting the code to use SplFileObject's CSV functions (or their procedural cousins; see the sketch below the example).

SplFileObject::setCsvControl example

$file = new SplFileObject("data.csv");
$file->setFlags(SplFileObject::READ_CSV);
$file->setCsvControl('|');
foreach ($file as $row) {
    list ($fruit, $quantity) = $row;
    // Do something with values
}
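
The procedural cousins would be fopen() with fgetcsv(); a minimal sketch, assuming a tab-separated file like the one in the question (the filename is made up):

$handle = fopen('scores.tsv', 'r');
// a length of 0 means "no line-length limit"; "\t" sets the delimiter
while (($row = fgetcsv($handle, 0, "\t")) !== false) {
    // $row holds the fields of one line; only one line is in memory at a time
    print_r($row);
}
fclose($handle);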

For an OOP approach to iterate over the file, try SplFileObject:

SplFileObject::fgets example

$file = new SplFileObject("file.txt");
while (!$file->eof()) {
    echo $file->fgets();
}

SplFileObject::next example

// Read through file line by line
$file = new SplFileObject("misc.txt");
while (!$file->eof()) {
    echo $file->current();
    $file->next();
}

or even

foreach(new SplFileObject("misc.txt") as $line) {
    echo $line;
}
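
Applied to the question's readScoreFile(), the streaming CSV approach could look something like this (a sketch, assuming the same tab-separated layout and single header line as the original code):

private static function readScoreFile($scoreFile)
{
    $file = new SplFileObject($scoreFile);
    $file->setFlags(SplFileObject::READ_CSV);
    $file->setCsvControl("\t"); // tab-separated, per the question

    $relations = array();
    foreach ($file as $i => $relation) {
        // skip the header line ($i starts at 0) and any trailing blank line
        if ($i === 0 || !isset($relation[2])) {
            continue;
        }
        if ($relation[2] > 0) {
            $relations[] = array(
                'pwId_1' => $relation[0],
                'pwId_2' => $relation[1],
                'score'  => $relation[2],
            );
        }
    }

    return $relations;
}

Only one line of the file is held in memory at a time; the $relations result array is still built up in full, of course.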

Pretty much related (if not duplicate):

Gordon
  • I think this still can potentially use a big chunk of memory, as I think it continues to read until it finds an end-of-line. – Artefacto Jul 03 '10 at 11:00
  • @Artefacto well, you can still use `SplFileObject::setMaxLineLen` if that is an issue. – Gordon Jul 03 '10 at 11:09
  • @Gordon Right. I see my familiarity with SplFileObject could be improved :p – Artefacto Jul 03 '10 at 11:13
  • @Gordon Why not foreach then? `foreach (new SplFileObject("misc.txt") as $line) { ... }` – Artefacto Jul 03 '10 at 11:16
  • @Artefacto because I am lazily copying the examples from the PHP Manual ;) – Gordon Jul 03 '10 at 11:21
  • @salathe It's too hot not to be. Add better examples to the docs ;) (j/k) – Gordon Jul 03 '10 at 12:01
  • @Gordon, I agree with the too hot (as good an excuse as any)!! :-) – salathe Jul 03 '10 at 12:47
  • @Gordon (and @Chetan), since the file contains TSV then the "CSV" reading capabilities of `SplFileObject` might be of some practical use. :) – salathe Jul 03 '10 at 12:50
  • `SplFileObject`+`fread` uses more memory than `fopen`+`fread`, even more than `file_get_contents()`. I tested getting all the file content at once into a variable, and `SplFileObject` consumed the most memory. – vee May 30 '17 at 08:58
  • @vee if you put all the file content into a variable, that's missing the point of using SplFileObject. The benefit of iterating with SplFileObject is that it's using a streaming approach, i.e. you are only loading one line into memory *during* the iteration. Obviously, if you store all those lines in a variable, you will end up with roughly the same memory usage, give or take the object overhead. – Gordon May 30 '17 at 09:36
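
If long lines are a concern (as raised in the first comment above), SplFileObject::setMaxLineLen caps how many bytes of each line are read per iteration; a minimal sketch:

$file = new SplFileObject("misc.txt");
$file->setMaxLineLen(4096); // read at most 4096 bytes of any one line
foreach ($file as $line) {
    echo $line;
}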
1

If you don't know the maximum line length, and you are not comfortable using a magic number for it, then you'll need to do an initial scan of the file to determine the maximum line length.

Other than that, the following code should help you out:

// $length is a large number, or calculated from an initial scan of the file
$handle = fopen($filename, 'r');
if ($handle) {
    while (!feof($handle)) {
        $buffer = fgets($handle, $length);
        echo $buffer;
    }
    fclose($handle);
}
zaf
1

Old question, but since I haven't seen anyone mention it: PHP generators are a great way to reduce memory consumption.

For example:

function read($fileName)
{
    $fileHandler = fopen($fileName, 'rb');

    // yield one line at a time; the whole file is never held in memory
    while (($line = fgets($fileHandler)) !== false) {
        yield rtrim($line, "\r\n");
    }

    fclose($fileHandler);
}

foreach(read(__DIR__ . '/filenameHere') as $line) {
    echo $line;
}
lloiacono
0

Allocate more memory during the operation, maybe with something like ini_set('memory_limit', '16M');. Don't forget to go back to the initial memory allocation once the operation is done.
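
A minimal sketch of that suggestion (the 16M figure is just the answer's example; size it to whatever the script actually needs):

$previousLimit = ini_get('memory_limit');   // remember the current limit
ini_set('memory_limit', '16M');             // raise it for the heavy operation

// ... read and process the file here ...

ini_set('memory_limit', $previousLimit);    // restore the original limit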

fabjoa
  • I'm pretty sure that you don't have to reset the memory limit after the operation, it only applies to the currently running script. – George Marian Jul 03 '10 at 11:19
  • I am already using 140 MB of memory (there is a lot of stuff going on apart from reading the file) – Chetan Jul 03 '10 at 11:21
  • 1
    @Chetan this sounds fishy to me. 50k lines ain't that much. The [King James Bible](http://www.gutenberg.org/etext/26361) has around 20k lines, is 1 MB in plain text, and only takes up about ~3 MB when read in with file(). What is the total size in bytes of your file? – Gordon Jul 03 '10 at 11:39
  • @Gordon The file is like 700 MB; however, it's a TSV file. After reading the file, I am splitting each line and storing it in an array. So that's like an array of 30k x 5, which is why it's taking so much memory, I guess – Chetan Jul 03 '10 at 12:15
  • @Chetan are you sure you are not leaking any memory somewhere? Try unsetting unused variables, especially while looping. Maybe you can post some of your code for us to see. – Gordon Jul 03 '10 at 12:23
  • @Chetan thanks. There is definitely something wrong if the file is just 700 kB, though. – Gordon Jul 03 '10 at 12:42