Like many others, I have been struggling with PHPExcel memory usage when reading a file (to convert it to MySQL).
I have, of course, tried the usual fixes mentioned in various places and have improved memory efficiency by at least 40%. This includes using a custom chunked reader class, moving the chunked reader instantiation outside of the read loop, etc.
I have 16 GB of RAM on my test server and allocated a 2 GB max memory usage in PHP. For files under ~200K rows, PHPExcel works (slowly but surely). Once past a certain size, the script fails, simply printing "Killed" to the shell. The logs show that the kernel killed PHP because it had used too much memory. While watching CPU and memory usage with the top command, I can see free memory and free swap plummet while used memory and used swap skyrocket.
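To complement top, it can help to log PHP's own view of its allocations from inside the script, before and after each chunk load. This is a minimal sketch; the logMemory helper is my own, not part of PHPExcel:

```php
<?php
// Hypothetical helper: print PHP's current and peak memory usage.
// Passing true asks for the "real" size of the blocks PHP has requested
// from the OS, which is closer to what top reports than the default view.
function logMemory($label)
{
    printf(
        "%s: current %.1f MB, peak %.1f MB\n",
        $label,
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576
    );
}

logMemory("before load");
// ... $objPHPExcel = $objReader->load($inputFileName); ...
logMemory("after load");
```

If the numbers reported here stay well under the 2 GB limit while top still shows the process ballooning, that points at memory allocated outside PHP's own accounting.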
After reading a LOT about PHPExcel and looking at some of its source files, I have concluded that a lot of data is stored for each cell that is not needed when processing only text. Using:
$objReader->setReadDataOnly(true);
helps a bit, but doesn't do that much. However, using a chunked reader, setting the chunk size to something small, and then using unset() to clean up big variables should theoretically work. I know that PHPExcel must re-read the whole file for each chunk, but it shouldn't be keeping all of it in memory, right?
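As an aside, the reader can report a sheet's dimensions without loading its cell data, which avoids hardcoding the total row count when sizing the chunk loop. A sketch, assuming a PHPExcel install whose reader supports listWorksheetInfo() (the Excel2007 reader does):

```php
<?php
// Read only workbook metadata (sheet names and dimensions),
// not the cell data itself, so this stays cheap even for huge files.
include_once("classes/PHPExcel/PHPExcel/IOFactory.php");

$objReader = PHPExcel_IOFactory::createReader("Excel2007");
$worksheetInfo = $objReader->listWorksheetInfo($inputFileName);

foreach ($worksheetInfo as $info) {
    // Each entry includes 'worksheetName', 'lastColumnLetter',
    // 'totalRows' and 'totalColumns'.
    echo $info['worksheetName'] . ": " . $info['totalRows'] . " rows\n";
}
```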
Here is the code I am currently using:
<?php
date_default_timezone_set("America/New_York");
set_time_limit(7200);
ini_set('memory_limit', '2048M');
include_once("classes/PHPExcel/PHPExcel/IOFactory.php");
$inputFileName = "/PATH/TO/FILE.xlsx";
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);
$worksheetName = "Sheet1";
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;

    public function __construct($startRow, $chunkSize)
    {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function setRows($startRow, $chunkSize)
    {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Always read the header row, plus the rows in the current chunk
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$objReader->setReadDataOnly(true);
$chunkSize = 1000;
echo "Got here 1\n";
$chunkFilter = new chunkReadFilter(2,$chunkSize);
for ($startRow = 2; $startRow <= 378767; $startRow += $chunkSize)
{
    $chunkFilter->setRows($startRow, $chunkSize);
    $objReader->setReadFilter($chunkFilter);
    echo "Got here 2\n";
    $objPHPExcel = $objReader->load($inputFileName);
    echo "Got here 3\n";
    $sheet = $objPHPExcel->getSheetByName($worksheetName);
    echo "Got here 4\n";
    $highestRow = $sheet->getHighestRow();
    $highestColumn = $sheet->getHighestColumn();
    echo "Got here 5\n";
    $sheetData = $sheet->rangeToArray("A".$startRow.":".$highestColumn.$highestRow, NULL, TRUE, FALSE);
    print_r($sheetData);
    echo "\n\n";
    // Free the workbook before loading the next chunk
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}
?>
Which outputs:
[USER@BOX Directory]# php PhpExcelBigFileTest.php
Got here 1
Got here 2
Killed
This leads to the question: is PHPExcel trying to load the whole file into memory regardless of my read filter? And if it is, why doesn't PHP stop it at the 2 GB memory limit instead of letting it grow until the kernel has to kill the process?