0

I need to read a xlsx file with 10 sheets, each sheet with about 3K rows.

Is there a way to loop each sheet and chunk his rows?

Following the examples I'm on this point:

public function import($file)
{
    $inputFileType = IOFactory::identify($file);
    $reader = IOFactory::createReader($inputFileType);

    //My ChunkReadFilter is exactly the same of the PhpSpreadsheet examples
    $chunkFilter = new ChunkReadFilter();
    $reader->setReadFilter($chunkFilter);

    $chunkSize = 100;

    $spreadsheet = $reader->load($file);

    $loadedSheetNames = $spreadsheet->getSheetNames();

    foreach ($loadedSheetNames as $sheetIndex => $loadedSheetName) {
        $sheet = $spreadsheet->getSheet($sheetIndex);

        //$highestRow = $sheet->getHighestRow(); //Is returning 1 as result
        $highestRow = 3000;

        for ($startRow = 1; $startRow <= $highestRow; $startRow += $chunkSize) {
            /**  Tell the Read Filter which rows we want this iteration  **/
            $chunkFilter->setRows($startRow, $chunkSize);

            $sheetData = $sheet->toArray(null, true, false, true);
            var_dump($sheetData);
        }

    }
}

The var_dump($sheetData); prints all sheet data, not only the chunk size.

So, how can I read each sheet data and chunk the rows?

I'm using "phpoffice/phpspreadsheet": "^1.4"

Victor
  • 5,043
  • 3
  • 41
  • 55

2 Answers2

2

I completely missed your goal (the question was not so clear). I completely change my answer. Assumed that you can loop through multiple sheets with the code below:

// .... add helper here....
$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
$reader = IOFactory::createReader($inputFileType);

// Define how many rows we want for each "chunk"
$chunkSize = 10;

// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 2; $startRow <= 50 ; $startRow += $chunkSize) {
    // ..... use the helper ...
    $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($startRow + $chunkSize - 1));
    // Create a new Instance of our Read Filter, passing in the limits on which rows we want to read
    $chunkFilter = new ChunkReadFilter($startRow, $chunkSize);
    // Tell the Reader that we want to use the new Read Filter that we've just Instantiated
    $reader->setReadFilter($chunkFilter);
    // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
    $spreadsheet = $reader->load($inputFileName);

    $sheetCount = $spreadsheet->getSheetCount();

    for ($i = 0; $i < $sheetCount; $i++) {
        $sheet = $spreadsheet->getSheet($i);

        // ...not what you want, but I leave this here
        $higestRow = $sheet->getHighestRow();
        echo "<p> Sheet n. ".$i. "  highest row is:" . ($higestRow) . "</p>";

        $sheetData = $sheet->toArray(null, true, true, true);

        var_dump($sheetData);
    }
}

...to reach your goal I guess you need to call use PhpOffice\PhpSpreadsheet\Reader\IReadFilter; and build your own filter in order to set the highestRow inside the for loop, as for your needs. This code is taken from the documentation, the poblic function setRows() I guess is where you need to put your own code, and than cal the filter in the for loop:

namespace Samples\Sample12;

use PhpOffice\PhpSpreadsheet\IOFactory;
use PhpOffice\PhpSpreadsheet\Reader\IReadFilter;

require __DIR__ . '/../Header.php';

$inputFileType = 'Xls';
$inputFileName = __DIR__ . '/sampleData/example2.xls';

/**  Define a Read Filter class implementing IReadFilter  */
class ChunkReadFilter implements IReadFilter
{
    private $startRow = 0;

    private $endRow = 0;

/**
 * Set the list of rows that we want to read.
 *
 * @param mixed $startRow
 * @param mixed $chunkSize
 */
public function setRows($startRow, $chunkSize)
{
    $this->startRow = $startRow;
    $this->endRow = $startRow + $chunkSize;
}

public function readCell($column, $row, $worksheetName = '')
{
    //  Only read the heading row, and the rows that are configured in            $this->_startRow and $this->_endRow
    if (($row == 1) || ($row >= $this->startRow && $row <   $this->endRow)) {
        return true;
    }

    return false;
    }
}

$helper->log('Loading file ' . pathinfo($inputFileName, PATHINFO_BASENAME) . ' using IOFactory with a defined reader type of ' . $inputFileType);
// Create a new Reader of the type defined in $inputFileType
$reader = IOFactory::createReader($inputFileType);

// Define how many rows we want to read for each "chunk"
$chunkSize = 10;
// Create a new Instance of our Read Filter
$chunkFilter = new ChunkReadFilter();

// Tell the Reader that we want to use the Read Filter that we've  Instantiated
$reader->setReadFilter($chunkFilter);

$spreadsheet = $reader->load($inputFileName);

$sheetCount = $spreadsheet->getSheetCount();

for ($i = 0; $i < $sheetCount; $i++) {
    $sheet = $spreadsheet->getSheet($i);
    // ...we get the highest row here, now
    $higestRow = $sheet->getHighestRow();

    for ($startRow = 2; $startRow <= $higestRow; $startRow += $chunkSize) {
        // ..just for check the output
        echo "<p> Sheet n. ".$i. "  highest row is:" . ($higestRow) . "</p>";
        $helper->log('Loading WorkSheet using configurable filter for headings row 1 and for rows ' . $startRow . ' to ' . ($higestRow + $chunkSize - 1));
        // Tell the Read Filter, the limits on which rows we want to read this iteration
        $chunkFilter->setRows($startRow, $chunkSize);
        // Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
        $spreadsheet = $reader->load($inputFileName);

        // Do some processing here

        $sheetData = $spreadsheet->getActiveSheet()->toArray(null, true, true, true);
        var_dump($sheetData);
    }

}
uomopalese
  • 471
  • 1
  • 5
  • 21
  • Ok, but how to get the number of rows to do the first loop? – Victor Aug 28 '18 at 15:31
  • I tried with `$sheet->getHighestRow()` but it is always return 1, and to do this I need to call `getSheet` before, right? – Victor Aug 28 '18 at 15:37
  • I updated my code, I'm not sure to understamnd correctly what you want to get – uomopalese Aug 28 '18 at 18:09
  • Y're doing a loop from row 2 to row 50, but I don't know how many rows there are on each sheet, so I can't let this number fixed. I tried with `$sheet->getHighestRow()`, but because of the `ChunkReadFilter` it is returning the wrong number of rows, and with your code I did not get a solution to fix this – Victor Aug 28 '18 at 19:02
  • Ok, maybe now I understand. Updated my code, put `$higestRow = $sheet->getHighestRow()` in that position. You will see the number of highest row on top of each block of code – uomopalese Aug 28 '18 at 19:18
  • Updated my answer, I missed your intentions, but the question was not so clear ;) – uomopalese Aug 29 '18 at 19:28
  • The unique problem with your new example, is the number of rows that are fixed, e.g. `for ($startRow = 2; $startRow <= 240; $startRow += $chunkSize) {...`. The number of rows in my case must be the highest row of each sheet, not 240. I have sheets with more than 3K rows and others with 100 rows, so this number can't be fixed. – Victor Aug 30 '18 at 12:29
  • I see, check now, i nested the two for loops; this seems to work, maybe you can write it in a better way. Note that the last output will be the `$higestRow + $chunkSize` – uomopalese Aug 31 '18 at 13:42
1

I am still new to this but tried out a solution which helps us here:

We can read files in Chunk through excel sheet as mentioned in the above comments but to save memory. We can create reader inside the loop and release it at the end of the loop like as mentioned below:

// Define how many rows we want to read for each "chunk"
$chunkSize = 1000;      

// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 1; $startRow <= $rawRows; $startRow += $chunkSize) {
// Create a new Reader of the type defined in
$reader = IOFactory::createReader($inputFileType);

// Create a new Instance of our Read Filter
$chunkFilter = new Chunk();

// Tell the Reader that we want to use the Read Filter that we've Instantiated
$reader->setReadFilter($chunkFilter);

// Tell the Read Filter, the limits on which rows we want to read this iteration
$chunkFilter->setRows($startRow, $chunkSize);
// Load only the rows that match our filter from $inputFileName to a PhpSpreadsheet Object
$spreadsheet = $reader->load($inputFileName);
.....
// process the file
.....

// then release the memory
$spreadsheet->__destruct();
$spreadsheet = null;
unset($spreadsheet);

$reader->__destruct();
$reader = null;
unset($reader);
}

It helps for large sheets to use only memory of a chunk and never exceed the memory limit.

Please let me know if this is helpful.

Łukasz D. Tulikowski
  • 1,440
  • 1
  • 17
  • 36