
Three days banging my head against a wall.

I developed a PHP script to import big text files and populate a MySQL database. It works perfectly up to about 2 million records, but I need to import around 10 million rows split across different files.

My application scans the files in a folder, gets each file's extension (I have 4 import procedures for 4 different extensions) and calls the corresponding import function.
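As an illustration only, the scan-and-dispatch part might look something like this minimal sketch (the SUBJECT classes, their import_subject*() methods and the folder path are placeholders taken from the pseudocode below, not the real code):

$folder = __DIR__ . '/import';

foreach (scandir($folder) as $entry) {
    $path = $folder . '/' . $entry;
    if (!is_file($path)) {
        continue;
    }

    // The extension decides which import procedure handles the file
    $ext = strtolower(pathinfo($path, PATHINFO_EXTENSION));

    switch ($ext) {
        case "ext1":
            $subject = new SUBJECT1();          // placeholder class
            $subject->import_subject1($path);   // placeholder method
            break;
        case "ext2":
            $subject = new SUBJECT2();
            $subject->import_subject2($path);
            break;
        // ... "ext3" and "ext4" are analogous
    }

    unset($subject); // drop the reference once the file has been processed
}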

I have a structure made of these classes:

class SUBJECT1 {

    public function __destruct()
    {
        $this->childObject = null;
    }

    public function import_subject1()
    {
        // fopen($file);
        // ob_start();
        // PDO::beginTransaction();
        // while (reading the file) {
        //     prepared statement
        //     get file line
        //     execute insert
        // }
        // PDO::commit();
        // ob_clean(); or ob_flush();
        // fclose($file);
        // clearstatcache();
    }
}

class SUBJECT2 { /* same as SUBJECT1 */ }

class SUBJECT3 { /* same as SUBJECT1 */ }

class SUBJECT4 { /* same as SUBJECT1 */ }

and the main class that launches the procedure:

class MAIN {
    // ...
    switch ($ext) {
        case "ext1":
            $SUBJECT1 = new SUBJECT1();
            $SUBJECT1->import_subject1();
            unset($SUBJECT1);
            $SUBJECT1 = null;
            break;
        case "ext2": // same as case "ext1" with import_subject2();
        case "ext3": // same as case "ext1" with import_subject3();
        case "ext4": // same as case "ext1" with import_subject4();
    }
}

It works perfectly with some adjustment of the MySQL log files (ib_logfile0 and ib_logfile1 are set to 512 MB).

The problem is that every time a procedure terminates, PHP does not free the memory. I'm sure the destructor is called (I put an echo inside the __destruct method) and the object is no longer accessible (var_dump says it is NULL). I tried so many ways to free memory, but now I'm at a dead end.

I also checked gc_collect_cycles() at many different points in the code and it always reports 0 cycles, so the objects don't reference each other. I even tried removing the class structure and running all the code sequentially, but I always get this error:

Fatal error: Out of memory (allocated 511180800) (tried to allocate 576 bytes) in C:\php\index.php on line 219 (line 219 is the execute() of a prepared statement, hit on the 13th file).

The memory is used in this way:

  • PHP script: 52 MB
  • end of first file import: 110 MB
  • destructor and unset calls: 110 MB
  • new procedure call: 110 MB
  • end of second file import: 250 MB
  • destructor and unset calls: 250 MB
  • new procedure call: 250 MB

So as you can see, even after unsetting the objects, the memory is not freed.

I tried setting the php.ini memory limit to 1024M, but memory usage grows really fast and the script crashes after 20 files.
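A diagnostic loop along these lines (import_file() and $files are placeholders, not the original code) is one way to log those per-file figures automatically:

foreach ($files as $i => $file) {
    import_file($file); // whichever import procedure applies to this extension (placeholder)

    $cycles = gc_collect_cycles(); // how many reference cycles the GC just freed

    printf(
        "file %d (%s): mem %.1f MB | peak %.1f MB | collected cycles %d\n",
        $i + 1,
        basename($file),
        memory_get_usage(true) / 1048576,
        memory_get_peak_usage(true) / 1048576,
        $cycles
    );
}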

Any advice?

Many thanks!

EDIT 1:

posting code:

class SUBJECT1{

    public function __destruct()
    {
        echo 'destroying subject1 <br/>';
    }

    public function import_subject1($file, $par1, $par2)
    {
        global $pdo;

        $aux = new AUX();
        $log = new LOG();

        // ---------------- FILES ----------------
        $input_file = fopen($file, "r");

        // ---------------- PREPARED STATEMENTS ----------------
        $PS_insert_data1 = $pdo->prepare("INSERT INTO table (ID,PAR1,PAR2,PARN) VALUES (?,?,?,?) ON DUPLICATE KEY UPDATE ID = VALUES(ID), PAR1 = VALUES(PAR1), PAR2 = VALUES(PAR2), PAR3 = VALUES(PAR3), PARN = VALUES(PARN)");

        $PS_insert_data2 = $pdo->prepare("INSERT INTO table (ID,PAR1,PAR2,PARN) VALUES (?,?,?,?) ON DUPLICATE KEY UPDATE ID = VALUES(ID), PAR1 = VALUES(PAR1), PAR2 = VALUES(PAR2), PAR3 = VALUES(PAR3), PARN = VALUES(PARN)");

        // ---------------- IMPORT ----------------
        if ($input_file) {
            ob_start();
            $pdo->beginTransaction();

            while (($line = fgets($input_file)) !== false) {
                $line = utf8_encode($line);
                $array_line = explode("|", $line);
                // set null values where I need them
                $array_line = $aux->null_value($array_line);

                if (sizeof($array_line) > 32) {
                    if (!empty($array_line[25])) {
                        $PS_insert_data1->execute([$array_line[0], $array_line[1], $array_line[2], $array_line[5]]);
                    }

                    $PS_insert_data2->execute([$array_line[10], $array_line[11], $array_line[12], $array_line[15]]);
                }
            }

            $pdo->commit();
            flush();
            ob_clean();
            fclose($input_file);
            clearstatcache();
        }
    }
}

I do this iteratively for all the files in my folder; the other procedures follow the same concept. I still see memory increasing, and now it crashes with a white page response :-\

SBO

2 Answers


Personally, I would go about it slightly differently. These are the steps I would take:

  • Open a PDO connection, set PDO in Exception mode
  • Get a list of files that I want to read
  • Create a class that can utilize PDO and the list of files and perform insertions
  • Prepare the statement ONCE, utilize it many times
  • Chunk PDO transaction commits to 50 (configurable) inserts - this means that every 50th time I call $stmt->execute(), I issue a commit - which utilizes the HDD better, thus making it faster
  • Read each file line by line
  • Parse the line and check if it's valid
  • If yes, add to MySQL, if not - report an error

Now, I've created 2 classes and an example of how I'd go about it. I tested only up to the reading part, since I don't know your DB structure or what AUX() does.

class ImportFiles
{
    protected $pdo;
    protected $stmt;
    protected $transaction = false;
    protected $trx_flush_count = 50; // Commit the transaction at every 50 iterations

    public function __construct(PDO $pdo = null)
    {
        $this->pdo = $pdo;

        $this->stmt = $this->pdo->prepare("INSERT INTO table 
                                                (ID,PAR1,PAR2,PARN)
                                            VALUES 
                                                (?,?,?,?) 
                                            ON DUPLICATE KEY UPDATE ID = VALUES(ID), PAR1 = VALUES(PAR1), PAR2 = VALUES(PAR2), PAR3 = VALUES(PAR3), PARN = VALUES(PARN)");
    }

    public function import($file)
    {
        if($this->isReadable($file))
        {
            $file = new FileParser($file);

            $this->insert($file);
        }
        else
        {
            printf("\nSpecified file is not readable: %s", $file);
        }
    }

    protected function isReadable($file)
    {
        return (is_file($file) && is_readable($file));
    }   

    protected function insert(FileParser $file)
    {
        // Start the first transaction; flush() commits and restarts it in chunks
        $this->pdo->beginTransaction();

        while($file->read())
        {
            //printf("\nLine %d, value: %s", $file->getLineCount(), $file->getLine());

            $this->insertRecord($file);

            $this->flush($file);
        }

        $this->flush(null);
    }

    // Untested method, no idea whether it does its job or not - might fail
    protected function flush(FileParser $file = null)
    {
        if(!is_null($file))
        {
            // Commit (and start a new transaction) every $trx_flush_count inserts
            if(!($file->getLineCount() % $this->trx_flush_count) && $this->pdo->inTransaction())
            {
                $this->pdo->commit();

                $this->pdo->beginTransaction();
            }
        }
        else
        {
            // Final flush - commit whatever is left
            if($this->pdo->inTransaction())
            {
                $this->pdo->commit();
            }
        }
    }   

    protected function insertRecord(FileParser $file)
    {
        $check_value = $file->getParsedLine(25);

        if(!empty($check_value))
        {
            $values = [ 
                $file->getParsedLine(0),
                $file->getParsedLine(1),
                $file->getParsedLine(2),
                $file->getParsedLine(5)
            ];
        }
        else
        {
            $values = [ 
                $file->getParsedLine(10),
                $file->getParsedLine(11),
                $file->getParsedLine(12),
                $file->getParsedLine(15)
            ];      
        }

        $this->stmt->execute($values);
    }
}

class FileParser
{
    protected $fh;
    protected $lineCount = 0;
    protected $line = null;
    protected $aux;

    public function __construct($file)
    {
        $this->fh = fopen($file, 'r');
    }

    public function read()
    {
        $this->line = fgets($this->fh);

        if($this->line !== false) $this->lineCount++;

        return ($this->line !== false);
    }

    public function getLineCount()
    {
        return $this->lineCount;
    }

    public function getLine()
    {
        return $this->line;
    }

    public function getParsedLine($index = null)
    {
        $line = $this->line;

        if(!is_null($line))
        {
            $line = utf8_encode($line);
            $array_line = explode("|", $line);

            //set null values where i neeed
            $aux = $this->getAUX();
            $array_line = $aux->null_value($array_line);

            if(sizeof($array_line) > 32)
            {   
                return is_null($index) ? $array_line : (isset($array_line[$index]) ? $array_line[$index] : null);
            }
            else
            {
                throw new \Exception(sprintf("Invalid array size, expected > 32 got: %s", sizeof($array_line)));
            }
        }
        else
        {
            return [];
        }
    }

    protected function getAUX()
    {
        if(is_null($this->aux))
        {
            $this->aux = new AUX();
        }

        return $this->aux;
    }
}

Usage:

$dsn = 'mysql:dbname=testdb;host=127.0.0.1';
$user = 'dbuser';
$password = 'dbpass';

try 
{
    $pdo = new PDO($dsn, $user, $password);

    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

    $import = new ImportFiles($pdo);

    $files = ['/usr/local/file1.txt', '/usr/local/file2.txt'];

    foreach($files as $file)
    {
        $import->import($file);
    }

} catch (Exception $e) 
{
    printf("\nError: %s", $e->getMessage());
    printf("\nFile: %s", $e->getFile());
    printf("\nLine: %s", $e->getLine());
}
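For readers who want the bare pattern without the classes, the chunked-commit idea boils down to roughly this (an illustrative sketch reusing the question's placeholder table/column names and pipe-separated format; $pdo and $file are assumed to exist, and error handling is omitted):

$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Prepare once, execute many times
$stmt = $pdo->prepare("INSERT INTO table (ID, PAR1, PAR2, PARN) VALUES (?, ?, ?, ?)");

$chunk = 50; // commit every 50 rows (configurable)
$count = 0;

$fh = fopen($file, 'r');
$pdo->beginTransaction();

while (($line = fgets($fh)) !== false) {
    $fields = explode('|', utf8_encode($line));

    if (count($fields) <= 32) {
        continue; // invalid line - skip or log it
    }

    $stmt->execute([$fields[0], $fields[1], $fields[2], $fields[5]]);

    // Flush to the database in chunks instead of one huge commit per file
    if ((++$count % $chunk) === 0) {
        $pdo->commit();
        $pdo->beginTransaction();
    }
}

// Commit whatever is left from the last partial chunk
if ($pdo->inTransaction()) {
    $pdo->commit();
}

fclose($fh);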
N.B.
  • Uhm, I do pretty much the same, except that I use fopen with fgets to get the lines instead of the FileParser. I use the same PDO connection included in the index file and prepare the statements ONCE, iterating the execution in the while loop. I didn't put exception and log tracking in my pseudocode, but I do have procedures in case of failure. I don't commit every 50 records because it is slower, and the problem is not the file size because it crashes after 30-40 files. I really don't know why memory keeps growing (now more slowly than before), but I have to import ~10 million records. Hitting a wall. – SBO Dec 16 '14 at 13:51
  • You're reading, parsing and committing an entire file. Your method for preparing statements prepares the same query twice. If you are invoking `import_subject1` for each file, this happens for each file (preparing the statement). Now, if your file is, say, 100 MB - you'll have that much data you're looking to flush in a single commit. What I've done differently is chosen to have a rate at which I'm flushing. I constructed the code differently to have clear separation of what does what. Anyway, reading line by line and flushing 50 at once instead of everything saves memory. – N.B. Dec 16 '14 at 15:03
  • Why do you say that I prepare the statement twice? As for the abstract separation, I have other problems due to the different prepared statements, because when I parse each line one bit tells me which PS I have to use. Anyway, can I flush a PS manually? Thanks a lot for your help! – SBO Dec 16 '14 at 16:25
  • If I'm not reading wrong, you prepare the exact same thing in `$PS_insert_data1` and `$PS_insert_data2`. This preparing is done every time you invoke `import_subject1` method. Statements have to be prepared once, used multiple times. You're also using a global variable for PDO instead of just injecting it into the class / method. Anyway, you can't flush a prepared statement manually. When I say flush for a prepared statement, I mean a commit. – N.B. Dec 16 '14 at 16:30
  • Well, in the pseudocode PS_insert_data1 and PS_insert_data2 have the same data inside, but PARN stands for n more parameters; they are obviously different statements. About the global variable you're right, but I have always seen global $pdo used in examples that include a connection file. So please, can you show me a simple example of how to use PDO not as a global variable? From the multiple classes in your answer I can't figure it out! Many thanks for your time! – SBO Dec 17 '14 at 09:24
  • Well, I can't know whether it is pseudo code or not if it's not explicitly mentioned :) however, the fact remains that **for every** `import_subject` method invocation, you do prepare two statements when you don't have to - you should do it **only once**. Also, your transactions aren't committed until the file has been read fully. You can imagine what happens if you have, say, 100mb file. It's not for no reason that PHP allocates memory. Chunking these flushes (commits) helps with memory allocation. I would at least try the chunked commit approach to see if the memory goes down. – N.B. Dec 17 '14 at 09:28
  • Well, maybe I didn't understand your answer before, but I mean: for the same record I have to populate 4 tables, so I need 4 prepared statements, don't I? – SBO Dec 17 '14 at 09:30
  • P.S. I need to process the files in path order; I can't process all .ext1, then all .ext2 and so on. This is why I build the object and destroy it in every case. – SBO Dec 17 '14 at 09:32
  • Your question says that you deal with multiple files. If these files insert into the same tables, then you don't have to prepare the statement for each file. You even import the PDO object globally. You can prepare the statements *before* creating `SUBJECT1` object, save the statements into an array and pass it as a reference to your import method. The key here is that statements are **prepared once, executed multiple times** - that should be your mantra. – N.B. Dec 17 '14 at 09:34
  • No, I have 4 kinds of file extension. Every file populates 4 different tables, so I have 16 PS in total. I prepared the statements once at the beginning, but the object wasn't destroyed, so I put it in the switch/case to automatically flush it when it goes out of the case/switch scope. I think that (and you're surely right) the problem here is the global PDO that is never flushed. But I couldn't understand your code when you put it in the constructor. Can you please explain it to me? I will send you a beer for sure! – SBO Dec 17 '14 at 09:41
  • I passed the PDO object to the class. That's so you don't have to use `global $pdo` inside your method. The point of a class is to assign a job to it and it does that one job only, which is why I have 2 classes - one to parse the file, one to insert into the db. The import class depends on database connection - so I injected the pdo via constructor. Also, I prepared the statement(s) I want to use there - this is so I can prepare once, use multiple times. This means you need 1 class for many files, without needing to destroy anything. – N.B. Dec 17 '14 at 11:02
  • Now what happens is that you have code that connects to the database, code that finds the file, code that parses the file according to some rules and code that inserts the data if all the rules are satisfied. When you debug or optimize, you have clear separation of tasks (connect, find files, parse files, insert to db) and you can handle / debug in an easier way. – N.B. Dec 17 '14 at 11:04
  • OK, all clear. But following your code, when I create an object with `new object();` and with `__construct(PDO $pdo = null){..}`, the $pdo is always null. I scouted for tutorials and it should work, but I can only make it work if I create the object passing $pdo as a parameter, as in `new object($pdo);`. It seems that the PDO is not alive in the class file. What am I doing wrong?? – SBO Dec 17 '14 at 11:30
  • No, it doesn't mean that PDO is null. It means that if no PDO instance is passed, a null will be used in its place. It's called default parameter setting. – N.B. Dec 17 '14 at 13:26
  • Yeah, so in your code there is an error? Shouldn't you call `new ImportFiles($pdo);`? – SBO Dec 22 '14 at 15:00
  • You're correct, there's an error - I should have called it like you mentioned. Editing the answer so it's clear that it should be used that way :) – N.B. Dec 22 '14 at 15:15
  • I didn't solve all the problems, but I have to leave this project for a bit. Thanks a lot for your help, really useful! – SBO Dec 22 '14 at 17:26

SOLVED:

I took this approach; maybe it is useful for someone with a similar problem:

I opened Task Manager and looked at the memory usage of the Apache and MySQL processes in these cases:

  • Tried to read and process the files without calling the MySQL procedures (memory usage was OK)
  • Tried to read, process and insert into the DB only files of one extension at a time (all .ext1, then all .ext2, ...)
  • Debugged the procedure with the big memory increase, isolating its functions one by one to find the problematic one
  • Found the problem and solved it

The problem was that I called a function passing the prepared statement as a parameter. I thought that, once prepared, it was just a "static" object to call. What happens is that if you pass the same PS into a function, the memory grows exponentially.
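For the record, the by-reference signature discussed in the comments below looks roughly like this (a sketch with a hypothetical insert_line() helper and the question's placeholder column indexes; $pdo and $input_file are assumed to be already set up):

// Prepared once, outside the per-line helper
$stmt = $pdo->prepare("INSERT INTO table (ID, PAR1, PAR2, PARN) VALUES (?, ?, ?, ?)");

// Hypothetical helper that receives the already-prepared statement by reference
function insert_line(PDOStatement &$stmt, array $fields)
{
    $stmt->execute([$fields[0], $fields[1], $fields[2], $fields[5]]);
}

while (($line = fgets($input_file)) !== false) {
    $fields = explode('|', utf8_encode($line));

    if (count($fields) > 32) {
        insert_line($stmt, $fields);
    }
}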

Hope this helps someone.

Bye!

SBO
  • You should pass the PDOStatement object by reference so it doesn't get copied. Congratulations on managing this on your own, though. – N.B. Dec 16 '14 at 10:31
  • N.B. thanks for the comment. How do I do that? You mean function(&$PS, $parameters, ...)? – SBO Dec 16 '14 at 10:35
  • It's easier to make the suggestion after seeing your code. Basically it's something along these lines: `function MyFunction(PDOStatement &$stmt){ }` – N.B. Dec 16 '14 at 10:45
  • Yeah, I tried. I don't know why, but it still increases memory a lot; not as much as passing the parameter by value, but double the amount of memory for sure (compared to not using the function and copy/pasting the code). – SBO Dec 16 '14 at 11:04
  • Memory increase is always a sign of something not being flushed, whether it's sending it to a DB or to output (stdout in the CLI's case). Your pseudo-code looks fine, however if the script uses so much memory (over 100 MB for a simple database insert) it's a clear indication you've forgotten or haven't done something right. I've got CLI scripts doing the same job, and they never reach 100 MB. The logic is similar: `fopen` a file, read N lines, use prepared statement + transaction, insert N lines, commit, start over. – N.B. Dec 16 '14 at 11:10
  • I posted the code in my edit, can you please tell me what I'm doing wrong? Thanks! – SBO Dec 16 '14 at 11:52