
So, I have a database with a lot of data. The data to use is currently about 2.6 GB.

All the data needs to be written to text files for later use in other scripts.

The data is limited per file and split into multiple parts: 100 results per file (around 37 MB per file). That's about 71 files.

The data is JSON data that is serialized and then encrypted with OpenSSL.

The data is correctly written to the files, until the max execution time of 240 seconds is reached. That's after about 20 files...

Well, I can just extend that time, but that's not the problem.

The problem is the following:

Writing file 1-6:   +/- 5 seconds
Writing file 7-8:   +/- 7 seconds
Writing file 9-11:  +/- 12 seconds
Writing file 12-14: +/- 17 seconds
Writing file 14-16: +/- 20 seconds
Writing file 16-18: +/- 23 seconds
Writing file 19-20: +/- 27 seconds

Note: the time shown is the time needed per file

In other words, with every file I'm writing, the writing time per file goes up significantly, which of course makes the script slow.

The structure of the script is a bit like this:

$needed_files = count needed files/parts

for ($part=1; $part<=$needed_files; $part++) { // Loop through parts
    $query > mysqli select data
    $data > json_encode > serialize > openssl_encrypt
    file_put_contents($filename.$part, $data, LOCK_EX);
}
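
Filled in, that loop looks roughly like this (a sketch, not my exact code — $total_records is just an assumed row count, and $conn, $projectid, $pass, $iv and $filename come from elsewhere in the script). Every part runs its own LIMIT query against the same table:

<?php
$limit        = 100;
$needed_files = (int) ceil($total_records / $limit); // assumed total row count

for ($part = 1; $part <= $needed_files; $part++) {
    $offset = ($part - 1) * $limit;

    // MySQL reads and discards $offset rows before returning the 100
    // wanted ones, so this query does more work as $part grows.
    $result = mysqli_query(
        $conn,
        "SELECT * FROM notches WHERE projectid = ".$projectid.
        " LIMIT ".$offset.", ".$limit
    );

    $rows = array();
    while ($row = mysqli_fetch_assoc($result)) {
        $rows[] = $row;
    }
    mysqli_free_result($result);

    // json_encode > serialize > encrypt, as in the outline above
    $data = openssl_encrypt(serialize(json_encode($rows)), "aes128", $pass, false, $iv);
    file_put_contents($filename.$part, $data, LOCK_EX);
}

If that growing offset is part of the problem, it would explain why later files take longer even though each one holds the same 100 records.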

WORKING CODE AFTER HELP

$notchDetails = mysqli_query($conn, "SELECT * FROM notches WHERE projectid = ".$projectid."");

$rec_count = 0;
$limit = 100;
$part = 1;
$data1 = array(); // collects the rows for the current part

while ($notch = mysqli_fetch_assoc($notchDetails)) {

    $data1[] = $notch;
    $rec_count++;

    if ($rec_count >= $limit) {

        $data = json_encode($data1);
        $data = openssl_encrypt(bin2hex($data), "aes128", $pass, false, $iv);
        $filename = $mainfolder."/".$projectfolder."/".$subfolder."/".$fname.".part".$part."".$fext;
        file_put_contents($filename, $data, LOCK_EX);

        $part++;
        $rec_count = 0;
        $data = "";
        $data1 = array(); // reset as an array so $data1[] keeps working

    }

}
if (!empty($data1)) { // write the remaining rows that didn't fill a full part
    $data = json_encode($data1);
    $data = openssl_encrypt(bin2hex($data), "aes128", $pass, false, $iv);
    $filename = $mainfolder."/".$projectfolder."/".$subfolder."/".$fname.".part".$part."".$fext;
    file_put_contents($filename, $data, LOCK_EX);
}

mysqli_free_result($notchDetails);
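
If memory usage turns out to be the next bottleneck (by default mysqli_query() buffers the whole result set in PHP before the loop even starts), an unbuffered query is something I could still try — just a sketch, not part of the working code above:

// Untested variation: MYSQLI_USE_RESULT streams rows from the server
// instead of buffering them all in PHP memory first. The trade-off is
// that no other query can run on $conn until this result is freed.
$notchDetails = mysqli_query(
    $conn,
    "SELECT * FROM notches WHERE projectid = ".$projectid,
    MYSQLI_USE_RESULT
);

while ($notch = mysqli_fetch_assoc($notchDetails)) {
    // ... same batching, encrypting and writing as above ...
}

mysqli_free_result($notchDetails);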
Ramon Bakker
  • I'd start with http://php.net/manual/en/mysqli-result.free.php – Here2Help Feb 24 '16 at 16:13
  • Isn't json_encode a way of serializing? What is that extra "serializing" step? (Don't think it is the bottleneck, just wondering) – Bart Friederichs Feb 24 '16 at 16:14
  • Because multiple JSON strings and non-JSON strings needed to be combined. But I'm actually not sure if it's really a problem if I don't serialize it. Could try it. Thanks for the suggestion – Ramon Bakker Feb 24 '16 at 16:16
  • To allow a script to run for longer add `ini_set('max_execution_time', 900);` to the top of your script. Or you can do `ini_set('max_execution_time', 0);` so it can run for ever – RiggsFolly Feb 24 '16 at 16:17
  • I know I could do that, and probably must. But that's not the solution – Ramon Bakker Feb 24 '16 at 16:18
  • @SmartyCoder I'm going to try and see the difference – Ramon Bakker Feb 24 '16 at 16:18
  • @SmartyCoder I don't think time spent in database access counts towards the `max_execution_time` – RiggsFolly Feb 24 '16 at 16:19
  • @RiggsFolly it's not about that. It's about more and more memory being used, slowing down the script, which is what he asked about. – Here2Help Feb 24 '16 at 16:20
  • If each of those files is an approximately similar size, I'd suspect the problem lies with how you're retrieving the data in the first place rather than in how it's written to disk. A poorly optimised, non-indexed query (requiring a full table read) will take longer to run on records 190,000 - 200,000 than it would on records 0 - 10,000 despite the fact that, in each case, 10,000 records are returned. – CD001 Feb 24 '16 at 16:21
  • Maybe if you were not so **cryptic** about your real code, people could make sensible suggestions **rather than all these guesses** – RiggsFolly Feb 24 '16 at 16:24
  • The mysqli_free_result didn't change anything. But I have the table indexed and there are just over 7,000 records (yes, they're big) – Ramon Bakker Feb 24 '16 at 16:25
  • All records are exactly the same for testing purposes. – Ramon Bakker Feb 24 '16 at 16:26
  • At every `SELECT`, the data is limited with `LIMIT`. The query is `SELECT * FROM table WHERE projectid = ".$projectid." LIMIT ".$offset.", ".$limit."`. Every loop the offset and limit are calculated. `projectid` is an index – Ramon Bakker Feb 24 '16 at 16:28
  • Just because a table is indexed it doesn't necessarily mean that MySQL can *use* the index; it depends on the query. MySQL can generally only use one index at a time, for example, so if you've got a condition on a `WHERE` and an `ORDER BY` on different indexes it'll do a full table read. I'd suggest executing your query directly on the database prepended with `EXPLAIN` : http://dev.mysql.com/doc/refman/5.7/en/explain.html – CD001 Feb 24 '16 at 16:28
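
For anyone wanting to run that EXPLAIN check from PHP, a rough sketch (123 is just a placeholder project id; running the same statement in the MySQL client works equally well):

// Sketch: ask MySQL how it plans to execute the paginated query.
// If the "key" column is NULL or "rows" is close to the table size,
// the projectid index is not actually being used for this query.
$explain = mysqli_query(
    $conn,
    "EXPLAIN SELECT * FROM notches WHERE projectid = 123 LIMIT 2000, 100"
);
while ($row = mysqli_fetch_assoc($explain)) {
    print_r($row);
}
mysqli_free_result($explain);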

1 Answer


Personally I would have coded this as a single SELECT with no LIMIT, and then, based on a $rec_per_file = ?; value, written the output files from within the single while-fetch-results loop.

Excuse the cryptic code, you didn't give us much of a clue

<?php
//ini_set('max_execution_time', 600);    // only use if you have to

$filename = 'something';
$filename_suffix = 1;

$rec_per_file = 100;

$sql = "SELECT ...."; // your single SELECT, with no LIMIT

$result = mysqli_query($conn, $sql); // $conn, $pass and $iv as in your own code

$rec_count = 0;
$data = array();

while ( $row = mysqli_fetch_assoc($result) ) {

    // serialize and encrypt each row (adjust to match your real code)
    $data[] = openssl_encrypt(serialize($row), "aes128", $pass, false, $iv);

    $rec_count++;

    if ( $rec_count >= $rec_per_file ) {

        $json_string = json_encode($data);

        file_put_contents($filename.$filename_suffix, 
                          $json_string, 
                          LOCK_EX);

        $filename_suffix++; // inc the suffix
        $rec_count = 0;     // reset counter
        $data = array();    // clear data

        // add 30 seconds to the remaining max_execution_time
        // or at least a number >= to the time you expect this
        // while loop to get back to this if statement
        set_time_limit(30);
    }
}
// catch the last few rows
if (!empty($data)) {
    $json_string = json_encode($data);
    file_put_contents($filename.$filename_suffix, $json_string, LOCK_EX);
}
mysqli_free_result($result);

Also I am not sure why you would want to both serialize() and json_encode().
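
If the rows are plain arrays, json_encode() on its own normally round-trips them fine — a small illustration (made-up field names):

<?php
// Illustration only: json_encode()/json_decode() recover an associative
// row array without any extra serialize() step.
$row     = array('id' => 1, 'projectid' => 42, 'payload' => 'abc');
$encoded = json_encode($row);
$decoded = json_decode($encoded, true); // true = return an assoc array

var_dump($decoded == $row); // bool(true)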

I had a thought, based on your comment about execution time. If you place a set_time_limit(seconds) call inside the if inside the while loop, it might be cleaner: you would not have to set ini_set('max_execution_time', 600); to a very large number which, if you have a real error in here, might cause PHP to continue processing for a long time before kicking the script out.

From the manual:

Set the number of seconds a script is allowed to run. If this is reached, the script returns a fatal error. The default limit is 30 seconds or, if it exists, the max_execution_time value defined in the php.ini.

When called, set_time_limit() restarts the timeout counter from zero. In other words, if the timeout is the default 30 seconds, and 25 seconds into script execution a call such as set_time_limit(20) is made, the script will run for a total of 45 seconds before timing out.

RiggsFolly