I have a PHP script that writes CSV files to disk. This is the function:

function fputcsv_content($fp, $array, $delimiter=",", $eol="\n") {

    $line = "";
    foreach($array as $value) {
        $value = trim($value);
        $value = str_replace("\r\n", "\n", $value);
        if(preg_match("/[$delimiter\"\n\r]/", $value)) {
            $value = '"'.str_replace('"', '""', $value).'"';
        }
        $line .= $value.$delimiter;
    }
    $eol = str_replace("\\r", "\r", $eol);
    $eol = str_replace("\\n", "\n", $eol);
    $line = substr($line, 0, (strlen($delimiter) * -1));
    $line .= $eol;
    return fputs($fp, $line);
}

The server is an AWS instance running CentOS 7 with PHP 7.2.

Server specs: 4 GB RAM, 32 GB swap, 2 cores at 2.5 GHz.

When the files are large (3 GB to 4 GB), writing is very slow (about 1 MB every 2 or 3 seconds).

Is there any setting in php.ini or the Apache config that controls the fputs/fwrite functions?

I've seen an output_buffering setting in php.ini (currently set to 4096), but I doubt it has anything to do with this.

Thanks!

Matias
  • Likely not related to the issue, but you should be using https://www.php.net/manual/en/function.fputcsv.php in the first place here, instead of writing your own escaping logic ... – CBroe Sep 10 '21 at 12:13
  • (Or it might even be related, if the regex & replacement stuff you are doing there is what is actually consuming most of the time. Have you somehow verified that it is actually the write speed itself, that makes this slow?) – CBroe Sep 10 '21 at 12:14
  • Yes, because I can monitor the file as it is written and it grows slowly. I assume that once the file appears in the folder where I drop it, execution is already at the fputs line and has gone through all the preceding logic in the function – Matias Sep 10 '21 at 12:17
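
For reference, the built-in function mentioned in the comments could be used roughly as follows. This is only a sketch: write_rows_native() is a hypothetical helper name, and the php://memory stream stands in for the real output file. Note that fputcsv() does not trim values and also quotes fields that contain spaces, so its output will not be byte-identical to the custom function's. Its eol parameter only exists from PHP 8.1, so on PHP 7.2 lines always end in "\n". The helper returns the elapsed time of the loop, which is one rough way to see how long formatting plus writing takes for a known number of rows.

```php
<?php
// Sketch only: write rows with the built-in fputcsv() and time the loop.
// write_rows_native() is a hypothetical helper, not part of the original post.
function write_rows_native($fp, iterable $rows, string $delimiter = ','): float
{
    $start = microtime(true);
    foreach ($rows as $row) {
        // Enclosure and escape are passed explicitly so the behaviour does
        // not depend on version-specific defaults.
        if (fputcsv($fp, $row, $delimiter, '"', '\\') === false) {
            throw new RuntimeException('fputcsv() failed');
        }
    }
    return microtime(true) - $start; // elapsed seconds
}

// An in-memory stream stands in for the real file handle.
$fp = fopen('php://memory', 'w+');
$elapsed = write_rows_native($fp, [
    ['id', 'comment'],
    ['1', 'contains, a comma'],
]);
rewind($fp);
$written = stream_get_contents($fp);
fclose($fp);

printf("wrote %d bytes in %.6f s\n", strlen($written), $elapsed);
```
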

1 Answer


Don't use .= to build the line. Add the values to an array, then implode the array. As written, the function fills memory with strings that are constantly discarded: every time you do .=, the old string is kept on the stack and new space is reserved for the new string, and the GC only runs when the function has finished. With a file of 3-4 GB, the total might end up being many multiples of that, which pushes the process into swap, and swap is slow.

Try refactoring it to an array-based approach with some memory-saving techniques, and see if that alleviates your issues a bit.
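
One way to check whether the refactoring helps on your own machine is to time both ways of building a line. The following is a rough sketch under stated assumptions: build_line_concat(), build_line_implode() and time_builder() are hypothetical helper names, field quoting is left out, and the row size and iteration count are arbitrary. It makes no claim about which variant wins, so run it on the server in question.

```php
<?php
// Hypothetical helpers for comparison only; field quoting is omitted.
function build_line_concat(array $values, string $delimiter = ','): string
{
    $line = '';
    foreach ($values as $value) {
        $line .= $value . $delimiter;
    }
    // Drop the trailing delimiter, as the function in the question does.
    return substr($line, 0, -strlen($delimiter)) . "\n";
}

function build_line_implode(array $values, string $delimiter = ','): string
{
    return implode($delimiter, $values) . "\n";
}

// Run a builder repeatedly on the same row and return the elapsed seconds.
function time_builder(callable $builder, array $row, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $builder($row);
    }
    return microtime(true) - $start;
}

// Arbitrary test data: 50 short fields per row, 100000 rows.
$row = array_fill(0, 50, 'some value');

$concatSeconds  = time_builder('build_line_concat', $row, 100000);
$implodeSeconds = time_builder('build_line_implode', $row, 100000);

printf("concat:  %.4f s\n", $concatSeconds);
printf("implode: %.4f s\n", $implodeSeconds);
printf("peak memory: %.1f MB\n", memory_get_peak_usage(true) / 1048576);
```

If both timings are tiny compared with the 2 to 3 seconds per megabyte seen in production, line building is probably not the bottleneck.
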

I also used static function variables so that the constant values are assigned only once instead of on every call, which saves a marginal bit of memory, setting aside whichever optimisations PHP may or may not do.

See it online: https://ideone.com/dNkxIE

function fputcsv_content($fp, $array, $delimiter=",", $eol="\n") 
{
    static $find = ["\\r","\\n"];
    static $replace = ["\r","\n"];
    static $cycles_count = 0;
    $cycles_count++;
    
    $array = array_map(function($value) use($delimiter) {
      return clean_value($value, $delimiter);
    }, $array);
    $eol = str_replace($find, $replace, $eol);

    $line = implode($delimiter, $array) . $eol;
    
    $return_value = fputs($fp, $line);

    /** purposefully free up the ram **/
    $line = null;
    $eol = null;
    $array = null;

    /** trigger gc_collect_cycles() every 250th call of this method **/
    if($cycles_count % 250 === 0) gc_collect_cycles();

    return $return_value;
}

/** Use a second function so the GC can be triggered here
  * when it returns the value and all intermediate values are free.
  */
function clean_value($value, $delimiter) 
{
   /**
     *  use static values to prevent reassigning the same
     *  values to the stack over and over
     */
   static $regex = []; 
   static $find = "\r\n";
   static $replace = "\n";
   static $quote = '"';
   if(!isset($regex[$delimiter])) {
      /** build the pattern once per delimiter; preg_quote() keeps
        * delimiters such as "|" or "^" literal inside the regex
        */
      $regex[$delimiter] = '/[' . preg_quote($delimiter, '/') . "\"\n\r]/";
   }
   $value = trim($value);
   $value = str_replace($find, $replace, $value);
   /** quote the field if it contains the delimiter, a quote or a line break **/
   if(preg_match($regex[$delimiter], $value)) {
        $value = $quote.str_replace($quote, '""', $value).$quote;
   }
   return $value;
}
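
To sanity-check the quoting rule used above (wrap a field in double quotes when it contains the delimiter, a quote or a line break, and double any embedded quotes), a line can be parsed back with PHP's built-in str_getcsv(). This is a minimal sketch: quote_csv_field() is a hypothetical stand-in that mirrors the rule without the trimming and caching, and the sample row is made up.

```php
<?php
// Hypothetical helper mirroring the quoting rule described above;
// it is not part of the original answer.
function quote_csv_field(string $value, string $delimiter = ','): string
{
    $pattern = '/[' . preg_quote($delimiter, '/') . "\"\n\r]/";
    if (preg_match($pattern, $value)) {
        return '"' . str_replace('"', '""', $value) . '"';
    }
    return $value;
}

// Made-up sample row covering the plain, delimiter and quote cases.
$row  = ['plain', 'has,comma', 'has "quotes"'];
$line = implode(',', array_map('quote_csv_field', $row));

// Parse the line back with the built-in CSV parser.
$parsed = str_getcsv($line, ',', '"', '\\');

// true when the round trip preserves every field
var_dump($parsed === $row);
```
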
Tschallacka