
I'm trying to upload a users CSV file and insert the data into a database, but because of the size of the CSV (number of rows), it keeps timing out. Each row must be checked against the database and then updated or created.

I have applied chunking to the CSV when reading the data, but I don't know if it's also possible to chunk the part that writes to the database?

Here is my function:

public function import(Request $request) {

    if($request->file('imported-file')) {

        $path = $request->file('imported-file')->getRealPath();

        $data = Excel::filter('chunk')->load($path)->chunk(200, function($results) {

            foreach($results as $row) {

                if(!empty($row['postcode'])) {

                    $url = "https://maps.googleapis.com/maps/api/geocode/xml?address=".urlencode($row['postcode'])."&region=uk&key=";
                    $tmp = file_get_contents($url);
                    $xml = simplexml_load_string($tmp);

                    // Default coordinates; keep them if the geocode lookup fails
                    $lat = 0;
                    $lng = 0;

                    if((string)$xml->status == 'OK' && isset($xml->result[0])) {

                        if(isset($xml->result[0]->geometry->location->lat)) {
                            $lat = (string)$xml->result[0]->geometry->location->lat;
                        }
                        if(isset($xml->result[0]->geometry->location->lng)) {
                            $lng = (string)$xml->result[0]->geometry->location->lng;
                        }

                    }

                    Import::updateOrCreate(
                        [
                            'sitecode' => $row['sitecode']
                        ],
                        [
                            'sitecode' => $row['sitecode'],
                            'sitename' => $row['sitename'],
                            'address_1' => $row['address_1'],
                            'address_2' => $row['address_2'],
                            'address_town' => $row['address_town'],
                            'address_postcode' => $row['postcode'],
                            'charity' => $row['charity'],
                            'latitude' => $lat,
                            'longitude' => $lng,
                            'approved' => 1
                        ]
                    );

                } else {

                    // Postcode not valid!!!

                }


            } // endforeach

            Session::flash('success', 'Import was successful.');
            return redirect()->route('locations');

        });

    } else {

        Session::flash('error', 'Please select a file to upload!');
        return back();

    }

}
CIB

1 Answer


Your problem is related to the configuration of your server, and you must understand that many things can go wrong when you run long tasks synchronously inside a web request.

If you are using an Nginx/PHP-FPM setup, you must look at the Nginx, PHP and PHP-FPM config files.

PHP configuration

Let's start with PHP. Open the /etc/php/<phpversion>/fpm/php.ini file and search for max_execution_time. You should find something like

max_execution_time = 30

which means that each request can last no longer than 30 seconds. If you need more time, increase this number, e.g.

max_execution_time = 300

for 5 minutes.
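
If you'd rather not raise the limit globally, PHP can also extend it for just the current request at runtime. A minimal sketch, assuming your host allows it (some platforms disable these calls):

set_time_limit(1800);                     // let this request run for up to 30 minutes
// or, equivalently:
ini_set('max_execution_time', '1800');

Call it at the top of your import() method, before the heavy work starts.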

Then let's examine the PHP-FPM configuration. Open your pool configuration, such as /etc/php/<phpversion>/fpm/pool.d/www.conf and search for request_terminate_timeout. In my configuration, I have it disabled:

; Default Value: 0
;request_terminate_timeout = 0

The default value is 0 (disabled), but if you have it enabled you should increase the number, e.g.

request_terminate_timeout = 400

for 400 seconds before PHP-FPM kills a child process. If you do set a number, use something higher than max_execution_time, otherwise your process will be killed by PHP-FPM regardless of the maximum execution time.

Nginx configuration

Finally, look at the Nginx configuration in /etc/nginx/sites-available/yoursite.conf. There you should find the section that configures communication between Nginx and PHP-FPM, and in it the fastcgi_read_timeout directive, which is the maximum time Nginx will wait for PHP-FPM to return some data:

location ~ \.php$ {
    # ...
    fastcgi_read_timeout 300;
    # ...
}

If PHP-FPM hasn't returned anything after 300 seconds, Nginx will kill the connection. In your case, you only send data back to the webserver after the long-running process, so the whole request cannot take longer than 300 seconds. You should change this number to something compatible with the values you put in the PHP configuration.

To sum up

If you think your processing could take up to 30 minutes, use numbers like these:

  • in /etc/php/<phpversion>/fpm/php.ini:

    max_execution_time = 1800        ; 1800 secs = 30 minutes
    
  • in /etc/php/<phpversion>/fpm/pool.d/www.conf:

    request_terminate_timeout = 0    ; no timeout, or greater than 1800
    
  • in /etc/nginx/sites-available/yoursite.conf:

    fastcgi_read_timeout 2000;       # 2000 secs, or at least greater than the previous two
    

With this combination, max_execution_time will be the limit that actually applies, and you will know that your process has 30 minutes of running time, because the PHP-FPM and Nginx timeouts should only kick in after the PHP one.

Don't forget the client-side

If you are using an AJAX upload library, please check its configuration too, because it can impose yet another timeout on the full AJAX upload request.

For example, Dropzone.js uses a 30-second timeout by default. Your server could run for ages, but after that short amount of time your JavaScript library will kill the connection.

Usually you can change that value. With Dropzone, the timeout option is expressed in milliseconds, so a 2100-second timeout looks like this:

var myDropzone = new Dropzone("#uploader", {
    // ...
    timeout: 2100000, // milliseconds (2100 seconds)
    // ...
});

Again, use a higher value than the one in the Nginx configuration.

Long-running tasks: the right way

However, your approach is quick and dirty, and even if I'm sure it's OK for your case, it would be better to follow another path:

  1. don't do the CSV processing just after the upload
  2. instead, upload the file, put it in a queue and notify the client to check back later
  3. do the CSV processing in background with a queue worker

Please check the Laravel documentation on queues (https://laravel.com/docs/5.5/queues).
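
A minimal sketch of that approach, assuming maatwebsite/excel as in your code; the job class name and the imports storage path are illustrative, not taken from your project:

// app/Jobs/ProcessCsvImport.php (hypothetical name)
namespace App\Jobs;

use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;
use Maatwebsite\Excel\Facades\Excel;

class ProcessCsvImport implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Same chunked read + updateOrCreate loop as in the question,
        // but now it runs in a queue worker instead of the web request
        Excel::filter('chunk')->load($this->path)->chunk(200, function ($results) {
            foreach ($results as $row) {
                // geocode + Import::updateOrCreate(...) as before
            }
        });
    }
}

// In the controller: store the file, queue the job, answer immediately
public function import(Request $request)
{
    if (!$request->file('imported-file')) {
        Session::flash('error', 'Please select a file to upload!');
        return back();
    }

    // Copy the upload somewhere permanent: the temporary upload file is gone
    // by the time the worker picks up the job
    $path = $request->file('imported-file')->store('imports');

    ProcessCsvImport::dispatch(storage_path('app/' . $path));

    Session::flash('success', 'Import queued. Check back in a few minutes.');
    return redirect()->route('locations');
}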

With this approach your user gets immediate feedback, and you won't have timeout problems anymore! (Well, in theory: queued background jobs can time out too, but that's another story.)
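
One thing to watch with that: the queue worker has its own limits. Give it a --timeout big enough for the job, and keep retry_after in config/queue.php larger than that timeout, otherwise the job will be re-dispatched while it is still running. With the 30-minute figure from above, something like:

php artisan queue:work --timeout=1800

'retry_after' => 1900,   // in config/queue.php, must stay above the worker --timeout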

I hope this can help :)

Marco Pallante
  • Hi, thanks for taking the time out to write a lengthy reply. I took your advice and added a job / queue. I upload the CSV, then dispatch a job and process the CSV and upload to the database from there. I run php artisan queue:work and all is fine! Well, on my MacBook that is. I've downloaded the repo to my iMac, exact same setup, but when I run the work command I just get this over and over like an infinite loop: [2017-11-29 11:48:21] Processing: App\Jobs\ProcessCSV [2017-11-29 11:48:22] Processing: App\Jobs\ProcessCSV Any ideas? – CIB Nov 29 '17 at 11:53
  • Probably you're facing job expiration & retrials (https://laravel.com/docs/5.5/queues#job-expirations-and-timeouts). There is a `retry_after` option in the queues configuration (`config/queue.php`) that has a default value of 90 seconds. After that time, if the job didn't complete, the worker will start it again. Also make sure you have a worker `--timeout` big enough (or ignore the options, I think the default is *no timeout*). – Marco Pallante Nov 30 '17 at 13:32