
I'm currently working on a project which needs to geocode locations stored in a MySQL database.

I wrote a small script which imports a CSV into the MySQL database and automatically fills the longitude and latitude fields of each row with "0.000000".

Now I found a script that checks the database for missing geocodes and updates them. Since this script is a few years old and only worked with Google, I adapted it to my needs (it should work with WordPress and Nominatim instead of Google Maps).

Importing the CSV works just fine, but when running this script to update about 5,200 locations I get a timeout. Unfortunately I can't increase any timeout settings, so I need to split the task into steps.

How can I accomplish this?

I'm still a bit green behind the ears when it comes to PHP, so any help would be appreciated. I already tried splitting the loop by increasing a counter variable which I compared with the number of MySQL rows I need to update. The page should then refresh every 100 rows, but unfortunately that didn't really work. It works for about 500 or so locations, but after that: timeout.

Does anybody know a solution?

// Opens a connection to a MySQL server
$connection = mysql_connect(DB_HOST, DB_USER, DB_PASSWORD);
if (!$connection) {
die("Not connected: " . mysql_error());
}

// Set the active MySQL database
$db_selected = mysql_select_db(DB_NAME, $connection);
if (!$db_selected) {
die("Can't use db: " . mysql_error());
}

// Select all the rows in the markers table
$query = "SELECT * FROM locations WHERE 1";
$result = mysql_query($query);

if (!$result) {
die("Invalid query: " . mysql_error());
}

// Initialize delay in geocode speed
$delay = 0;
function getLnt($address){
    $request_url = "http://nominatim.openstreetmap.org/search?q=".urlencode($address)."&format=json&addressdetails=1";
    $result_string = file_get_contents($request_url);
    $result_json = json_decode($result_string, true);
    return $result_json[0];
}


// Iterate through the rows, geocoding each address


    while ($row = @mysql_fetch_assoc($result)) {

        $geocode_pending = true;

        while ($geocode_pending) {

            $street = $row["address"];
            $zip = $row["zip"];
            $city = $row["city"];
            $id = $row["id"];

            $address = $street.', '.$zip.' '.$city;

            $details = getLnt($address);
            $lat = $details['lat'];
            $lng = $details['lon'];
            $state = $details['address']['state'];
            $country_iso = $details['address']['country_code'];

            if (http_response_code() == 200) {     

                // successful geocode
                $geocode_pending = false;

                $query = sprintf("
                    UPDATE locations SET lat = '%s', lng = '%s', state = '%s', country_iso = '%s' WHERE lat = '0.000000' AND lng = '0.000000' AND id = '%s' LIMIT 1;",
                    mysql_real_escape_string($lat), mysql_real_escape_string($lng), mysql_real_escape_string(utf8_decode($state)),mysql_real_escape_string($country_iso),mysql_real_escape_string($id)
                );

                $update_result = mysql_query($query);

                if (!$update_result) {
                   die("Invalid query: " . mysql_error());
                }

            } else if (http_response_code() == 620) {

                // sent geocodes too fast
                $delay += 100000;

            } else if($details == NULL) {

                // failure to geocode
                $geocode_pending = false;
                echo "Address " . $address . " failed to geocode.\n";

            }
            usleep($delay);

        }

}
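One caveat with the getLnt() function above: Nominatim's usage policy requires requests to carry an identifying User-Agent header, and a plain file_get_contents() call sends none, so requests may start getting rejected once thousands of them are fired. A minimal sketch of a variant that sets one via a stream context (the application name and contact address are placeholders to be replaced with your own):

```php
<?php
// Variant of getLnt() that sends an identifying User-Agent header,
// as required by Nominatim's usage policy. file_get_contents() on
// its own sends no User-Agent at all.
function getLntWithContext($address) {
    $context = stream_context_create(array(
        'http' => array(
            'header' => "User-Agent: MyGeocoderScript/1.0 (contact@example.com)\r\n",
        ),
    ));
    $request_url = "http://nominatim.openstreetmap.org/search?q="
        . urlencode($address) . "&format=json&addressdetails=1";
    $result_string = file_get_contents($request_url, false, $context);
    $result_json = json_decode($result_string, true);
    // Return null instead of triggering a notice when nothing was found.
    return isset($result_json[0]) ? $result_json[0] : null;
}
```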
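A side note on the loop above: http_response_code() returns the status code the script itself is about to send to the browser, not the status of the Nominatim request, so the `== 200` branch fires on every iteration and the `== 620` branch never runs. After file_get_contents() on an HTTP URL, PHP fills the magic local variable $http_response_header with the response headers; a small helper (sketched here, not part of the original script) can parse the remote status code from its first line:

```php
<?php
// $http_response_header is populated by PHP after file_get_contents()
// on an http:// URL; its first entry is the status line, e.g.
// "HTTP/1.1 200 OK". This helper extracts the numeric code from it.
function remoteStatusCode(array $http_response_header) {
    if (preg_match('{HTTP/\S+\s(\d{3})}', $http_response_header[0], $m)) {
        return (int) $m[1];
    }
    return 0; // status line missing or malformed
}

// Example with a typical status line:
$code = remoteStatusCode(array('HTTP/1.1 200 OK', 'Content-Type: application/json'));
```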
  • The microseconds seem problematic. `$delay += 100000;` basically means delay by 1/10th of a second each time a "Too many requests…" `620` response is sent. – Giacomo1968 Jun 03 '14 at 18:56
  • Can't be the problem otherwise I would've been blocked for the remaining day. – The_Hegemon Jun 03 '14 at 19:47

1 Answer

You have to add pagination to your location query:

// Select all the rows in the markers table
$offset = intval($_GET['offset']);
$query = "SELECT * FROM locations WHERE 1 LIMIT $offset, 100";

and then iterate through pages in your browser. You can do it manually, or by using javascript/jquery on another page which will query destination:

var locations_count = 1000000;
var offset = 0;
var func = function(){
    $.get('update-location.php', {offset: offset}, function(){
            document.write('<br/>Processed ' + offset + ' records');
            offset += 100;
            if( offset < locations_count )
                setTimeout(func, 100);
            else
                document.write('<br/>Update finished');
    });
};
func();

(Don't forget to update `locations_count` to your actual number of locations.)
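If JavaScript is off the table, the same splitting can be done in plain PHP by letting the script redirect to itself with an increased offset until no rows are left. A rough sketch, assuming the script is called update-location.php and processes 100 rows per request (the row total of 5200 is taken from the question; in practice you would query it with COUNT(*)):

```php
<?php
// Plain-PHP alternative: process one batch per request, then have the
// browser reload the script with the next offset until all rows are done.
$batch  = 100;
$offset = isset($_GET['offset']) ? intval($_GET['offset']) : 0;

// ... geocode the rows selected with "LIMIT $offset, $batch" here ...

$total = 5200; // total number of rows to update (assumed known)
$next  = $offset + $batch;
if ($next < $total) {
    // A meta refresh restarts the script in the browser, so each batch
    // runs with a fresh PHP time limit.
    echo '<meta http-equiv="refresh" content="1;url=update-location.php?offset=' . $next . '">';
    echo "Processed " . $next . " of " . $total . " rows...";
} else {
    echo "Update finished.";
}
```

Each request stays well under the server's timeout because it only touches $batch rows; the browser does the "looping" between requests.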

Another solution:

Dump the table to your local machine, run the update locally, and then upload the resulting table back to the server.

  • Would it be possible to do this entirely with PHP and fully automatically? – The_Hegemon Jun 03 '14 at 19:17
  • As I said: copy to your local machine, do the update, and then upload the results back. That will be just PHP. – Vitaly Dyatlov Jun 03 '14 at 19:28
  • Copying it to a local machine is not an option - it has to work on the server. – The_Hegemon Jun 03 '14 at 19:29
  • If you have access to the server over SSH, then you can run your script using php-cli, which has no time limits. – Vitaly Dyatlov Jun 03 '14 at 19:33
  • Also not an option. It has to be absolutely fool-proof (client-proof) because it's not me who is handling the data in the end. – The_Hegemon Jun 03 '14 at 19:38
  • Man, you see that it simply doesn't work as-is in a browser; you should agree to some compromises. I offered you a JS front-end which would eliminate the big-chunks issue. – Vitaly Dyatlov Jun 03 '14 at 19:41
  • @The_Hegemon The deal is this solution is valid, and Vitaly did a decent job. But the larger issue here is you say, "I'm still a bit green behind the ears when it comes to PHP…" and then you are scoffing at valid engineering solutions. If you don't understand PHP or programming, you really need to defer to, and perhaps adapt to, the experiences others with deeper skills have and are sharing. – Giacomo1968 Jun 03 '14 at 19:52
  • I didn't say that he didn't do a good job and I know I can only learn from others with deeper skills. It just is beyond my knowledge why this can't be solved without Javascript. I'm also grateful for your help, don't get me wrong - I really am. This actually helped me a lot, Vitaly. Thanks a lot! – The_Hegemon Jun 03 '14 at 20:10
  • It can't be done within one call because you run into time limits, so you have to split your job into many calls. In a browser you can make many calls only by using JavaScript. – Vitaly Dyatlov Jun 03 '14 at 20:15
  • But what if I reloaded the page (that would be multiple calls, wouldn't it?) after every 100th iteration of the loop, by setting and increasing a counter variable, and stopped the script once the counter matched the number of rows to update or the latter reached zero? Wouldn't that work? – The_Hegemon Jun 03 '14 at 20:45