
I wrote a PHP script that pulls tweets from the Twitter firehose and stores them in a database. Ideally I want to just let it run so that it collects tweets over time, so it's wrapped in a while(1) loop.

This is a problem because it times out. If I run it from a browser, it won't run for more than 30 seconds before timing out and giving me a 324 error.

Question: Is there a way to have it run for a fixed amount of time (say 20 seconds), stop itself, and then restart, all from a cron job? (I don't know how to write a cron job.)

Background: The site is hosted on GoDaddy, and I would ideally like to run this on my hosting server there.

The Script:

<?php
    $start = time();
    $expAddress = "HOSTNAME";
    $expUser = "USERNAME";
    $expPwd = "PASSWORD";
    $database = "DBNAME";

    $opts = array(
        'http' => array(
            'method'    =>  "POST",
            'content'   =>  'keywords,go,here',
        )
    );

    // Open connection to the database
    $db = mysql_connect($expAddress, $expUser, $expPwd);
    mysql_select_db($database, $db);

    $context = stream_context_create($opts);
    while (1) {
        $instream = fopen('https://USERNAME:PASSWORD@stream.twitter.com/1/statuses/filter.json','r' ,false, $context);
        while(! feof($instream)) {

             if(time() - $start > 5) { // break after 5 seconds
                break;
             }


            if(! ($line = stream_get_line($instream, 100000, "\n"))) {
                continue;
            }
            else {
                $tweet = json_decode($line);

                // Clean before storing             

                            // LOTS OF VARIABLES FOR BELOW...REMOVED FOR READABILITY

                // Send to database
                $ok = mysql_query("INSERT INTO tweets 
                    (created_at, from_user, from_user_id, latitude, longitude, tweet_id, language_code, 
                            place_name, profile_img_url, source, text, retweet_count, followers_count,
                            friends_count, listed_count, favorites_count) 
                    VALUES 
                    (NOW(), '$from_user', '$from_user_id', '$latitude', '$longitude', '$tweet_id', '$language_code', 
                            '$place_name', '$profile_img_url', '$source', '$text', '$retweet_count', '$followers_count',
                            '$friends_count', '$listed_count', '$favorites_count')");

                if (!$ok) { echo "Mysql Error: ".mysql_error(); }

                flush();
            }
        }
    }
?>
Jon
  • Run this as a daemon via terminal. `php script.php` – datasage Mar 05 '13 at 21:46
  • How do I set it up as a daemon, and how can I run it in a terminal? Can I do that on my Godaddy hosting server? – Jon Mar 05 '13 at 21:47
  • You will need to execute your script from the terminal (shell) using a cron job, so you will need to learn how to set them up on your system – Abu Romaïssae Mar 05 '13 at 21:47
  • @Jon, if you have cPanel, you can add cron jobs from there. I'm not sure about GoDaddy, but it should be there – Abu Romaïssae Mar 05 '13 at 21:48
  • @Jon If you can log in via SSH, yes, but it's quite possible they will auto-kill anything that runs too long. You will probably need an environment over which you have more control. – datasage Mar 05 '13 at 21:49
  • Set the PHP timeout *inside* the loop: setting the timeout resets the timeout counter, so, for example, `set_time_limit(10);` inside your loop will allow each iteration to take up to 10 seconds – thaJeztah Mar 05 '13 at 21:49
  • @thaJeztah Or just set it to 0 at the start of the script. – datasage Mar 05 '13 at 21:51
  • The advantage of setting it inside the loop is that you'll have *some* control over timeouts. E.g. If the script is unable to connect to twitter it will not 'hang' indefinitely, but timeout after 10 seconds. – thaJeztah Mar 05 '13 at 21:53
  • And then you can sleep(20) and loop again – Steward Godwin Jornsen Mar 05 '13 at 21:58
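A minimal sketch of the "run it from a terminal, then sleep and loop" idea from these comments, written as a bash function. It assumes you can log in over SSH, that the PHP CLI is on your PATH, and that GNU coreutils `timeout` is installed; the function name `collect_in_rounds`, the file name `collect_in_rounds.sh`, and the script path are hypothetical. Each round runs the collector for at most a fixed number of seconds, then pauses and starts it again, which is the "run for 20 seconds, stop, restart" behaviour the question asks about. As datasage points out, a shared host may still kill a long-lived process like this.

```shell
#!/bin/bash
# Hypothetical supervisor loop: run a command for a bounded time, pause, repeat.
# Assumes GNU coreutils `timeout`; the function name is illustrative only.

# Usage: collect_in_rounds ROUNDS RUN_SECONDS PAUSE_SECONDS COMMAND [ARGS...]
# ROUNDS=0 means loop forever.
collect_in_rounds() {
    local rounds="$1" run_for="$2" pause="$3"
    shift 3
    local i=0 status
    while [ "$rounds" -eq 0 ] || [ "$i" -lt "$rounds" ]; do
        status=0
        # timeout stops the command after $run_for seconds (exit status 124).
        timeout "$run_for" "$@" || status=$?
        if [ "$status" -ne 0 ] && [ "$status" -ne 124 ]; then
            echo "round $((i + 1)): command failed with status $status" >&2
        fi
        i=$((i + 1))
        sleep "$pause"
    done
}

# Example (hypothetical paths), detached so it keeps running after you log out:
#   nohup bash -c '. ./collect_in_rounds.sh; collect_in_rounds 0 20 1 php -f /home/USERNAME/collect_tweets.php' >/dev/null 2>&1 &
```

Because `timeout` kills the collector from outside, this works even if the PHP script itself never reaches end-of-file on the stream.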

3 Answers


You can have cron jobs run once a minute.

To do this, follow these steps:

  1. Make a script that runs your PHP code, for example:

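    #!/bin/bash
    # Fetch the page that runs your PHP code and throw away the response
    wget -q -O /dev/null myurl.com/blah
    

    Save it as my-cron.sh in some folder (like /var) and make it executable with chmod +x /var/my-cron.sh.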

  2. Add it to cron by running crontab -e (see Cron Format and Crontab usage). For example, this entry runs it once a minute:

    # Minute   Hour   Day of Month   Month   Day of Week    Command    
        *        *          *          *          *         /var/my-cron.sh
    
redolent
  • Although I prefer this approach as well, be sure *not* to use an infinite loop in this case; otherwise the script will keep running and a new instance of it will be started each minute! Also, it's probably better to make the PHP script runnable from the command line instead of having a publicly accessible URL that anyone can trigger (risk of a DoS attack). – thaJeztah Mar 05 '13 at 21:57
  • This looks pretty easy to follow. Should I just call set_time_limit(60) or something so that it stops itself before restarting? Or do something like @thaJeztah suggested in a comment on the original post? – Jon Mar 06 '13 at 15:31
  • It depends on the amount of data you want to collect; you may decide, for example, to run the loop 10 times every time the script is called, or maybe run it just *once* every minute. I don't know if you need to call the Twitter API several times in a short period to get meaningful results (I suspect it already gives you a *list* of 'recent' tweets?) – thaJeztah Mar 06 '13 at 15:46
  • Right, that makes sense. Do I need to remove the while(1) or put the set_time_limit in the while(! feof($instream)) loop? Ideally I'd get as much data as possible. GoDaddy doesn't let you run a cron job any more frequently than every 5 minutes, so the job would have to run for 5 minutes, stop, then restart again at the 'new' 5 minute mark. How would I alter the code to make sure that it doesn't die during one run, and if it does, restart itself? Is this possible? – Jon Mar 06 '13 at 15:49
  • You _must_ remove the `while(1)`. You could simply use a counter like `while( $i++ < 5 )`. If you need to run the script longer, you could try running it from the command line with `php -f ` – redolent Mar 06 '13 at 18:31
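A sketch of a cron-friendly wrapper that addresses two concerns raised in these comments: each run is cut off before the next 5-minute slot, and a new run is skipped if the previous one is still alive, so instances never pile up. It assumes shell access, the PHP CLI, GNU coreutils `timeout`, and util-linux `flock`, none of which is guaranteed on shared hosting; the function name `run_bounded`, the exit code 75, the lock path, and the script path are hypothetical choices for illustration.

```shell
#!/bin/bash
# Hypothetical cron wrapper: at most one instance at a time, each run time-boxed.
# Assumes GNU coreutils `timeout` and util-linux `flock` are installed.

# Usage: run_bounded LOCKFILE SECONDS COMMAND [ARGS...]
# Returns 75 if another instance holds the lock, 124 if COMMAND was cut off
# by the time limit, otherwise COMMAND's own exit status.
run_bounded() {
    local lockfile="$1" seconds="$2"
    shift 2
    (
        # Take an exclusive lock on fd 9 without waiting; give up if it is held.
        flock -n 9 || { echo "previous run still active, skipping" >&2; exit 75; }
        # Stop the command after $seconds so it ends before the next cron slot.
        timeout "$seconds" "$@"
    ) 9>"$lockfile"
}

# Example call (hypothetical paths), leaving ~10 s of slack in a 5-minute slot:
#   run_bounded /tmp/collect_tweets.lock 290 php -f /home/USERNAME/collect_tweets.php
```

To use it with the steps in this answer, define run_bounded in my-cron.sh and call it in place of the wget line. Given the 5-minute minimum Jon mentions for GoDaddy, the crontab entry would become:

    */5 * * * * /var/my-cron.sh

The lock means an overrunning collector causes the next run to be skipped rather than two collectors running at once.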

If I understand your needs correctly, the best option for you is a cron job; making a script run indefinitely is not a good idea.

As you mentioned in one of your comments, you are on GoDaddy hosting, so you probably won't have shell access. However, depending on your cPanel version, you may be able to create cron jobs there.

See this link and this Google search.

If you don't have that option and you are willing to leave a browser open, I would suggest the following:

Create an HTML page that acts as a client and makes an Ajax request to your PHP script every hour; this way you emulate a cron job.

The Ajax request code might look like this (using jQuery):

function makeRequest(){
    $.ajax({
        url: "http://yourhost/url-to-your-script.php",
        complete: function(data){
            setTimeout(function(){
                makeRequest();
            }, 60 * 60 * 1000); // Minutes * Seconds * MS
        }
    });
}
makeRequest();

I hope this helps

EDIT

This link might help too.

IMPORTANT: DO NOT FORGET TO REMOVE THE INFINITE LOOP

Abu Romaïssae
  • Sorry, I copied this code from a script I had and forgot this call; I've now edited the post and fixed it a bit – Abu Romaïssae Mar 06 '13 at 16:36
  • No problem. Which loop am I removing? The while(1) right? Not the while(!feof)? – Jon Mar 06 '13 at 16:36
  • Yes, the `while(1)`. If you have trouble creating the client HTML page, I can help you with that too – Abu Romaïssae Mar 06 '13 at 16:37
  • If I set_time_limit(10) in store_tweets.php and then have my ajax timeout to 10 seconds, theoretically it'll "run indefinitely" right? So it'll get tweets for 10 seconds before dying, then another ajax call will restart it? – Jon Mar 06 '13 at 16:40
  • Actually, I don't see why you would need `set_time_limit`, and there's no need to specify an Ajax timeout either: the script stops by itself when it finishes and is restarted by the next Ajax request (after 1 hour). You may need a small change to the Ajax request so that it starts again if a request fails; see my edited code – Abu Romaïssae Mar 06 '13 at 19:24
  • The script's execution never ends since it's the Twitter stream. There will never be an eof (end of file)...it just keeps growing as people continue to tweet. That's why I'm confused...I don't know how to stop it - then restart it. – Jon Mar 06 '13 at 19:32
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/25717/discussion-between-abu-romaissae-and-jon) – Abu Romaïssae Mar 06 '13 at 19:35
  • I saw a break at the beginning of the loop, so the script will run and stop after about 5 seconds – Abu Romaïssae Mar 06 '13 at 19:39
I just had the same issue.

If you want to run the script without keeping a browser open, only a cron job will do. You can set one up with a free cron provider, or use Scheduled Tasks on Windows.

If your site gets good traffic, you can use the option below, where your visitors do the work for you.

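In PHP you can get the current time with date('H:i:s'), or just the minute with date('i'). Create a table to track whether the code has already run, e.g. a column named check that holds 0 or 1 (check is a reserved word in MySQL, so quote it with backticks). Then, on each page load:

    // Connection details and table name are placeholders; replace with your own
    $db = new PDO('mysql:host=HOSTNAME;dbname=DBNAME', 'USERNAME', 'PASSWORD');
    $minute = (int) date('i'); // current minute, 0-59
    $check = (int) $db->query('SELECT `check` FROM your_table')->fetchColumn();

    // Run once, during the last minute of the hour
    // (this only fires if someone visits the site during that minute)
    if ($minute == 59 && $check == 0) {
        // ... run your code here ...
        $db->exec('UPDATE your_table SET `check` = 1');
    }

    // Reset the flag during the first minute of the next hour
    if ($minute == 0 && $check == 1) {
        $db->exec('UPDATE your_table SET `check` = 0');
    }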
james