0
<?php

include('simple_html_dom.php');
function curPageURL() {
    $pageURL = 'http';
    $pageURL .= "://";
    if ($_SERVER["SERVER_PORT"] != "80") {
        $pageURL .= $_SERVER["SERVER_NAME"] . ":" . $_SERVER["SERVER_PORT"] . $_SERVER["REQUEST_URI"];
    } else {
        $pageURL .= $_SERVER["SERVER_NAME"] . $_SERVER["REQUEST_URI"];
    }
    return $pageURL;
}

// Retrieve the DOM from a given URL
$html = file_get_html(curPageURL());
str_ireplace("http://martianguy.com","http://new.martianguy.com", $html);

?>

I am trying to replace all links on the domain martianguy.com with new.martianguy.com (all href and src attributes). Is it OK to pass the current page URL to the file_get_html function? When I test this on my localhost, it does nothing and times out after 30 seconds.

dethtron5000

3 Answers

2

file_get_html() returns a DOM object (http://simplehtmldom.sourceforge.net/manual_api.htm) while str_ireplace is expecting a string (http://www.php.net/manual/en/function.str-ireplace.php).

You have to loop through your DOM object and do the replacement on each node. Alternatively, you can use file_get_contents (http://php.net/manual/en/function.file-get-contents.php) and replace every occurrence of the URL in the raw HTML string, but then the replacement is no longer limited to the src and href attributes.
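A minimal sketch of the per-node loop described above, assuming PHP's built-in DOMDocument in place of simple_html_dom so that the example runs without the external library. The helper name rewriteLinks and the sample markup are made up for illustration and are not part of either library.

```php
<?php
// Hypothetical helper: rewrites only href and src attributes.
// Assumption: PHP's bundled DOM extension is used instead of simple_html_dom.
function rewriteLinks(string $html, string $old, string $new): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely strict.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element and touch only the two attributes of interest.
    foreach ($doc->getElementsByTagName('*') as $node) {
        foreach (['href', 'src'] as $attr) {
            if ($node->hasAttribute($attr)) {
                $value = str_ireplace($old, $new, $node->getAttribute($attr));
                $node->setAttribute($attr, $value);
            }
        }
    }
    // saveHTML() serializes the whole document, including the doctype and
    // the html/body wrappers that loadHTML() adds around a fragment.
    return $doc->saveHTML();
}

// Made-up sample markup: the URL in the text node should stay as it is.
$page = '<p>See http://martianguy.com <a href="http://martianguy.com/a">link</a>'
      . '<img src="http://martianguy.com/i.png"></p>';
echo rewriteLinks($page, 'http://martianguy.com', 'http://new.martianguy.com');
```

With this approach the URL inside the paragraph text is left alone, which is the difference from running str_ireplace over the whole page string.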

roptch
  • In fact, it would be best if every occurrence is replaced. How do I loop through the DOM? (I am new to this world) :-) – Martian Guy Jul 02 '13 at 13:02
  • If you really want ALL occurrences of the URL in the entire HTML page (not just src and href attributes), then just replace file_get_html with file_get_contents. – roptch Jul 02 '13 at 13:03
1

Seems to me this script is recursive. If curPageURL() returns the URL of the current page, and the script that calls curPageURL() is that same page, then the script requests itself over HTTP. That would explain the timeout: each request triggers another request to the same page, and the chain continues until the first call hits PHP's max_execution_time, which defaults to 30 seconds.

Some suggestions:

  1. If the script must be on this page, add a GET variable to the URL returned by curPageURL(), then only run your replacement code if the variable isn't set:

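    // Run only on the outer request; isset() avoids a notice when 'loaded' is absent.
    if (!isset($_GET['loaded'])) {
        // The inner request carries loaded=1, so it does not fetch the page again.
        $html = file_get_contents(curPageURL() . "?loaded=1");
        echo str_ireplace("oldURL", "newURL", $html);
    }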
    
  2. Use JavaScript, which runs on the page after the HTML has been loaded and does the replacement on the client side.

  3. This assumes that the content you're trying to replace is dynamic. If it's static, I'd save it to a file, and then use another script to make the replacements.
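Suggestion 1 can be made a little more robust. The sketch below assumes two hypothetical helpers, withLoadedFlag and isOuterRequest, that are not part of the original code: the first appends the marker with "&" when the URL already has a query string, and the second checks for the marker. The page.php paths are made up for illustration.

```php
<?php
// Hypothetical helper: appends the marker parameter, using "&" when the URL
// already carries a query string and "?" otherwise.
function withLoadedFlag(string $url): string
{
    $separator = (strpos($url, '?') === false) ? '?' : '&';
    return $url . $separator . 'loaded=1';
}

// Hypothetical helper: true only for the outer request, i.e. when the
// marker parameter is absent from the query array.
function isOuterRequest(array $query): bool
{
    return !isset($query['loaded']);
}

// Made-up URLs, one without and one with an existing query string.
echo withLoadedFlag('http://martianguy.com/page.php'), "\n";
echo withLoadedFlag('http://martianguy.com/page.php?id=7'), "\n";

// At the very top of the page, the guard would then look roughly like this:
// if (isOuterRequest($_GET)) {
//     $html = file_get_contents(withLoadedFlag(curPageURL()));
//     echo str_ireplace('http://martianguy.com', 'http://new.martianguy.com', $html);
//     exit; // stop here so the original markup is not printed a second time
// }
```

Placing the guard first and exiting after the echo keeps the rewritten page from being appended to the unmodified output of the same script.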

Hope that helps!

luke
  • This is totally true, the script is recursive. – roptch Jul 02 '13 at 14:11
  • This sure helps, thanks! If I use JavaScript, does it still make the changes before the browser starts requesting the resources? – Martian Guy Jul 02 '13 at 14:59
  • If they're links, then they're not requested until someone clicks on them. It's trickier for images and other resources. Some discussion on that here: http://stackoverflow.com/questions/14415027/change-src-of-image-before-request-has-been-sent (Gist: It might work on older browsers, but isn't reliable on newer ones) – luke Jul 02 '13 at 15:04
  • @lukek Isn't that true only if I use JavaScript on the client side? If I use a server-side script, the changes would have been made before the browser gets the HTML. – Martian Guy Jul 02 '13 at 16:29
  • @lukek Okay, I am using your suggestion number 1 and it's making those changes. But 'echo str_ireplace("oldURL","newURL", $html);' appends to the existing source code of the page instead of replacing it. Could you help me with this too? Thanks for helping me thus far. The ball's rolling! – Martian Guy Jul 02 '13 at 19:04
  • I'd put this code at the top of the page and exit immediately after the echo statement. Or enclose the rest in an else clause... – luke Jul 03 '13 at 00:12
  • @lukek That did it! I can't thank you enough for helping me and following up all the way. :-) – Martian Guy Jul 03 '13 at 02:25
0

The str_ireplace function doesn't change strings in place; it returns a new string. You need to assign the output of that function to a variable.
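A short illustration of the difference, using a made-up anchor tag as input:

```php
<?php
// Made-up sample input for illustration.
$html = '<a href="http://martianguy.com/about">About</a>';

// Return value discarded: $html is left untouched.
str_ireplace('http://martianguy.com', 'http://new.martianguy.com', $html);
$unchanged = $html;

// Return value assigned back: $html now holds the rewritten markup.
$html = str_ireplace('http://martianguy.com', 'http://new.martianguy.com', $html);

echo $unchanged, "\n";
echo $html, "\n";
```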

dethtron5000