There are many posts on converting relative paths to absolute ones in PHP, but I'm looking for a specific implementation that (hopefully) goes beyond them. Could anyone help me with it?

I have a PHP variable containing assorted HTML, including hrefs and imgs with relative URLs, mostly of the form /en/discover or /img/icons/facebook.png.

I want to process this variable so that the values of my hrefs and imgs are converted to http://mydomain.com/en/discover and http://mydomain.com/img/icons/facebook.png, respectively.

I believe the question below covers the solution for hrefs. How can we expand this to also consider imgs?

Would a regex be in order? Or since we're dealing with a lot of output should we use DOMDocument?

chocolata
  • Wouldn't using `<base href>` in your `<head>` be enough? If not, then a regex is all you could do; you would need to use the `preg_replace` function. – shadyyx Nov 19 '12 at 16:18
  • Thanks for your response. Good suggestion, but I don't think so, since the output will be displayed in an XML-document. Problem is I'm incompetent with regexes... – chocolata Nov 19 '12 at 16:20

3 Answers

After some further research I stumbled upon this article by Gerd Riesselmann on how to work around the lack of a base href in RSS feeds. His snippet actually solves my problem!

http://www.gerd-riesselmann.net/archives/2005/11/rss-doesnt-know-a-base-url

<?php
function relToAbs($text, $base)
{
  if (empty($base))
    return $text;
  // base url needs trailing /
  if (substr($base, -1, 1) != "/")
    $base .= "/";
  // Replace links
  $pattern = "/<a([^>]*) " .
             "href=\"[^http|ftp|https|mailto]([^\"]*)\"/";
  $replace = "<a\${1} href=\"" . $base . "\${2}\"";
  $text = preg_replace($pattern, $replace, $text);
  // Replace images
  $pattern = "/<img([^>]*) " . 
             "src=\"[^http|ftp|https]([^\"]*)\"/";
  $replace = "<img\${1} src=\"" . $base . "\${2}\"";
  $text = preg_replace($pattern, $replace, $text);
  // Done
  return $text;
}
?>

Thank you, Gerd! And thank you, shadyyx, for pointing me in the direction of base href!

chocolata

Excellent solution. However, there is a small typo in the pattern. As written above, it truncates the first character of the href or src. Here are patterns that work as intended:

// Replace links
$pattern = "/<a([^>]*) " .
         "href=\"([^http|ftp|https|mailto][^\"]*)\"/";

and

// Replace images
$pattern = "/<img([^>]*) " . 
         "src=\"([^http|ftp|https][^\"]*)\"/";

The opening parenthesis of the second capture group has been moved so that it also encloses the bracket expression. This brings the first character of the href or src, the one that is tested against http|ftp|https, into the second replacement reference.
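One caveat applies to both versions of the pattern: `[^http|ftp|https|mailto]` is a character class, so it rejects any URL whose first character is one of those letters or `|` (for example `img/logo.png` or `about.html`), not just URLs that start with one of those schemes. A negative lookahead states the intended test more directly. The following is only a minimal sketch under some assumptions, not Gerd's original code: the name `relToAbsLookahead` is made up for illustration, attribute values are assumed to be double-quoted, every relative path is treated as relative to the root of the base URL, and fragment-only links such as `#head1` are deliberately left alone.

```php
<?php
// Hypothetical variant of relToAbs(): a negative lookahead, rather than a
// character class, decides which URLs are already absolute.
function relToAbsLookahead(string $text, string $base): string
{
    if ($base === '') {
        return $text;
    }
    // Normalise the base so that it has no trailing slash.
    $base = rtrim($base, '/');

    // Matches href/src on <a> and <img> tags (double-quoted values only).
    // The lookahead skips values that start with a scheme (http:, mailto:, ...),
    // protocol-relative URLs (//host/...) and fragment-only links (#anchor).
    $pattern = '~<(a|img)\b([^>]*?)\s(href|src)="(?![a-z][a-z0-9+.-]*:|//|#)([^"]+)"~i';

    $result = preg_replace_callback(
        $pattern,
        function (array $m) use ($base): string {
            // Ensure exactly one slash between the base and the path.
            $path = '/' . ltrim($m[4], '/');
            return '<' . $m[1] . $m[2] . ' ' . $m[3] . '="' . $base . $path . '"';
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; fall back to the input.
    return $result ?? $text;
}

$html = '<a href="/en/discover">x</a> <img src="img/icons/facebook.png">';
echo relToAbsLookahead($html, 'http://mydomain.com');
```

With this approach `mypage.html#head1` is still rewritten because it does not start with `#`, while `#head1` on its own is left untouched.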

five2one
  • Thanks, works better! Only links starting with # shouldn't be affected. Using [^http|ftp|https|mailto|#] works for '#head1', but it should still replace 'mypage.html#head1' with the full URL. – Barryvdh Aug 14 '13 at 08:43

I found that once the href/src value and the base URL became more complex, the accepted answer's solution no longer worked for me.

For example:

Base URL:

http://www.journalofadvertisingresearch.com/ArticleCenter/default.asp?ID=86411&Type=Article

href/src value:

/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

Incorrectly returned:

/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

I found the function below, which resolves the URL correctly. It comes from a comment by Isaac Z. Schlueter at http://php.net/manual/en/function.realpath.php.

It correctly returned:

http://www.journalofadvertisingresearch.com/ArticleCenter/LeftMenu.asp?Type=Article&FN=&ID=86411&Vol=&No=&Year=&Any=

function resolve_href($base, $href) {
    // href="" ==> current url.
    if (!$href) {
        return $base;
    }

    // href="http://..." ==> href isn't relative
    $rel_parsed = parse_url($href);
    if (array_key_exists('scheme', $rel_parsed)) {
        return $href;
    }

    // add an extra character so that, if it ends in a /, we don't lose the last piece.
    $base_parsed = parse_url("$base ");
    // if it's just server.com and no path, then put a / there.
    if (!array_key_exists('path', $base_parsed)) {
        $base_parsed = parse_url("$base/ ");
    }

    // href="/..." ==> throw away current path.
    if ($href[0] === "/") {
        $path = $href;
    } else {
        $path = dirname($base_parsed['path']) . "/$href";
    }

    // bla/./bloo ==> bla/bloo
    $path = preg_replace('~/\./~', '/', $path);

    // resolve /../
    // loop through all the parts, popping whenever there's a .., pushing otherwise.
    $parts = array();
    foreach (explode('/', preg_replace('~/+~', '/', $path)) as $part) {
        if ($part === "..") {
            array_pop($parts);
        } elseif ($part != "") {
            $parts[] = $part;
        }
    }

    return (
        (array_key_exists('scheme', $base_parsed)) ?
            $base_parsed['scheme'] . '://' . $base_parsed['host'] : ""
    ) . "/" . implode("/", $parts);
}
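The question also asked whether DOMDocument would be a better fit than a regex. As a rough sketch only, and under some assumptions: `absolutizeHtml` and its `$resolve` callback are hypothetical names introduced here, the input is treated as an HTML fragment, and the PHP DOM extension is assumed to be available. The function visits every `a[href]` and `img[src]` attribute and passes its value through the callback, which in practice could wrap a resolver such as the one above. The stand-in resolver in the example only handles root-relative paths.

```php
<?php
// Hypothetical helper: rewrites href on <a> and src on <img> through a callback.
function absolutizeHtml(string $html, callable $resolve): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings about imperfect markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // Wrap the fragment in a single <div> so the parser sees one root element,
    // and ask libxml not to add <html>/<body> or a doctype.
    $doc->loadHTML(
        '<div>' . $html . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['a' => 'href', 'img' => 'src'] as $tag => $attr) {
        foreach ($doc->getElementsByTagName($tag) as $node) {
            if ($node->hasAttribute($attr)) {
                $node->setAttribute($attr, $resolve($node->getAttribute($attr)));
            }
        }
    }

    // Serialise only the children of the wrapper <div>.
    $out = '';
    foreach ($doc->documentElement->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Stand-in resolver for illustration: prefixes root-relative paths only.
$resolve = function (string $url): string {
    $rootRelative = strpos($url, '/') === 0 && strpos($url, '//') !== 0;
    return $rootRelative ? 'http://mydomain.com' . $url : $url;
};

echo absolutizeHtml(
    '<a href="/en/discover">Discover</a> <img src="/img/icons/facebook.png">',
    $resolve
);
```

Compared with a regex, this handles single-quoted or unquoted attributes and any attribute order, at the cost that libxml may normalise the markup when it is serialised again.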
joshweir