2

I am writing a small crawler that extracts links from 5 to 10 sites. While collecting the links, I get some URLs like this:

../tets/index.html

If it were /test/index.html, I could simply prepend the base URL to get http://www.example.com/test/index.html

What can I do with this kind of URL?

raj

3 Answers

1

URLs like these are relative URLs. ".." means "parent directory", whereas "." simply means "this directory", as in bash. For instance, if you are looking at the page http://www.someserver/test/foo/bar.html and it contains a link like "../baz/foobar.html", that link points to http://www.someserver/test/baz/foobar.html: the page lives in /test/foo/, and ".." moves up one level to /test/. Test it to confirm.
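To make the rule concrete, here is a minimal sketch of how a crawler might resolve such links in PHP. The function name resolveUrl() is hypothetical (it is not a PHP built-in), and the sketch assumes plain path links: it ignores ports, userinfo, and protocol-relative links such as //host/path, and it is not a full RFC 3986 resolver.

```php
<?php
// Hypothetical helper (not a PHP built-in): resolve a link against the URL
// of the page it was found on. Assumes simple http(s) URLs with a host.
function resolveUrl(string $base, string $rel): string
{
    // A link that already has a scheme is absolute: return it unchanged.
    if (is_string(parse_url($rel, PHP_URL_SCHEME))) {
        return $rel;
    }

    $b = parse_url($base);
    $root = $b['scheme'] . '://' . $b['host'];
    $basePath = $b['path'] ?? '/';

    if (strpos($rel, '/') === 0) {
        // Root-relative link such as /test/index.html.
        $path = $rel;
    } else {
        // Keep the base path up to and including its last slash,
        // i.e. drop the file name, then append the relative link.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $rel;
    }

    // Walk the segments: ".." removes the previous one, "." is skipped.
    $segments = explode('/', $path);
    $last = end($segments);
    $trailingSlash = in_array($last, ['', '.', '..'], true);

    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            array_pop($out);
        } elseif ($segment !== '.' && $segment !== '') {
            $out[] = $segment;
        }
    }

    $resolved = '/' . implode('/', $out);
    if ($trailingSlash && $out !== []) {
        $resolved .= '/';
    }
    return $root . $resolved;
}

echo resolveUrl('http://www.someserver/test/foo/bar.html', '../baz/foobar.html'), "\n";
// prints http://www.someserver/test/baz/foobar.html
```

The same function also covers the root-relative case from the question: resolveUrl('http://www.example.com/a/page.html', '/test/index.html') gives http://www.example.com/test/index.html.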

greg0ire
0

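Use dirname() to get the base directory, strip the leading .. with substr(), and append the rest. This handles only a single leading ../ and assumes the current URL points to a directory. Like this: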

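<?php
// Relative link found on the page; only one leading "../" is handled.
$url = "../tets/index.html";
// URL of the current directory; the trailing slash marks it as a directory.
$currentURL = "http://example.com/somedir/anotherdir/";
// dirname() drops the last path segment; substr() strips the leading "..".
echo dirname($currentURL) . substr($url, 2);
?>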

This outputs:

http://example.com/somedir/tets/index.html

shamittomar
0

Take a look at the Wikipedia page on URL normalization.
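For a crawler, normalization mostly matters for spotting that two differently written URLs are the same page. A rough sketch of a few common steps (lowercasing the scheme and host, dropping the default port, dropping the fragment) might look like this. normalizeUrl() is a hypothetical name, and the sketch assumes absolute http(s) URLs with a host; the full list of normalizations is longer than what is shown here.

```php
<?php
// Hypothetical helper (not a PHP built-in): apply a few simple
// normalizations so equivalent URLs compare equal as strings.
function normalizeUrl(string $url): string
{
    $p = parse_url($url);

    // Scheme and host are case-insensitive, so lowercase them.
    $scheme = strtolower($p['scheme']);
    $host = strtolower($p['host']);

    // Keep the port only when it is not the default for the scheme.
    $defaultPorts = ['http' => 80, 'https' => 443];
    $port = '';
    if (isset($p['port']) && ($defaultPorts[$scheme] ?? null) !== $p['port']) {
        $port = ':' . $p['port'];
    }

    // An empty path is equivalent to "/".
    $path = $p['path'] ?? '/';
    $query = isset($p['query']) ? '?' . $p['query'] : '';

    // The fragment (#...) is dropped: it is never sent to the server.
    return $scheme . '://' . $host . $port . $path . $query;
}

echo normalizeUrl('HTTP://WWW.Example.com:80/test/index.html#top'), "\n";
// prints http://www.example.com/test/index.html
```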

Alix Axel