17

I have a list of absolute URLs. I need to make sure that they all have trailing slashes, as applicable. So:

I'm guessing I need to use a regex, but matching URLs is a pain. I was hoping for an easier solution. Ideas?

StackOverflowNewbie

5 Answers

23

For this very specific problem, not using a regex at all might be an option as well. If your list is long (several thousand URLs) and time is of any concern, you could choose to hand-code this very simple manipulation.

This will do the same:

$str .= (substr($str, -1) == '/' ? '' : '/');

It is of course not nearly as elegant or flexible as a regular expression, but it avoids the overhead of parsing the regular expression string, and it will run as fast as PHP can do it.
It is arguably less readable than the regex, though this depends on how comfortable the reader is with regex syntax (some people might actually find it more readable).

It will certainly not check that the string is really a well-formed URL (as, for example, zerkms' regex does), but you already know that your strings are URLs anyway, so that check is a bit redundant.

Though, if your list is something like 10 or 20 URLs, forget this post. Use a regex, the difference will be zero.

Damon
  • 33
    $str = rtrim($string, '/') . '/'; –  Jun 20 '13 at 02:01
  • +1 for not regexing here. "It is of course not nearly as elegant or flexible as a regular expression." -- No, no, quite the contrary! Knee-jerk `preg_...` calls to dead simple tasks like this is the precise opposite of elegant. BTW, talking of elegant: @Vino's "smartlet" is probably the coolest stuff on this page, well done! :) – Sz. Feb 07 '17 at 03:21
  • Note that this wouldn't normalize a sloppy URL with multiple trailing slashes. (I saw OP mentioning normalization as a requirement in a comment under the accepted answer; and it's a nice take regardless.) @Vino's would. (That should be a separate answer, and the accepted one.) – Sz. Feb 07 '17 at 03:46
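
The `rtrim` one-liner from the comments also collapses repeated trailing slashes. A minimal sketch of applying it to a whole list and then deduplicating, as the OP describes wanting; the helper name `with_trailing_slash` and the sample list are made up for illustration, and the sketch assumes the URLs carry no query string or fragment (otherwise the slash would land after them):

```php
<?php
// Hypothetical helper (not from the thread): force exactly one trailing slash.
// Assumes the URL has no query string or fragment.
function with_trailing_slash(string $url): string
{
    return rtrim($url, '/') . '/';
}

// Sample input, invented for this sketch.
$urls = [
    'http://www.domain.com',
    'http://www.domain.com/',
    'http://www.domain.com//',
];

// Normalize every entry, then drop duplicates and reindex the array.
$unique = array_values(array_unique(array_map('with_trailing_slash', $urls)));

print_r($unique);
```

With this sample list, all three spellings collapse to the single entry `http://www.domain.com/`.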
15

Rather than using a regex, you could use parse_url() for this. For example:

$url = parse_url("http://www.example.com/ab/abc.html?a=b#xyz");
if(!isset($url['path'])) $url['path'] = '/';
$surl = $url['scheme']."://".$url['host'].$url['path'].'?'.$url['query'].'#'.$url['fragment'];
echo $surl;
avelis
asleepysamurai
  • @Pekka What's so wrong in regexp? You cannot be sure if parse_url is using it inside or not. Especially when it's parsing a lot more things than just single slash. – Your Common Sense Mar 27 '11 at 09:45
  • @Col - Well, in this case, the OP specifically asked for a non-regex solution. In the more general case the best answer is always *"it depends"*, though URLs tend to be more complex than `(\w+\.)\w+`... Actually, I don't quite understand the basis for this question: why add the slash anyway? – Kobi Mar 27 '11 at 09:54
  • 2
    @Col I tend to always choose the standard URL parsing functions over regexes, because 1.) as standard functions, they are supposed to deal with every imaginable edge case and 2.) Regexes reduce maintainability if you, or a colleague, are not very good at them (like me). Nothing wrong with regexes in general, though. Do you see a scenario where this solution doesn't work? If yes, can you show it? I don't see it – Pekka Mar 27 '11 at 09:56
  • @Kobi: I need to "normalize" my URLs. Remember, "http://www.domain.com" !== "http://www.domain.com/" when doing string comparison. However, I need a unique list of URLs where "http://www.domain.com" is considered equal to "http://www.domain.com/". I guess the best way to achieve that is to make sure they all have the trailing slash. – StackOverflowNewbie Mar 27 '11 at 10:18
  • @Kobi so what? When I have to choose between "the OP asked" and common sense, I always choose the latter. – Your Common Sense Mar 27 '11 at 13:19
  • This is not a good solution... ` php > $url = parse_url("http://www.example.com"); php > if(!isset($url['path'])) $url['path'] = '/'; php > $surl = $url['scheme']."://".$url['host'].$url['path'].'?'.$url['query'].'#'.$url['fragment']; php > echo $surl; http://www.example.com/?#` – Yuda Prawira Aug 03 '11 at 19:34
  • @Gunslinger_: Just add isset checks for the query and fragment elements of the $url array. Ideally we would use isset to check every element of the $url array that we use. I didn't add that to the example because I thought it would be obvious and would detract from the basic premise of the example. – asleepysamurai Aug 13 '11 at 15:05
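
The isset checks described in the comment above could look roughly like this. It is only a sketch: the function name `normalize_url` is hypothetical, it deliberately ignores user/password parts, and it returns the input unchanged when `parse_url()` cannot find a scheme and host.

```php
<?php
// Hypothetical helper (not from the answer): rebuild a URL from parse_url() parts,
// emitting '?' and '#' only when the corresponding part is present.
function normalize_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        return $url; // not an absolute URL we can rebuild; leave it untouched
    }

    $out = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $out .= ':' . $parts['port'];
    }

    // A missing path becomes "/", which is the normalization the question asks for.
    $out .= isset($parts['path']) ? $parts['path'] : '/';

    if (isset($parts['query'])) {
        $out .= '?' . $parts['query'];
    }
    if (isset($parts['fragment'])) {
        $out .= '#' . $parts['fragment'];
    }
    return $out;
}

echo normalize_url('http://www.example.com'), "\n";
echo normalize_url('http://www.example.com/ab/abc.html?a=b#xyz'), "\n";
```

Like the original example, this only supplies a path when none is present; it does not append a slash to an existing path such as `/ab`.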
5
$url = 'http://www.domain.com';

$need_to_add_trailing_slash = preg_match('~^https?://[^/]+$~', $url);
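
The snippet above only computes a flag. A minimal sketch of acting on it, assuming the URL has no query string or fragment directly after the host (the pattern would still match those, and the slash would end up after them):

```php
<?php
$url = 'http://www.domain.com';

// preg_match() returns 1 when the URL is only scheme plus host,
// with no slash anywhere after the host.
if (preg_match('~^https?://[^/]+$~', $url) === 1) {
    $url .= '/';
}

echo $url;
```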
zerkms
1

This may not be the most elegant solution, but it works like a charm. First we get the full URL, then check whether it has a trailing slash. If not, we check that there is no query string and that the path isn't an actual file or an actual directory. If the URL meets all these conditions, we do a 301 redirect with the trailing slash added.

If you're unfamiliar with PHP headers... note that there cannot be any output - not even whitespace - before this code.

$url = $_SERVER['REQUEST_URI'];
$lastchar = substr( $url, -1 );
if ( $lastchar != '/' ):
    if ( !$_SERVER['QUERY_STRING'] and !is_file( $_SERVER['DOCUMENT_ROOT'].$url ) and !is_dir( $_SERVER['DOCUMENT_ROOT'].$url ) ):
        header( "HTTP/1.1 301 Moved Permanently" );
        header( "Location: $url/" );
        exit; // stop here so nothing else runs after the redirect headers are sent
    endif;
endif;
  • Yeesh. I don't know that full redirecting the user is quite necessary. This has the potential to cause an endless loop -- what if the client is, for some reason, taking off the trailing slash? The practice of using `substr` is fine, but you should just put the trailing slash on whatever variables in your script that need it and not redirect the user. – Chris Baker Feb 28 '14 at 15:13
1

Try this:

if (!preg_match('~/$~', $url)) {
    $url .= '/';
}
Matteo Riva
user326583