26

How can I test if a string is URL encoded?

Which of the following approaches is better?

  • Search the string for characters that should have been encoded but aren't; if any exist, then it's not encoded, or
  • Use something like this which I've made:

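function is_urlEncoded($string){
 // Decode repeatedly until the string stops changing
 $test_string = $string;
 while(urldecode($test_string) != $test_string){
  $test_string = urldecode($test_string);
 }
 // Encoded only if re-encoding the fully decoded text reproduces the input
 return urlencode($test_string) == $string;
}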

$t = "Hello World > how are you?";
if(is_urlEncoded($t)){
 print "Was Encoded.\n";
}else{
 print "Not Encoded.\n";
 print "Should be ".urlencode($t)."\n";
}

The above code works, but not in instances where the string has been doubly encoded, as in these examples:

  • $t = "Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F";
  • $t = "Hello+World%2B%253E%2Bhow%2Bare%2Byou%253F";
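
One way to see what happens with the doubly encoded inputs is to count how many urldecode() passes change the string. The sketch below does that; countDecodePasses and its $maxPasses guard are hypothetical names introduced here for illustration, not part of the question's code.

```php
<?php
// Hypothetical helper: counts how many urldecode() passes change the string.
// $maxPasses is an arbitrary safety limit, assumed for this sketch.
function countDecodePasses(string $value, int $maxPasses = 10): int
{
    $passes = 0;
    while ($passes < $maxPasses) {
        $decoded = urldecode($value);
        if ($decoded === $value) {
            break; // stable: nothing left to decode
        }
        $value = $decoded;
        $passes++;
    }
    return $passes;
}

var_dump(countDecodePasses('Hello World > how are you?'));                   // int(0)
var_dump(countDecodePasses('Hello+World+%3E+how+are+you%3F'));               // int(1)
var_dump(countDecodePasses('Hello%2BWorld%2B%253E%2Bhow%2Bare%2Byou%253F')); // int(2)
```

A count above 1 hints at repeated encoding, but it is only a hint: a plain string that happens to contain a + also changes under urldecode() and is counted as one pass.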
Michael Currie
Psytronic
  • 1
    How would the string come to be URL-encoded by the time your PHP script sees it? Is the problem really that your script needs to URL-decode an incoming string, or is the problem that your script needs to not double-encode a link href or input value, for instance? –  Nov 11 '11 at 23:13
  • How about using urldecode and comparing it with the original string. If they match it's not encoded yet. – thedjaney Sep 17 '15 at 05:42

13 Answers

39

I have one trick:

You can do this to prevent double encoding: always decode first, then encode again.

$string = urldecode($string);

Then encode again:

$string = urlencode($string);

Done this way, we can avoid double encoding :)
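
As a rough illustration of how this trick behaves, here is a minimal sketch. The wrapper name normalizeEncoding is hypothetical, and the last call shows the case where a plain-text plus sign does not survive.

```php
<?php
// Hypothetical wrapper around the decode-then-encode trick.
function normalizeEncoding(string $value): string
{
    return urlencode(urldecode($value));
}

echo normalizeEncoding('Hello World'), "\n";   // Hello+World
echo normalizeEncoding('Hello+World'), "\n";   // Hello+World
echo normalizeEncoding('Hello%20World'), "\n"; // Hello+World
// A plain-text "+" is read as an encoded space, so it is not kept as %2B:
echo normalizeEncoding('10+20=30'), "\n";      // 10+20%3D30
```

So the result is stable for input that really was encoded, but plain text containing + comes out different from what urlencode() alone would produce (10%2B20%3D30).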

Irfan
  • 4
    That is wrong! A URL that has been decoded cannot always be re-encoded to the same form. For more info check: http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding As an example, "a+b" is valid as a path parameter. If you decode it you get the same string (a+b), and after encoding the result is "a%2Bb"! – instead Jan 05 '16 at 14:04
  • 1
    This will cause trouble. E.g. if you have a plain-text string with a plus sign like this: "TestString Super Mega +", the plus sign will be lost (turned into a space) if you pipe it through urldecode(); – suther Apr 18 '17 at 08:37
  • 1
    The blog.lunatech link is down. Here is an alternative URL: https://web.archive.org/web/20151229061347/http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding – instead Jun 13 '18 at 00:10
  • @instead I think this depends on which functions you use. With the strict RFC 3986 `rawurlxxxxxx` functions, + is encoded as `%2B` and spaces as `%20`; with the `urlxxxxxx` functions, spaces are written as + signs (and decoded from them). :) – vee Nov 15 '20 at 10:35
22

Here is something I just put together.

if ( urlencode(urldecode($data)) === $data){
    echo 'string urlencoded';
} else {
    echo 'string is NOT urlencoded';
}
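
To make the behaviour of this round-trip check concrete, here is a small sketch wrapped in a function. The name roundTripsCleanly is hypothetical, and the sample inputs are assumptions chosen to show where the check and urlencode()'s own output format disagree.

```php
<?php
// Hypothetical wrapper: true when decode-then-encode reproduces the input exactly.
function roundTripsCleanly(string $data): bool
{
    return urlencode(urldecode($data)) === $data;
}

var_dump(roundTripsCleanly('Hello+World%21')); // bool(true)
var_dump(roundTripsCleanly('Hello World!'));   // bool(false)
// Valid encodings that urlencode() would have written differently are rejected:
var_dump(roundTripsCleanly('Hello%20World'));  // bool(false), space as %20 instead of +
var_dump(roundTripsCleanly('a%2fb'));          // bool(false), lowercase hex digits
// Strings with nothing to encode always pass:
var_dump(roundTripsCleanly('abc'));            // bool(true)
```

In other words, the check answers "is this exactly what urlencode() would emit?", which is narrower than "is this URL encoded?".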
  • 2
    @suther please test it with various inputs; I don't recall exactly, but sometimes it does not work as expected. –  May 26 '17 at 16:41
11

You'll never know for sure if a string is URL-encoded or if it was supposed to have the sequence %2B in it. Instead, it probably depends on where the string came from, i.e. if it was hand-crafted or from some application.

"Is it better to search the string for characters which would be encoded, which aren't, and if any exist then it's not encoded?"

I think this is a better approach, since it would take care of things that have been done programmatically (assuming the application would not have left a non-encoded character behind).

One thing that will be confusing here... Technically, the % "should be" encoded if it will be present in the final value, since it is a special character. You might have to combine your approaches to look for should-be-encoded characters as well as validating that the string decodes successfully if none are found.

Rachid O
jheddings
  • "supposed to have the sequence `%2B` in it", his decode-check-encode-check is an attempt to counter this (decode to space, encode to %2B, not encoded) – falstro Oct 28 '09 at 15:01
  • True, unless the intent was to pass that sequence as the final value... Your arithmetic example is a better example where that would fail. Instead, by checking for characters that "should have" been encoded, the application gets a little better clue whether the string is already encoded. – jheddings Oct 28 '09 at 15:08
  • Specifically the : character, which is a required delimiter in valid uris (https://tools.ietf.org/html/rfc3986) will not be present in a urlencoded string. – Luke Mlsna May 06 '19 at 15:40
6

Well, the term "URL encoded" is a bit vague; perhaps a simple regex check will do the trick:

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
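
A quick sketch of what this pattern does and does not catch. The wrapper name hasPercentEscape and the sample strings are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper: true when the string contains at least one %XX escape.
function hasPercentEscape(string $string): bool
{
    return preg_match('~%[0-9A-F]{2}~i', $string) === 1;
}

var_dump(hasPercentEscape('Hello%20World')); // bool(true)
var_dump(hasPercentEscape('Hello World'));   // bool(false)
// Encoded text whose only change is space -> + has no %XX escape at all:
var_dump(hasPercentEscape('Hello+World'));   // bool(false)
// Plain text can contain "%" followed by two hex-looking characters:
var_dump(hasPercentEscape('Mix 5%acid'));    // bool(true)
```

The check only reports that something resembling a percent escape is present, so it works best when the input is known to contain characters that would have needed escaping.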
user187291
6

What about:

if (urldecode(trim($url)) == trim($url)) { $url_form = 'decoded'; }
  else { $url_form = 'encoded'; }

This will not work with double encoding, but that is out of scope anyway, I suppose?

sth
Sebastian
5

I think there's no foolproof way to do it. For example, consider the following:

$t = "A+B";

Is that a URL-encoded "A B", or does it need to be encoded to "A%2BB"?

Kaivosukeltaja
3

There's no reliable way to do this, as there are strings which stay the same through the encoding process, e.g. is "abc" encoded or not? There's no clear answer. Also, as you've encountered, some characters have multiple encodings... But...

Your decode-check-encode-check scheme fails because some characters may be encoded in more than one way. However, a slight modification to your function should be fairly reliable: just check whether decoding modifies the string; if it does, it was encoded.

It won't be foolproof, of course, as "10+20=30" will return true (+ gets converted to space) when we're actually just doing arithmetic. I suppose this is what your scheme is attempting to counter; I'm sorry to say that I don't think there's a perfect solution.

HTH.

Edit:
As I mentioned in my own comment (just reiterating here for clarity), a good compromise would probably be to check for invalid characters in your URL (e.g. space); if there are any, it's not encoded. If there are none, try to decode and see if the string changes. This still won't handle the arithmetic above (which is impossible), but it'll hopefully be sufficient.
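
The compromise could be sketched roughly as follows. The function name looksUrlEncoded is hypothetical, and the set of "valid" characters is an assumption: it is the alphabet that PHP's urlencode() emits (letters, digits, "-", "_", ".", "+" and "%"), which is stricter than what a full URL may contain.

```php
<?php
// Hypothetical sketch of the two-step check; the allowed alphabet is an assumption
// based on what urlencode() outputs, not a general URL validity test.
function looksUrlEncoded(string $value): bool
{
    // Step 1: any character urlencode() would have escaped means "not encoded".
    if (preg_match('/[^A-Za-z0-9\-_.+%]/', $value) === 1) {
        return false;
    }
    // Step 2: otherwise, treat it as encoded only if decoding changes something.
    return urldecode($value) !== $value;
}

var_dump(looksUrlEncoded('Hello+World%21'));          // bool(true)
var_dump(looksUrlEncoded('Hello+World how are you')); // bool(false), contains a raw space
var_dump(looksUrlEncoded('abc'));                     // bool(false), nothing to decode
var_dump(looksUrlEncoded('10+20'));                   // bool(true), the unavoidable false positive
```

The last call is the arithmetic-style case: nothing in the string itself distinguishes a literal plus from an encoded space.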

falstro
  • "However, a slight modification to your function should be fairly reliable, just check if the decode modifies the string, if it does, it was encoded." I thought this, however if this is the string "Hello+World how are you" then decoding it will produce a change, but it would not have been fully encoded. – Psytronic Oct 28 '09 at 15:04
  • @Psytronic: Very true, that + is a bugger, isn't it? If you can find a way to determine whether it's a valid URL, then decoding to check for a change would probably be a better solution. You should be able to devise a regular expression to look for 'bad' characters like space (if it's not valid, it's not encoded). – falstro Oct 28 '09 at 15:15
3

@user187291's code works and only fails when + is not encoded.

I know this is a very old post, but this worked for me.

$is_encoded = preg_match('~%[0-9A-F]{2}~i', $string);
if ($is_encoded) {
    // Pre-encode literal + (and =) so urldecode() does not turn + into a space
    $string = urlencode(urldecode(str_replace(['+', '='], ['%2B', '%3D'], $string)));
} else {
    $string = urlencode($string);
}
B L Praveen
1

Send a variable that flags whether decoding is needed when you are already getting the data from a URL:

?path=folder/new%20file.txt&decode=1
Echilon
phpBananas
1

In my case I wanted to check if a complete URL is encoded, so I already knew that the URL must contain the string https://, and what I did was to check if the string had the encoded version of https:// in it (https%3A%2F%2F) and if it didn't, then I knew it was not encoded:

//make sure $completeUrl is encoded
if (strpos($completeUrl, urlencode('https://')) === false) {
    // not encoded, need to encode it
    $completeUrl = urlencode($completeUrl);
}

In theory, this solution can be used with any string containing characters that get encoded, as long as you know that part of the string (https:// in this example) will always exist in what you are trying to check.

Waqleh
0

I am using the following test to see if strings have been urlencoded:

if(urlencode($str) != str_replace(['%','+'], ['%25','%2B'], $str))

If a string has already been urlencoded, the only characters that will be changed by double encoding are % (which starts every encoded character sequence) and + (which replaces spaces). Change them back and you should have the original string.

Let me know if this works for you.
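
Read the other way round, the condition above is true when the string is not yet encoded. A minimal sketch of the positive form follows; the function name onlyPercentAndPlusChange is hypothetical, and this reading of the test is an assumption.

```php
<?php
// Hypothetical wrapper: true when urlencode() would alter nothing but "%" and "+",
// i.e. the string already consists only of characters urlencode() emits.
function onlyPercentAndPlusChange(string $str): bool
{
    return urlencode($str) === str_replace(['%', '+'], ['%25', '%2B'], $str);
}

var_dump(onlyPercentAndPlusChange('Hello+World%21')); // bool(true)
var_dump(onlyPercentAndPlusChange('Hello World!'));   // bool(false)
var_dump(onlyPercentAndPlusChange('abc'));            // bool(true)
var_dump(onlyPercentAndPlusChange('100%'));           // bool(true)
```

The last two calls show the limits: a string with nothing to encode passes, and so does a bare % that is not followed by two hex digits, because the test only inspects which characters are present.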

Hoytman
0

I found a way.
For example, the URL is: https://example.com/xD?foo=bar&uri=https%3A%2F%2Fexample.com%2FxD
You need to find out whether $_GET['uri'] is encoded or not:

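// $_GET values are already decoded by PHP, so inspect the raw request URI instead.
// [^&]* stops the capture at the next parameter.
if (isset($_GET['uri'])
    && preg_match('/[?&]uri=([^&]*)/', $_SERVER['REQUEST_URI'], $r)
    && urldecode($r[1]) === $r[1]) {
  // Code here if the uri value is not encoded
}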
Veyis Aliyev
-2

private static boolean isEncodedText(String val, String... encoding) throws UnsupportedEncodingException {
    String decodedText = URLDecoder.decode(val, TransformFetchConstants.DEFAULT_CHARSET);

    if(encoding != null && encoding.length > 0){
        decodedText = URLDecoder.decode(val, encoding[0]);
    }

    String encodedText =  URLEncoder.encode(decodedText);

    return encodedText.equalsIgnoreCase(val) || !decodedText.equalsIgnoreCase(val);

}
Lohith Ravi