2

I'm using mb_strtolower to make a string lowercase, but sometimes the text contains URLs with uppercase characters. When I use mb_strtolower, the URLs of course change too and stop working.

How can I convert a string to lowercase without changing the URLs?

Okan Kocyigit
  • If you're talking about web URLs, I thought case didn't matter. – Ash Burlaczenko Jun 12 '11 at 09:32
  • check this @ash - http://www.w3.org/TR/WD-html40-970708/htmlweb.html – Sujit Agarwal Jun 12 '11 at 09:35
  • check this @Coding Freak - http://www.w3.org/tr/wd-html40-970708/htmlweb.html it works. – Ash Burlaczenko Jun 12 '11 at 09:37
  • You'll have to probably use a regex, pull the url out and sequentially convert the leftover parts, putting the url back in when you get to where it previously resided in sequence. – kinakuta Jun 12 '11 at 09:38
  • @Ash - this is what I wanted to point out. Have you read the contents of the page? – Sujit Agarwal Jun 12 '11 at 09:39
  • @Ash - it depends on the platform its running on. http://students.washington.edu/jamdon/ or http://students.washington.edu/Jamdon/ – kinakuta Jun 12 '11 at 09:39
  • @Ash Burlaczenko I especially have a problem with youtube urls, for example http://www.youtube.com/watch?v=westzjv8zto http://www.youtube.com/watch?v=weStzJV8ZTo – Okan Kocyigit Jun 12 '11 at 09:41
  • URLs in general are case-sensitive (with the exception of machine names). There may be URLs, or parts of URLs, where case doesn't matter, but identifying these may not be easy. Users should always consider that URLs are case-sensitive. /u – Sujit Agarwal Jun 12 '11 at 09:42
  • Hey wait a second, take a look at my answer.. it's pretty much functional now =) – 19h Jun 12 '11 at 09:59

2 Answers

1

Here you go, iterative, but as fine as possible.

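    function strtolower_sensitive ( $input ) {
            // The "e" modifier is dropped: it only ever applied to preg_replace and no longer exists in PHP 7+.
            $regexp = "#((http|https|ftp)://(\S*?\.\S*?))(\s|\;|\)|\]|\[|\{|\}|,|\"|'|:|\<|$|\.\s)#i";
            if(preg_match_all($regexp, $input, $matches, PREG_SET_ORDER)) {
                    $hist = array();
                    foreach ( $matches as $i => $m ) {
                            $u = $m[1];                  // the URL without its trailing delimiter
                            $n = "\x00" . $i . "\x00";   // placeholder that lowercasing leaves untouched
                            // Swap only the first remaining occurrence, so a URL that is a prefix of a later one is safe.
                            $input = substr_replace( $input, $n, strpos($input, $u), strlen($u) );
                            $hist[] = array($u, $n);
                    }
                    $input = strtolower($input); // or mb_strtolower
                    foreach ( $hist as $h ) {
                            $input = str_replace( $h[1], $h[0], $input );
                    }
            }
            return $input;
    }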

$input is your string; the function returns the converted string. =)
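
For comparison, the same placeholder idea can be sketched more compactly with preg_replace_callback and strtr. This is only an illustration, not the code above: the function name lower_keep_urls_placeholder, the simplified URL pattern, and the NUL-delimited placeholders are all assumptions, and the sketch assumes the input contains no NUL bytes and that the mbstring extension is available.

```php
<?php
// Hypothetical helper: lowercases $input but leaves http/https/ftp URLs untouched.
function lower_keep_urls_placeholder(string $input): string
{
    $map = [];
    // Simplified pattern (an assumption): a scheme followed by non-whitespace characters.
    $masked = preg_replace_callback(
        '#\b(?:https?|ftp)://\S+#i',
        function (array $m) use (&$map): string {
            $key = "\x00" . count($map) . "\x00"; // placeholder unaffected by lowercasing
            $map[$key] = $m[0];                   // remember the original URL
            return $key;
        },
        $input
    );
    // Lowercase the masked text, then put the original URLs back.
    return strtr(mb_strtolower($masked, 'UTF-8'), $map);
}

echo lower_keep_urls_placeholder('Watch THIS: http://www.youtube.com/watch?v=weStzJV8ZTo NOW');
// prints "watch this: http://www.youtube.com/watch?v=weStzJV8ZTo now"
```

Because the pattern takes everything up to the next whitespace, trailing punctuation such as a closing parenthesis stays attached to the URL and keeps its case, which is harmless for lowercasing purposes.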

19h
  • Incorrect, urls can be case sensitive, it depends on the webserver > http://scuzzy.id.au/TEST.html -vs- http://scuzzy.id.au/test.html – Scuzzy Jun 12 '11 at 09:41
  • Not all urls are case insensitive. – kinakuta Jun 12 '11 at 09:42
  • So what's the problem with youtube urls? When I convert them to lowercase, they stop working, for example http://www.youtube.com/watch?v=westzjv8zto http://www.youtube.com/watch?v=weStzJV8ZTo – Okan Kocyigit Jun 12 '11 at 09:42
  • If you depend on the server software. Okay, I'll rewrite my response to make it more general. – 19h Jun 12 '11 at 09:43
  • @ocanal: youtubes' "v" parameter is case sensitive – Scuzzy Jun 12 '11 at 09:44
  • @Scuzzy so you have any idea, what can I do? – Okan Kocyigit Jun 12 '11 at 09:50
  • Urls can be made case insensitive by using triplets encoding. Servernames are case insensitive anyway I guess, however I wonder a bit about punycode. – hakre Jun 12 '11 at 09:50
  • @hakre: Punycode follows the domain-name conventions, so it's case insensitive. As for the triplets encoding, wouldn't that require cooperation from the server as well? – Piskvor left the building Jun 12 '11 at 10:15
  • @Poskvor: Thanks for the clarification on punycode. It actually requires that the HTTP URL is triplet encoded. Normally only the required chars are, so there would be needed some other processing to convert them prior to change the case of the overall string. I think you mean that "from the server". So yes. It was just a bit of a hit the edge-case to open the mind for another, alternative solution. – hakre Jun 12 '11 at 10:18
  • @hakre: aha, that could be a workaround, yes - but again, you'll need to apply it to URLs only, which is not very different from the original question :) – Piskvor left the building Jun 12 '11 at 11:09
  • @Piskvor: Jup. But hey, probably there is some sort of string filter available in PHP to do exactly that. After applying it, the original code could stay as is *gg* :) – hakre Jun 12 '11 at 11:13
1

Since you have not posted your string, this can only be answered in general terms.

Whenever you use a function to make a string lowercase, the whole string is made lowercase. String functions are aware of strings only; they know nothing about what the content of those strings means.

In your scenario, I assume you do not want to lowercase the whole string. You want to lowercase only parts of it; the other parts, the URLs, should keep their case.

To do so, you must first parse your string into these two different parts, let's call them text and URLs. Then you need to apply the lowercase function only on the parts of type text. After that you need to combine all parts together again in their original order.
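
The three steps (split into parts, lowercase only the text parts, join in the original order) could be sketched with preg_split and its PREG_SPLIT_DELIM_CAPTURE flag. The function name lower_text_parts and the simplified URL pattern are assumptions made for illustration; a real input may need a stricter definition of what counts as a URL.

```php
<?php
// Hypothetical helper: split into text and URL parts, lowercase only the text parts.
function lower_text_parts(string $input): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the captured URLs stay in the result,
    // so the parts alternate: text, URL, text, URL, ...
    $parts = preg_split('#((?:https?|ftp)://\S+)#i', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 0) {
            // Even index: plain text, safe to lowercase.
            $parts[$i] = mb_strtolower($part, 'UTF-8');
        }
        // Odd index: a URL, left exactly as it was.
    }
    // Joining with an empty string restores the original order and spacing.
    return implode('', $parts);
}

echo lower_text_parts('See HTTP docs at http://Example.com/Path and MORE');
// prints "see http docs at http://Example.com/Path and more"
```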

If the content of the string is semantically simple, you can split the string at spaces. Then you can check whether each part begins with http:// or https:// (is_url()?) and, if not, perform the lowercase operation:

    $text = 'your content http://link.me/now! might differ';
    $fragments = explode(' ', $text);
    foreach ($fragments as &$fragment) {
        if (is_not_url($fragment)) {
            $fragment = strtolower($fragment); // or mb_strtolower
        }
    }
    unset($fragment); // remove reference
    $lowercase = implode(' ', $fragments);

For this code to work, you need to define the is_not_url() function. Additionally, the original text must be simple enough that rudimentary parsing on the space separator works.
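
One possible definition of is_not_url(), shown together with the loop so it can be run as is. The rule used here (a fragment is a URL if it starts with http:// or https://, in any case) is an assumption taken from the description above, not a complete URL check, and the sample text is made up for the demonstration.

```php
<?php
// Hypothetical definition: a fragment counts as a URL if it starts with http:// or https://.
function is_not_url(string $fragment): bool
{
    return preg_match('#^https?://#i', $fragment) !== 1;
}

$text = 'Your CONTENT http://link.me/Now! MIGHT differ';
$fragments = explode(' ', $text);
foreach ($fragments as &$fragment) {
    if (is_not_url($fragment)) {
        $fragment = mb_strtolower($fragment, 'UTF-8');
    }
}
unset($fragment); // remove reference
$lowercase = implode(' ', $fragments);

echo $lowercase;
// prints "your content http://link.me/Now! might differ"
```

A URL wrapped in punctuation, such as "(http://link.me/Now)", would not start with the scheme and would therefore be lowercased; handling that case needs a regex-based split rather than a split on spaces.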

Hopefully this example helps you get along with the coding and understand your problem.

hakre
  • Sure, as I cannot write an example parser for a string whose semantics are hidden. – hakre Jun 12 '11 at 10:04
  • Don't ask, don't tell. There're not too many semantics which can apply to php strings. – 19h Jun 12 '11 at 10:06
  • 2
    @kenansulayman: And? This is not a site of the type 'hand me the complete solution on a silver platter', the point is to give the asker a useful answer, not to do all his work for him. – Piskvor left the building Jun 12 '11 at 10:10
  • Don't ying, don't yang. That's crap. If you're looking for upvotes, you should not downvote only because your answer has not been accepted. It has a parser (based on regex) as well, so it's not that different in it's principle. Probably it misses some documentation to be understood by the OP. But I don't know that. – hakre Jun 12 '11 at 10:13