
code:

import urlparse  # Python 2 module; in Python 3 it is urllib.parse
url1 = 'http://try.github.io//levels/1/challenges/1'
netloc1 = urlparse.urlparse(url1)[1]  # try.github.io

url2 = 'https://github.com/explore'
netloc2 = urlparse.urlparse(url2)[1]  # github.com

netloc2 is what I want. However, I would like netloc1 to be github.io. If I use a regex, how should I handle this?

liuzhijun
  • You've got a working Pythonic solution and want to write a regex to do the same - is that correct? – Jon Clements May 30 '13 at 09:42
  • The problem is that you need a list of TLDs to get this to work. For example, what would be the netloc in `foo.bar.com.br`, as opposed to `foo.bar.com`? There is no way to get this working for all TLDs without a list of valid TLDs. – Wolph May 30 '13 at 09:55
  • It's not really clear what you're looking for from the question. Could you expand the "given 'this' I'm expecting 'that'" portion? – Ro Yo Mi May 30 '13 at 12:44

1 Answer


Description

This regex checks that a URL starts with http:// or https:// followed by either try.github.io or github.com, and captures that host in group 1:

^https?:[\/]{2}(try[.]github[.]io|github[.]com)


Example

I don't know Python, so I'm providing a PHP example to show how the regex works.

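<?php
// Two sample URLs on separate lines; the /m flag lets ^ match at the start of each line
$sourcestring="http://try.github.io//levels/1/challenges/1\nhttps://github.com/explore";
preg_match_all('/^https?:[\/]{2}(try[.]github[.]io|github[.]com)/im',$sourcestring,$matches);
echo "<pre>".print_r($matches,true);
?>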

$matches Array:
(
    [0] => Array
        (
            [0] => http://try.github.io
            [1] => https://github.com
        )

    [1] => Array
        (
            [0] => try.github.io
            [1] => github.com
        )

)
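Because the question is in Python, here is a rough Python 3 translation of the PHP example above using the standard `re` module. It is a sketch, not the answerer's code. `re.IGNORECASE | re.MULTILINE` stands in for PHP's `/im`. The names `PATTERN` and `source` are illustrative.

```python
import re

# Same pattern as the PHP example; "/" needs no escaping in Python.
# re.MULTILINE lets ^ match at each line start (like /m), re.IGNORECASE mirrors /i.
PATTERN = re.compile(r'^https?:[/]{2}(try[.]github[.]io|github[.]com)',
                     re.IGNORECASE | re.MULTILINE)

source = ('http://try.github.io//levels/1/challenges/1\n'
          'https://github.com/explore')

# With exactly one capture group, findall returns just the group text (like $matches[1])
hosts = PATTERN.findall(source)
print(hosts)  # ['try.github.io', 'github.com']

# finditer exposes both the full match (like $matches[0]) and the group
for m in PATTERN.finditer(source):
    print(m.group(0), m.group(1))
```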

Disclaimer

It would probably be easier to use your urlparse solution and then apply some logic to the netloc value it returns at index [1].
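Here is a minimal Python 3 sketch of that "urlparse plus some logic" idea, plus a regex variant because the question asks for one. Both helper names are hypothetical. Both assume the wanted part is the last two dot-separated labels of the host. As Wolph's comment on the question points out, that assumption fails for multi-label suffixes such as `com.br`, where a correct result needs a list of public suffixes.

```python
import re
from urllib.parse import urlparse  # Python 3; Python 2 uses `import urlparse`


def naive_base_domain(url):
    """Hypothetical helper: keep the last two labels of the URL's host."""
    host = urlparse(url).hostname or ''  # hostname is lowercased, port removed
    return '.'.join(host.split('.')[-2:])


def naive_base_domain_re(url):
    """Hypothetical regex variant: match 'label.label' anchored at the end of the host."""
    host = urlparse(url).hostname or ''
    m = re.search(r'[^.]+\.[^.]+$', host)
    return m.group(0) if m else host  # single-label hosts are returned unchanged


for u in ('http://try.github.io//levels/1/challenges/1',
          'https://github.com/explore',
          'http://foo.bar.com.br/'):
    print(naive_base_domain(u), naive_base_domain_re(u))
```

For `http://foo.bar.com.br/`, both helpers return `com.br` rather than `bar.com.br`, which is the limitation described above.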

Ro Yo Mi