Split a URL in Python 2.x

Question

I am parsing a link out of some HTML code, as below:

"http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

What I want to do is extract the second part of the string, starting at the second occurrence of "http:". So in the case above, I want to extract

"http://truelink.com/football/abcde.html?"

I have considered slicing the URL into fixed segments; however, I am not sure the structure of the first part will stay the same over time.

Is it possible to identify the second occurrence of 'http' and then extract everything from there to the end?

Just out of curiosity, how did you end up with such a string? :) – Jon Clements Jun 06 '15 at 21:19

daniel451 · Answer 1 · score 3

link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

link[link.rfind("http://"):]

returns:

"http://truelink.com/football/abcde.html?"

This is what I would do. rfind finds the last occurrence of "http://" and returns its index (or -1 if it does not occur at all). In your example, that last occurrence is where the real, original URL starts, so you can extract the substring from that index to the end.
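
A minimal sketch of the same idea wrapped in a helper that also handles the "not found" case. The function name extract_last_url and returning None on failure are illustrative choices, not part of the original answer:

```python
def extract_last_url(link, prefix="http://"):
    # rfind returns the index of the LAST occurrence of prefix, or -1 if absent
    idx = link.rfind(prefix)
    if idx == -1:
        # Without this check, link[-1:] would silently return the last character
        return None
    return link[idx:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(extract_last_url(link))  # http://truelink.com/football/abcde.html?
```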

In general, if you have a string myStr, you extract a substring in Python with slice notation:

myStr[0]    # returns the first character
myStr[0:5]  # returns the first 5 characters, i.e. indices 0 <= i < 5
myStr[5:]   # returns all characters from index 5 to the end of the string
myStr[:5]   # is the same as myStr[0:5]
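
To make the slice rules concrete, here is a small runnable example on an assumed sample string (my_str is just an illustrative value, not taken from the question):

```python
my_str = "http://truelink.com"

print(my_str[0])                  # h
print(my_str[0:5])                # http:
print(my_str[7:])                 # truelink.com  (skips the 7 characters of "http://")
print(my_str[:5] == my_str[0:5])  # True
```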

edited Jun 06 '15 at 21:23

answered Jun 06 '15 at 21:11

What if the URL is `"http://advert.com/go/2/12345/0/http://truelink.com/football/http"`? – vaultah Jun 06 '15 at 21:15
@ascenator and that should then be your actual answer :) – Jon Clements Jun 06 '15 at 21:16
Ah yeah, I got it. Edited :) – daniel451 Jun 06 '15 at 21:17
@ascenator also - if `http://` isn't found in the string - you'll get some interesting results :) – Jon Clements Jun 06 '15 at 21:20
This is true, but which URL does not include `http://`? Especially since thefragileomen said he (for some reason) has ad URLs containing real URLs. – daniel451 Jun 06 '15 at 21:26
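
rfind picks the *last* "http://", which differs from what the question literally asks (the *second* one) if the real link itself contains another "http://", for example in a redirect query string. A sketch that finds the second occurrence using str.find with a start offset; the helper name and the nested example URL are hypothetical:

```python
def extract_second_url(link, prefix="http://"):
    first = link.find(prefix)
    if first == -1:
        return None
    # Start searching just past the first match to find the second one
    second = link.find(prefix, first + len(prefix))
    if second == -1:
        return None
    return link[second:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
# Hypothetical link whose real URL carries a further http:// in its query string
nested = "http://advert.com/go/2/12345/0/http://truelink.com/r?to=http://other.com/"

print(extract_second_url(link))    # http://truelink.com/football/abcde.html?
print(extract_second_url(nested))  # http://truelink.com/r?to=http://other.com/
```

On the nested string, link[link.rfind("http://"):] would instead return only "http://other.com/".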

score 0 · Answer 2 · answered Jun 06 '15 at 21:20

I would do something like this:

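addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
http_part = 'http://'
# Split on every 'http://'; the string starts with it, so split[0] is ''
split = addr.split(http_part)
res = []
for part in split:  # 'part' avoids shadowing the built-in str
    if part:  # skip the empty piece before the leading 'http://'
        res.append(http_part + part)
print(res)  # ['http://advert.com/go/2/12345/0/', 'http://truelink.com/football/abcde.html?']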
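
A variant of the same split idea, sketched under the assumption that you only want the text from the second "http://" onward: passing maxsplit=2 to str.split cuts at the first two occurrences only, so any later "http://" stays intact in the last piece.

```python
addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
prefix = "http://"

# At most two cuts: everything after the second prefix ends up in parts[2]
parts = addr.split(prefix, 2)

if len(parts) == 3:
    real_url = prefix + parts[2]
else:
    real_url = None  # prefix occurs fewer than two times

print(real_url)  # http://truelink.com/football/abcde.html?
```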
