
I have a link being parsed from some HTML, as below:

"http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

What I am looking to do is extract the second part of the string, starting at the second occurrence of "http:". So in the above case, I want to extract

"http://truelink.com/football/abcde.html?"

I have considered slicing the URL into fixed segments; however, I am not sure the structure of the first part will stay the same over time.

Is it possible to identify the second occurrence of 'http' and then extract everything from there to the end?

Jon Clements
thefragileomen

2 Answers

link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

link[link.rfind("http://"):]

returns:

"http://truelink.com/football/abcde.html?"

This is what I would do. rfind finds the last occurrence of "http://" and returns its index. In your example, that occurrence is the real, original URL. You then extract the substring from that index to the end of the string.
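The question asks for the second occurrence, while rfind returns the last one; the two coincide only when the string contains exactly two matches. Also, if "http://" is absent, rfind returns -1 and slicing from -1 silently yields the last character. A minimal sketch, assuming the marker is the literal "http://" (the helper name from_nth_occurrence is hypothetical), that walks forward with str.find and a start offset and returns None when there are too few matches:

```python
def from_nth_occurrence(text, marker="http://", n=2):
    """Return text from the n-th occurrence of marker to the end, or None.

    Hypothetical helper for illustration; n is 1-based.
    """
    index = -1
    for _ in range(n):
        # Start searching just past the previous match to find the next one
        index = text.find(marker, index + 1)
        if index == -1:
            return None  # fewer than n occurrences of marker
    return text[index:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(from_nth_occurrence(link))  # http://truelink.com/football/abcde.html?
```

Passing n=1 returns the whole string here, since the first match is at index 0.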

In general, for a string myStr, Python extracts a substring with slice notation:

myStr = "http://truelink.com"
myStr[0]    # returns the first character: 'h'
myStr[0:5]  # returns the first 5 characters (indices 0 <= i < 5): 'http:'
myStr[5:]   # returns all characters from index 5 to the end: '//truelink.com'
myStr[:5]   # same as myStr[0:5]: 'http:'
daniel451

I would do something like this:

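addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
http_part = 'http://'
parts = addr.split(http_part)
res = []
for part in parts:
    if part:  # skip the empty string produced before the leading 'http://'
        res.append(http_part + part)
print(res)
# ['http://advert.com/go/2/12345/0/', 'http://truelink.com/football/abcde.html?']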
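Splitting on every "http://" would also cut the embedded link apart if it carried another "http://" of its own, for example in a query string. A minimal sketch of a swapped-in approach using the re module instead of str.split: it takes everything from the second scheme match to the end. The pattern https?:// (which also accepts "https://") and the helper name embedded_link are assumptions for illustration:

```python
import re

# Assumed pattern: matches "http://" or "https://" (the answers above only use "http://")
SCHEME = re.compile(r"https?://")


def embedded_link(addr):
    """Hypothetical helper: return addr from the second scheme match to the end, or None."""
    matches = list(SCHEME.finditer(addr))
    if len(matches) < 2:
        return None  # no embedded link found
    return addr[matches[1].start():]


addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(embedded_link(addr))  # http://truelink.com/football/abcde.html?
```

Because it slices from the second match rather than splitting on all of them, anything after that point, including further "http://" substrings, is kept intact.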
Thomas Weglinski