How to get the content with urllib when the link redirects to another URL?

Question

I'm trying to use urllib to get the content of this URL: "https://blockexplorer.com/block-index/0". But when a browser loads this link, it is redirected to another link: "https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f".

Here is my code:

import urllib

link = "https://blockexplorer.com/block-index/0"
f = urllib.urlopen(link)
myfile = f.read()
print myfile

But I get the message "Cannot GET /block-index/0". How can I get the content of the page that the block-index link above redirects to?

Please help me solve this issue.

Thanks a lot.

I'm confused: do you want the contents of the page it redirects to, or of the page that does the redirecting? – Mohammad Ali, Apr 02 '17 at 16:03
I think this is what you are looking for: http://stackoverflow.com/a/3556287/1699398 – jmercouris, Apr 02 '17 at 17:59
@MohammadAli: I want to get the final page that it ends up at, "https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" – phuong, Apr 03 '17 at 03:52
@jmercouris: Following your approach just gives me the original URL "https://blockexplorer.com/block-index/0", not the URL "https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" that contains the content I want. – phuong, Apr 03 '17 at 03:57
@JameLenon why is it that when I visit the link I am not redirected? – Mohammad Ali, Apr 03 '17 at 05:01
@MohammadAli: when you go to https://blockexplorer.com/block-index/0, it redirects to "https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f". Please try it. – phuong, Apr 03 '17 at 08:26

score 1 · Answer 1 · answered Apr 03 '17 at 16:55

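If you are willing to use the Python requests library, you could try the following code:

import requests
r = requests.get('https://blockexplorer.com/block-index/0', allow_redirects=True)

This should give you the contents of the page after the redirect: r.text holds the body and r.url holds the final URL. (allow_redirects=True is already the default for requests.get, so it is only spelled out here for clarity.)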
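To check whether a redirect was actually followed, it can help to look at the response history as well as the body. The following is only a rough sketch, assuming Python 3 and an installed requests package; fetch_following_redirects and its timeout default are hypothetical names chosen for illustration, not part of requests.

```python
import requests


def fetch_following_redirects(url, timeout=10):
    """Fetch url with requests and report where the request ended up.

    Hypothetical helper for illustration; not part of the requests API.
    """
    # allow_redirects=True is the default for GET; it is written out for clarity.
    response = requests.get(url, allow_redirects=True, timeout=timeout)
    # response.history lists the intermediate redirect responses, oldest first.
    hops = [(hop.status_code, hop.url) for hop in response.history]
    return {
        "final_url": response.url,       # URL of the last response
        "status": response.status_code,  # status of the last response
        "hops": hops,                    # empty if no redirect happened
        "body": response.text,           # decoded body of the last response
    }


# Possible usage (performs a real network request):
# info = fetch_following_redirects("https://blockexplorer.com/block-index/0")
# print(info["final_url"], info["hops"])
```

If "hops" comes back empty, the server answered the original URL directly, which would suggest the redirect seen in the browser is not a plain HTTP redirect for that request.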

Mohammad Ali

score 0 · Answer 2 · answered Apr 03 '17 at 00:22

The site you are trying to crawl does not accept the Accept header */* (the default for urllib), but it does accept text/html. You can fetch the page with the following code:

import urllib2

link = "http://blockexplorer.com/block-index/0"
r = urllib2.Request(url=link)
r.add_header('Accept', 'text/html')
response = urllib2.urlopen(r)
print(response.read())

But I think you will run into more problems later. The data is not present in the HTML itself; it is retrieved dynamically via JavaScript (AngularJS).
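The code above is Python 2 (urllib2). As a rough sketch of the same idea in Python 3, assuming only the standard library, the request below sends an explicit Accept header and also reports the URL the request ended up at. fetch_as_html and its timeout default are hypothetical names used for illustration.

```python
import urllib.request


def fetch_as_html(url, timeout=10):
    """Request url with 'Accept: text/html' and return (final_url, body).

    Hypothetical helper for illustration; not part of urllib.
    """
    request = urllib.request.Request(url, headers={"Accept": "text/html"})
    # urlopen follows HTTP redirects by itself; geturl() gives the final URL.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        final_url = response.geturl()
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        body = response.read().decode(charset, errors="replace")
    return final_url, body


# Possible usage (performs a real network request):
# final_url, body = fetch_as_html("https://blockexplorer.com/block-index/0")
# print(final_url)
```

Even when this returns HTML, the caveat above still applies: if the page fills in its data with JavaScript, the body will contain only the page shell, not the rendered block details.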

Rafael

Thanks for your answer, but it doesn't work for me. I want to get the same content as the "https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f" page. – phuong Apr 03 '17 at 03:25
But did you receive HTML, or the same `Cannot GET /block-index/0`? – Rafael Apr 03 '17 at 09:57
If you are still receiving the same `Cannot GET /block-index/0`, try to [urllib2.urlopen like Chrome](https://gist.github.com/rafaelhdr/a05984cea8f929f29a0c3eb173f8dcdc). – Rafael Apr 03 '17 at 10:07
No, the error "Cannot GET /block-index/0" no longer appears. But the content is very strange, not what I want, which is the page as it looks after it has loaded: https://blockexplorer.com/block/000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f – phuong Apr 03 '17 at 11:39
Not HTML, right? Is it strange output with many question marks (`?`)? If so, this happens because of the gzip encoding (from my gist). Just remove gzip from the header (the new header is `r.add_header('Accept-Encoding', 'deflate, sdch, br')`). – Rafael Apr 03 '17 at 12:44
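Output full of unreadable characters is what a compressed body looks like when it is printed without decompression. As an alternative to removing gzip from Accept-Encoding, the body can be decompressed according to the Content-Encoding response header. This is a minimal Python 3 sketch using only the standard library; decode_body is a hypothetical helper name, and it handles only gzip and deflate (not sdch or br).

```python
import gzip
import zlib


def decode_body(raw, content_encoding):
    """Undo the Content-Encoding of a response body.

    raw is the bytes returned by response.read(); content_encoding is the
    value of the Content-Encoding response header, or None if it is absent.
    Hypothetical helper for illustration only.
    """
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # Usual case: a zlib-wrapped deflate stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib wrapper.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No encoding, or one this sketch does not handle: return the bytes as-is.
    return raw


# Possible usage with a urllib.request response object:
# html = decode_body(response.read(), response.headers.get("Content-Encoding"))
```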
