
The following Python script takes 3 seconds on my PC to load the source code of a Twitter page, which is much longer than it takes to retrieve the source code of other websites, such as YouTube. When I load the same Twitter page in my browser, the "Network" tab in Google Chrome tells me the HTML is retrieved in 0.3 seconds.

Why is urllib so much slower than my browser?

import urllib2
import time

start = time.time()
channel = 'pontifex'
url = "https://twitter.com/" + channel
page = urllib2.urlopen(url).read()  # fetch the page and read the full response body
print str(round(time.time() - start, 0)) + " secs total"
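
A diagnostic sketch (an editor's addition, not part of the original question) that splits the measurement into connection setup and body download, using the fact that urlopen() returns once the response headers have arrived while read() downloads the body:

import urllib2
import time

url = "https://twitter.com/pontifex"

start = time.time()
response = urllib2.urlopen(url)   # DNS lookup, TCP/TLS handshake, request, response headers
connected = time.time()
body = response.read()            # download of the response body itself
done = time.time()

print "connect + headers: %.2f secs" % (connected - start)
print "body download:     %.2f secs" % (done - connected)
print "total:             %.2f secs" % (done - start)

If most of the 3 seconds falls in the first interval, the cost is in name resolution and the TLS handshake rather than in the transfer itself.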
Alexis Eggermont

1 Answer


Caching is the answer, and it is usually done by browsers to reduce the load time of frequently visited sites. If not the browser, then search engines such as Google also cache frequently visited websites, so retrieving them is only a matter of milliseconds.

See this post: How can Google be so fast?
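
For readers curious about the HTTP-level mechanism behind this, below is a minimal sketch of a conditional request with urllib2 (an editor's illustration, not part of the original answer). It assumes the server actually returns an ETag or Last-Modified header for the page, which twitter.com may not do for its HTML; when it does, a 304 reply means the cached copy is still valid and no body is downloaded again:

import urllib2

url = "https://twitter.com/pontifex"

# First request: record the caching validators the server sends back.
first = urllib2.urlopen(url)
etag = first.info().getheader("ETag")
last_modified = first.info().getheader("Last-Modified")

# Later request: send the validators back. urllib2 raises HTTPError for
# any non-2xx status, so a 304 "Not Modified" surfaces as an exception.
request = urllib2.Request(url)
if etag:
    request.add_header("If-None-Match", etag)
if last_modified:
    request.add_header("If-Modified-Since", last_modified)

try:
    urllib2.urlopen(request)
    print "200 - server sent the full page again"
except urllib2.HTTPError as e:
    if e.code == 304:
        print "304 - cached copy still valid, no body downloaded"
    else:
        raise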

smac89
  • Thanks. That is helpful. To speed things up, is there any way for urllib to access the same cache that my browser accesses? Also, I'm still wondering why twitter would be so much slower than all other websites I've previously scraped with urllib. 3 seconds is an eternity. – Alexis Eggermont Dec 24 '15 at 06:05
    @AlexisEggermont Try using [urllib2](https://docs.python.org/2/library/urllib2.html) which is an updated version of urllib. See http://stackoverflow.com/questions/2018026/should-i-use-urllib-urllib2-or-requests?rq=1. – smac89 Dec 24 '15 at 06:21
  • I don't agree with this. I've been comparing Python's urllib GET requests to other languages (Java, VBA, C#, Node.js), and the performance is dreadful in comparison; something is definitely not right with urllib. – tremor Jun 25 '20 at 19:52
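
Following up on the comments above, here is a minimal sketch (an editor's addition, not from the original thread) that repeats the timing with the third-party requests library, which the linked question recommends over urllib/urllib2. It assumes requests is installed (pip install requests) and reuses the URL from the question:

import time
import requests  # third-party library: pip install requests

url = "https://twitter.com/pontifex"

# One-off request, comparable to the urllib2 version in the question.
start = time.time()
response = requests.get(url)
print "requests.get: %.2f secs, %d bytes" % (time.time() - start, len(response.content))

# A Session reuses the underlying connection, so the second request
# avoids repeating the DNS lookup and TLS handshake.
session = requests.Session()
for attempt in range(2):
    start = time.time()
    session.get(url)
    print "session request %d: %.2f secs" % (attempt + 1, time.time() - start)

Keeping the TCP/TLS connection open between requests is one of the things a browser does automatically, which is part of why repeated page loads in Chrome appear so much faster than a single cold fetch from a script.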