I'm working on a Python 2.7 script that must check a Fedora Commons repository for the existence of some data in 20,000 objects. Basically this means sending 20,000 HTTP requests to 20,000 different URLs on the repository (which runs on a Tomcat server).
I wrote a script that does the job, but the server's system administrator warned me that it opens too many network connections, which causes some trouble.
So far my script uses urllib2 to make the HTTP requests:

import urllib2

response = urllib2.urlopen(url)
response_content = response.read()

This code opens a new network connection for every request.
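For reference, here is roughly what the full urllib2 loop looks like (a minimal sketch; the base URL is the same test URL used in the examples below, and the real script does more with the response content):

import urllib2

url = "http://localhost:8080/fedora/objects/test:1234?test="
for x in range(0, 20000):
    myurl = url + str(x)
    # every urlopen() call sets up its own connection to the server
    response = urllib2.urlopen(myurl)
    response_content = response.read()
    print x, "\t", myurl, "\t", response.getcode()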
I have tried other libraries to make the requests, but could not find any way to reuse the same connection for all of them. Both solutions below still open many network connections, even if far fewer (both seem to open roughly one connection per 100 HTTP requests, which still means around 200 connections in my case).
httplib:
url = "http://localhost:8080/fedora/objects/test:1234?test="
url_infos = urlparse(url)
conn = httplib.HTTPConnection(url_infos.hostname + ":" + str(url_infos.port))
for x in range(0, 20000):
myurl = url + str(x)
conn.request("GET", myurl)
r = conn.getresponse()
response_content = r.read()
print x, "\t", myurl, "\t", r.status
requests:
url = "http://localhost:8080/fedora/objects/test:1234?test="
s = requests.Session()
for x in range(0, 20000):
myurl = url + str(x)
r = s.get(myurl)
response_content = r.content
print x, "\t", myurl, "\t", r.status_code
Even though the number of connections is much lower, ideally I'd like to use one or very few connections for all the requests. Is that even possible? Is this limit of 100 requests per connection related to the system or to the server? By the way, I also tried pointing the requests at an Apache server and the result was the same.
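For what it's worth, here is a rough way to observe when the httplib variant above actually opens a new connection (just a diagnostic sketch, not part of the real script): the local ephemeral port of conn.sock changes whenever httplib silently reconnects.

import httplib
from urlparse import urlparse

url = "http://localhost:8080/fedora/objects/test:1234?test="
url_infos = urlparse(url)
conn = httplib.HTTPConnection(url_infos.hostname + ":" + str(url_infos.port))

last_port = None
for x in range(0, 20000):
    conn.request("GET", url + str(x))
    # right after the request the socket is guaranteed to exist;
    # a new local port means httplib reconnected behind the scenes
    local_port = conn.sock.getsockname()[1]
    if local_port != last_port:
        print x, "new connection, local port", local_port
        last_port = local_port
    r = conn.getresponse()
    r.read()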