You can make multiple requests in parallel using multithreading, like so:
import Queue  # Python 2.7 name; the module is called `queue` on Python 3
import threading
import time

import requests

exit_flag = 0  # set to 1 by the main thread to tell the workers to stop


class RequestThread(threading.Thread):
    def __init__(self, thread_id, name, q):
        threading.Thread.__init__(self)
        self.thread_id = thread_id
        self.name = name
        self.q = q

    def run(self):
        print("Starting {0:s}".format(self.name))
        process_data(self.name, self.q)
        print("Exiting {0:s}".format(self.name))


def process_data(thread_name, q):
    while not exit_flag:
        # Hold the lock while checking and taking an item so that two
        # threads cannot both go for the same last item.
        queue_lock.acquire()
        if not q.empty():
            data = q.get()
            queue_lock.release()
            print("{0:s} processing {1:s}".format(thread_name, data))
            response = requests.get(data)
            print(response)
        else:
            queue_lock.release()
            time.sleep(1)


thread_list = ["Thread-1", "Thread-2", "Thread-3"]
request_list = [
    "https://api.github.com/events",
    "http://api.plos.org/search?q=title:THREAD",
    "http://api.plos.org/search?q=title:DNA",
    "http://api.plos.org/search?q=title:PYTHON",
    "http://api.plos.org/search?q=title:JAVA"
]
queue_lock = threading.Lock()
work_queue = Queue.Queue(10)
threads = []
thread_id = 1

# Create new threads
for t_name in thread_list:
    thread = RequestThread(thread_id, t_name, work_queue)
    thread.start()
    threads.append(thread)
    thread_id += 1

# Fill the queue
queue_lock.acquire()
for word in request_list:
    work_queue.put(word)
queue_lock.release()

# Wait for queue to empty (busy-wait: this loop polls without sleeping)
while not work_queue.empty():
    pass

# Notify threads it's time to exit
exit_flag = 1

# Wait for all threads to complete
for t in threads:
    t.join()
print("Exiting Main Thread")
Output:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-1 processing https://api.github.com/events
Thread-2 processing http://api.plos.org/search?q=title:THREAD
Thread-3 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-2 processing http://api.plos.org/search?q=title:PYTHON
Thread-3 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-1
<Response [200]>
<Response [200]>
Exiting Thread-3
Exiting Thread-2
Exiting Main Thread
A little explanation, although I'm no multithreading expert:
1. Queue
The Queue module allows you to create a new queue object that can hold a specific number of items. The following methods control the queue:
- get() − removes and returns an item from the queue.
- put() − adds an item to a queue.
- qsize() − returns the number of items that are currently in the queue.
- empty() − returns True if queue is empty; otherwise, False.
- full() − returns True if queue is full; otherwise, False.
In my limited experience with multithreading, the queue is useful for keeping track of the data that still has to be processed. I had situations where threads were doing the same work, or where all of them exited except one. The queue helped me control the shared data to process.
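To make the method list above concrete, here is a minimal single-threaded sketch. The `drain` helper and the placeholder strings are made up for illustration, and the `try`/`except` import is only there so the same snippet runs on Python 2.7 and Python 3.

```python
try:
    import queue  # Python 3 module name
except ImportError:
    import Queue as queue  # Python 2.7 module name, as used in the answer


def drain(q):
    """Remove and return every item currently in the queue, in FIFO order."""
    items = []
    while not q.empty():
        items.append(q.get())
    return items


work = queue.Queue(3)            # holds at most 3 items
for item in ["a", "b", "c"]:     # placeholder strings, not real URLs
    work.put(item)

print(work.qsize())   # 3
print(work.full())    # True
print(drain(work))    # ['a', 'b', 'c']
print(work.empty())   # True
```

Keep in mind that with several threads running, `empty()` and `qsize()` only describe the queue at the moment of the call; another thread may take or add an item right afterwards.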
2. Lock
The threading module provided with Python includes a simple-to-implement locking mechanism that allows you to synchronize threads. A new lock is created by calling the Lock() method, which returns the new lock.
A primitive lock is in one of two states, “locked” or “unlocked”. It
is created in the unlocked state. It has two basic methods, acquire()
and release(). When the state is unlocked, acquire() changes the state
to locked and returns immediately. When the state is locked, acquire()
blocks until a call to release() in another thread changes it to
unlocked, then the acquire() call resets it to locked and returns. The
release() method should only be called in the locked state; it changes
the state to unlocked and returns immediately. If an attempt is made
to release an unlocked lock, a ThreadError will be raised.
In more human language: locks are the most fundamental synchronization mechanism provided by the threading module. At any time, a lock can be held by a single thread, or by no thread at all. If a thread attempts to acquire a lock that’s already held by some other thread, it is halted until the lock is released.
Locks are typically used to synchronize access to a shared resource. For each shared resource, create a Lock object. When you need to access the resource, call acquire to hold the lock (this will wait for the lock to be released, if necessary), and call release to release it.
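A small sketch of that acquire/release pattern, assuming a shared counter as the resource. The names `counter`, `counter_lock`, `add_many` and `run_workers` are invented for this example. The `with` statement calls acquire() on entry and release() on exit, even if an exception is raised inside the block.

```python
import threading

counter = 0                       # the shared resource
counter_lock = threading.Lock()   # one lock per shared resource


def add_many(n):
    """Increment the shared counter n times, holding the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:        # acquire(); released automatically on exit
            counter += 1


def run_workers(num_threads, n):
    """Run num_threads threads that each call add_many(n); return the total."""
    global counter
    counter = 0
    workers = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter


print(run_workers(4, 10000))   # 40000
```

Without the lock, the `counter += 1` updates from different threads could overwrite each other and the total might come out lower.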
3. Thread
To implement a new thread using the threading module, you have to do the following:
- Define a new subclass of the Thread class.
- Override the __init__(self [,args]) method to add additional arguments.
- Then, override the run(self [,args]) method to implement what the thread should do when started.
Once you have created the new Thread subclass, you can create an instance of it and then start a new thread by invoking start(), which in turn calls the run() method. Methods:
- run() − method is the entry point for a thread.
- start() − method starts a thread by calling the run method.
- join([time]) − waits for threads to terminate.
- isAlive() − method checks whether a thread is still executing (spelled is_alive() on Python 3; the isAlive() spelling was removed in Python 3.9).
- getName() − returns the name of a thread.
- setName() − sets the name of a thread.
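The three steps and the start()/join() calls can be sketched as follows. `SquareThread` and `square_all` are hypothetical names used only for this illustration; the worker squares a number instead of making a request so the snippet runs offline.

```python
import threading


class SquareThread(threading.Thread):
    """Hypothetical worker that squares one number and stores the result."""

    def __init__(self, value):
        threading.Thread.__init__(self)   # initialise the base class first
        self.value = value
        self.result = None

    def run(self):
        # Entry point: runs in the new thread after start() is called.
        self.result = self.value * self.value


def square_all(values):
    """Start one thread per value, wait for all of them, collect the results."""
    threads = [SquareThread(v) for v in values]
    for t in threads:
        t.start()       # spawns the thread, which then calls run()
    for t in threads:
        t.join()        # block until that thread has finished
    return [t.result for t in threads]


print(square_all([1, 2, 3]))   # [1, 4, 9]
```

Calling run() directly would execute the code in the current thread; only start() creates a new one.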
Is it really faster?
Using a single thread:
$ time python single.py
Processing request url: https://api.github.com/events
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:THREAD
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:DNA
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:PYTHON
<Response [200]>
Processing request url: http://api.plos.org/search?q=title:JAVA
<Response [200]>
Exiting Main Thread
real 0m22.310s
user 0m0.096s
sys 0m0.022s
Using 3 threads:
Starting Thread-1
Starting Thread-2
Starting Thread-3
Thread-3 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
<Response [200]>
<Response [200]>
<Response [200]>
Thread-1 processing http://api.plos.org/search?q=title:PYTHON
Thread-2 processing http://api.plos.org/search?q=title:JAVA
Exiting Thread-3
<Response [200]>
<Response [200]>
Exiting Thread-1
Exiting Thread-2
Exiting Main Thread
real 0m11.726s
user 0m6.692s
sys 0m0.028s
Using 5 threads:
$ time python multi.py
Starting Thread-1
Starting Thread-2
Starting Thread-3
Starting Thread-4
Starting Thread-5
Thread-5 processing https://api.github.com/events
Thread-1 processing http://api.plos.org/search?q=title:THREAD
Thread-2 processing http://api.plos.org/search?q=title:DNA
Thread-3 processing http://api.plos.org/search?q=title:PYTHONThread-4 processing http://api.plos.org/search?q=title:JAVA
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
<Response [200]>
Exiting Thread-5
Exiting Thread-4
Exiting Thread-2
Exiting Thread-3
Exiting Thread-1
Exiting Main Thread
real 0m6.446s
user 0m1.104s
sys 0m0.029s
Roughly 3.5 times faster with 5 threads (about 22.3 s down to about 6.4 s). And those are only 5 dummy requests. Imagine the difference for a bigger chunk of data.
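The gain comes from the threads waiting on the network at the same time instead of one after another. Here is a rough sketch of that effect, with time.sleep standing in for the HTTP call; `fake_request`, the two timing helpers and the 0.2 s delay are all made up for illustration, so the numbers it prints say nothing about the real APIs.

```python
import threading
import time


def fake_request(delay):
    """Stand-in for requests.get(): just waits, like a slow network call."""
    time.sleep(delay)


def timed_sequential(delays):
    """Run the fake requests one after another; return elapsed seconds."""
    start = time.time()
    for d in delays:
        fake_request(d)
    return time.time() - start


def timed_threaded(delays):
    """Run each fake request in its own thread; return elapsed seconds."""
    start = time.time()
    threads = [threading.Thread(target=fake_request, args=(d,))
               for d in delays]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.time() - start


delays = [0.2] * 5
print("sequential: {0:.2f}s".format(timed_sequential(delays)))
print("threaded:   {0:.2f}s".format(timed_threaded(delays)))
```

The sequential time is roughly the sum of the delays, while the threaded time is close to the longest single delay.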
Please note: I have only tested single.py and multi.py under Python 2.7. For Python 3.x, minor adjustments are probably needed (for example, the Queue module is named queue there).
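For Python 3, one possible adaptation is sketched below. It is not a line-by-line port: instead of the lock, the empty() check and the exit flag, it uses a different technique, where each worker blocks on get() and stops when it receives a None sentinel. `run_pool` and its `fetch` parameter are hypothetical names, and `len` is passed as a stub for `fetch` so the sketch runs without network access.

```python
import queue
import threading


def run_pool(urls, fetch, num_threads=3):
    """Process urls with num_threads workers; return {url: fetch(url)}."""
    tasks = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            url = tasks.get()        # blocks until an item is available
            if url is None:          # sentinel: no more work for this worker
                break
            value = fetch(url)
            with results_lock:       # protect the shared results dict
                results[url] = value

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for url in urls:
        tasks.put(url)
    for _ in threads:
        tasks.put(None)              # one sentinel per worker
    for t in threads:
        t.join()
    return results


# Stub fetch so the sketch runs offline. With the requests library installed,
# something like `lambda u: requests.get(u).status_code` could be passed instead.
print(run_pool(["u1", "u2", "u3", "u4"], fetch=len))
```

Because get() blocks, the workers do not need to sleep and poll, and the main thread does not need a busy-wait loop.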