
I'm trying to query ~50 Wikipedia pages. I've been using the requests package to make GET requests, but I've been trying out grequests since I hear it has much better performance.

The performance improvement is really quite minimal for me. Am I doing something wrong?

import requests
import grequests
from urllib.parse import quote
from time import time

url = 'https://en.wikipedia.org/w/api.php?action=query&titles={0}&prop=pageprops&ppprop=disambiguation&format=json'
titles = ['Harriet Tubman', 'Car', 'Underground Railroad', 'American Civil War', 'Kate Larson']
urls = [url.format(quote(title)) for title in titles]

def sync_test(urls):
    results = []
    s = time()
    for url in urls:
        results.append(requests.get(url))
    e = time()
    return e-s

def async_test(urls):
    s = time()
    results = grequests.map((grequests.get(url) for url in urls))
    e = time()
    return e-s

def iterate(urls, num):
    sync_time = 0
    async_time = 0
    for i in range(num):
        sync_time += sync_test(urls)
        async_time += async_test(urls)
    print("sync_time: {}\nasync_time: {}".format(sync_time, async_time))

Output:

sync_time: 8.945282936096191
async_time: 7.97578239440918

Thanks!

John Kim
  • I see no question here. Do you want us to do a general analysis of `grequests` vs `requests` performance? Make your program ?x times faster because you "heard" something? Speaking of which, 8s for 50 pages looks adequate performance to me unless you're planning to process tens of thousands. – ivan_pozdeev Jun 23 '17 at 22:18
  • There are also dedicated libraries to write Wikipedia bots, you may be better off not reinventing the wheel here. – ivan_pozdeev Jun 23 '17 at 22:23
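For reference, a minimal sketch of that dedicated-library route, assuming the third-party `wikipedia` package (`pip install wikipedia`): it wraps the same MediaWiki API and reports disambiguation pages by raising an exception, so the explicit ppprop check is not needed.

import wikipedia

titles = ['Harriet Tubman', 'Car', 'Underground Railroad', 'American Civil War', 'Kate Larson']

for title in titles:
    try:
        # A regular article resolves to a page object with url/summary/etc.
        page = wikipedia.page(title, auto_suggest=False)
        print(title, '-> article:', page.url)
    except wikipedia.DisambiguationError as err:
        # Raised when the title points at a disambiguation page.
        print(title, '-> disambiguation page, options include:', err.options[:5])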

1 Answer

import requests
import grequests
from urllib.parse import quote
from time import time

url = 'https://en.wikipedia.org/w/api.php?action=query&titles={0}&prop=pageprops&ppprop=disambiguation&format=json'
titles = ['Harriet Tubman', 'Car', 'Underground Railroad', 'American Civil War', 'Kate Larson']
urls = [url.format(quote(title)) for title in titles]

# Fetch the URLs one at a time with plain requests.
def sync_test(urls):
    results = []
    s = time()
    for url in urls:
        results.append(requests.get(url))
    e = time()
    return e-s

# Fire off all the requests concurrently with grequests (gevent greenlets).
def async_test(urls):
    s = time()
    results = grequests.map((grequests.get(url) for url in urls))
    e = time()
    return e-s

# Run both versions num times and report the accumulated wall-clock times.
def iterate(urls, num):
    sync_time = 0
    async_time = 0
    for i in range(num):
        sync_time += sync_test(urls)
        async_time += async_test(urls)
    print("sync_time: {}\nasync_time: {}".format(sync_time, async_time))

if __name__ == '__main__':
    iterate(urls, 10)

Running this yields:

sync_time: 22.14458918571472
async_time: 4.846134662628174

Process finished with exit code 0

I don't see any problem here: with this code, grequests comes out roughly 4-5x faster than plain requests.
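For comparison, a similar speedup is usually available from the standard library alone; here is a minimal sketch using a thread pool around plain requests (the worker count of 10 is an arbitrary choice), which you could drop into the same script next to async_test:

from concurrent.futures import ThreadPoolExecutor
import requests
from time import time

def threaded_test(urls):
    s = time()
    # Issue the GET requests from a pool of worker threads instead of greenlets.
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(requests.get, urls))
    e = time()
    return e - s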

dgan