I am working on a project that involves making many requests to an API; for each response I make a decision and save the result in the database. I am using adbapi to communicate with MySQL.

I am receiving the request as a POST containing a list of items that are to be pushed to a remote api and saved.

I have noticed that while the items are being processed in a deferred, all other operations block until that part is done.

The following example shows something similar to what I am doing.

#!/usr/bin/python2.7

from twisted.web.server import Site
from twisted.web.resource import Resource
from twisted.internet import reactor, defer
from twisted.web.server import NOT_DONE_YET

from utils import send_mail, save_in_db


def get_params(request):
    params = {}
    for k, v in request.args.items():
        if k and v:
            params[k] = v[0]
    return params


class SendPage(Resource):

    def render_POST(self, request):
        params = get_params(request)
        emails = params['emails']
        message = params['message']
        self.process_send_mail(message, emails)
        request.write('Received')
        request.finish()
        return NOT_DONE_YET

    def process_send_mail(self, message, emails):
        defs = []
        for email in emails:
            d = send_mail(email, message)
            defs.append(d)
        d1 = defer.DeferredList(defs)
        d1.addCallback(self.process_save)

    def process_save(self, result):
        defs = []
        for r in result:
            d = save_in_db(r)
            defs.append(d)
        d1 = defer.DeferredList(defs)
        d1.addCallback(self.post_save)

    def post_save(self, result):
        print "request was completed"


root = Resource()
root.putChild("", SendPage())
factory = Site(root)
reactor.listenTCP(8880, factory)
reactor.run()

In the above example, when the list contains a lot of emails (say 100,000), the send_mail step blocks other operations until it has finished. If I send another request while that is happening, it blocks until the first one is done.

My question is: is there a way I can have the operations happen concurrently? Can I send_mail and, in a concurrent way, save_in_db? Can I do that as I receive other requests and handle them without having to wait for each other to finish?

Eutychus

3 Answers


You can either skip waiting for the results altogether, or wait for all of them (both sending and saving to the database) at once, like so:

def process_send_mail(self, message, emails):
    defs = []
    for email in emails:
        d = send_mail(email, message)
        defs.append(d)
        d = save_in_db(email)
        defs.append(d)

    d1 = defer.DeferredList(defs)
    d1.addCallback(self.post_save)      

def post_save(self, result):    # DeferredList passes its result list to the callback
    print "request was completed"
  • But the `result` you loop over in `for r in result` is not defined above. I need the result from send_mail so that I can use it. I have noticed that the deferreds wait for all mails to be sent. I want a way to process each mail and save it in the db instead of waiting for all of them. – Eutychus Sep 20 '16 at 14:23
  • Fixed. I do not know what `send_mail` returns, so I passed the email to the database call, assuming that is the parameter `save_in_db` expects. – Dariusz Bączkowski Sep 20 '16 at 21:12

One trick I've leveraged in the past is a combination of inlineCallbacks and yield. Basically, you can iterate n number of elements then yield or pause at a given interval so that the reactor can do some other tasks. So in your case, you would decorate all the functions which have potentially blocking loops with @inlineCallbacks, enumerate the loop, then yield/pause at a certain point to give control back to the reactor.

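from twisted.internet import reactor, task

@defer.inlineCallbacks
def process_send_mail(self, message, emails):
    defs = []
    for i, email in enumerate(emails):    # enumerate
        d = send_mail(email, message)
        defs.append(d)
        if i % 1000 == 0:
            # a bare `yield` resumes immediately; yielding a Deferred is
            # what actually hands control back to the reactor
            yield task.deferLater(reactor, 0, lambda: None)    # pause every 1000 elements
    d1 = defer.DeferredList(defs)
    d1.addCallback(self.process_save)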

You'll have to tweak the interval value to fit your needs as the value will depend on how fast results can be produced. Hope this helps.

notorious.no

There are actually two questions; I'll address them separately.

The first is: "Is there a way I can have the operations happen concurrently? Can I send_mail and in a concurrent way save_in_db"?

The answer is: yes and no. You can't do that concurrently, because as far as I can tell saving the data in the DB requires some result from the mail sending. But if you meant: can I start saving things in the DB as soon as I get the first mail result, without waiting for ALL mail results to come before saving things in the DB - yes, you can do that; just combine your two processing functions into one:

def process_send_mail_and_save(self, message, emails):
    defs = []
    for email in emails:
        d = send_mail(email, message)
        # might require tuning for save_in_db parameters if not matching send_mail callback output
        d.addCallback(save_in_db)
        defs.append(d)
    d1 = defer.DeferredList(defs)
    d1.addCallback(self.post_save)

The second is: "Can I do that as I receive other requests and handle without having to wait for each other to finish?"

Of course you can do that in Twisted. But you must write your code in the right way. You don't tell us what send_mail or save_in_db do - I suppose you wrote them, and I suppose that THOSE functions are blocking and causing most of your issues - maybe send_mail does all the SMTP work and only when it has finished it returns? It should return the deferred immediately, and callback when the job has finished:

http://twistedmatrix.com/documents/16.4.0/core/howto/clients.html

I suggest you put logging calls with timestamps around the send_mail and save_in_db functions - around the moment you CALL them, not the moment their deferred fires.
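
One possible way to add that logging is a small decorator that measures how long the call itself takes, independently of when the returned Deferred fires. This is a minimal Python 3 sketch using only the standard library; the decorator name `log_call_duration` and the logger name are made up for illustration.

```python
import functools
import logging
import time

log = logging.getLogger("call_timing")


def log_call_duration(func):
    """Log when func is called and how long it takes to return."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        started = time.monotonic()
        log.info("calling %s", func.__name__)
        try:
            # For a well-behaved Twisted function this returns a Deferred
            # almost instantly; the real work happens later.
            return func(*args, **kwargs)
        finally:
            elapsed = time.monotonic() - started
            log.info("%s returned after %.4f s", func.__name__, elapsed)
    return wrapper
```

Wrapping `send_mail` and `save_in_db` with this (and calling `logging.basicConfig(level=logging.INFO)` once) shows whether either of them spends noticeable time before handing back its Deferred, which would mean it is blocking the reactor.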

Remember: the whole point of Twisted's Deferreds is that a Deferred is returned IMMEDIATELY, without blocking, while the callback you attach to it fires later on, once the work has completed. If ANYTHING blocks ANYWHERE, Twisted can do nothing: it's single-threaded, basically cooperative multitasking. Twisted can't magically turn your code into non-blocking code; YOU must do it.

Sidenote: the way you're using server.NOT_DONE_YET is pointless. Just return "Received" as a string and forget the request object. You employ NOT_DONE_YET when calling request.finish() somewhere else, not immediately.

Alan Franzoni
  • Alan, I am not making any blocking calls in my code. Both send_mail and save_in_db make no blocking calls and return a Deferred. As I said, the issue shows up when there are many requests (like 50k). I have changed my code the way you suggested, so that I start saving as soon as I get a response from the send_mail function, but I still see that the save method only starts after all the send_mail requests have been issued, which can take noticeable time, and the reactor doesn't do anything else during that time. – Eutychus Sep 27 '16 at 14:54
  • On your sidenote about NOT_DONE_YET, when I don't return it (e.g return 'Received' like you suggested), I am getting an exception (`exceptions.RuntimeError: Request.write called on a request after Request.finish was called`). Is that how it should work? – Eutychus Sep 28 '16 at 08:22
  • Yes. You should do nothing with the request. No write(), no finish() - just return "Received". – Alan Franzoni Sep 28 '16 at 12:03