
I want to implement a Tornado-based web service that provides database queries for users. I use the pyodbc module to connect to the database and run the queries. In practice, I found that writing out the query result takes a long time. That is, if I use the following code to write out the query result

while True:
    data = cursor.fetchone()
    if not data:
        break
    self.write(str(data) + '\n')  # fetchone() returns a pyodbc Row; convert before writing
    self.flush()

and the sql command is something like

select * from <a large dummy table>

Tornado does not finish sending the query result until the loop is over, and that takes a long time.

I want to make use of Tornado's non-blocking, asynchronous features so that other users can still use the web service even while the loop writing out the current user's query result has not finished yet.

So I wrote something like:

@tornado.web.asynchronous
def get(self):
    try:
        cnxn = pyodbc.connect(self.server, self.driver, self.table, self.uid, self.pwd)
    except Exception as e:
        print(e)
        return

    try:
        self.cur = cnxn.execute(self.sql)
    except Exception as e:
        print(e)
        return

    self.wait_for_query(callback=self.async_callback(self.on_finish))

def wait_for_query(self, callback):
    while True:
        data = self.cur.fetchone()
        if not data:
            break
        self.write(str(data))
        self.flush()
    callback()

def on_finish(self):
    self.finish()

I read this post: Asynchronous COMET query with Tornado and Prototype, and learned that my solution won't work. But I certainly cannot use add_timeout, because there is no way for me to know how long an iteration will last. So how can I work around this to achieve my goal?

Sheng

1 Answer


In order to allow the single-threaded Tornado server to remain responsive during a request like this, you have to give control back to the I/O loop between rows. Try this:

class LongRequestHandler(tornado.web.RequestHandler):
    def database_callback(self):
        data = self.cur.fetchone()
        if not data:
            self.finish()
            self.cnxn.close()
        else:
            self.write(str(data))  # fetchone() returns a pyodbc Row; convert before writing
            self.flush()
            # Reschedule ourselves so other requests get a turn between rows
            tornado.ioloop.IOLoop.instance().add_callback(self.database_callback)

    @tornado.web.asynchronous
    def get(self):
        try:
            self.cnxn = pyodbc.connect(self.server, self.driver, self.table, self.uid, self.pwd)
        except Exception as e:
            print(e)
            return

        try:
            self.cur = self.cnxn.execute(self.sql)
        except Exception as e:
            print(e)
            return

        tornado.ioloop.IOLoop.instance().add_callback(self.database_callback)

It's worth noting, however, that every database provider is different. My understanding is that with MySQL, most of the time and processing is actually spent in the execute() call rather than in looping through the data, because MySQL processes the whole query and returns a complete result set. If your database provider behaves the same way, you may need to process requests like this in a worker process that sits behind Tornado.
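If the blocking cost really is in execute() itself, one option is to push the whole query into a worker thread and hand the rows back via a callback once it completes. This is a minimal sketch, not the answer's code: `run_query` and `fetch_async` are hypothetical names, and the fake query stands in for the real pyodbc execute/fetch work; in a real handler the callback would re-enter the IOLoop via `IOLoop.instance().add_callback`.

```python
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def run_query(sql):
    # Placeholder for the blocking pyodbc work, e.g.
    # cnxn.execute(sql).fetchall()
    return [("row", i) for i in range(3)]

def fetch_async(sql, on_done):
    # Submit the blocking call to a worker thread; when it finishes,
    # deliver the rows to the caller's callback. Note the callback runs
    # on the worker thread, so in Tornado you would bounce it back onto
    # the IOLoop rather than writing the response directly here.
    future = executor.submit(run_query, sql)
    future.add_done_callback(lambda f: on_done(f.result()))

results = []
fetch_async("select * from t", results.extend)
executor.shutdown(wait=True)  # wait for the worker before reading results
print(results)
```

The main thread (Tornado's IOLoop in the real setup) never blocks on the query; it only runs the callback once the rows are ready.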

Edit: My example is only an example. In reality, you'd want to fetch quite a few rows in each callback before returning to the loop; otherwise you waste a lot of CPU time switching between functions on the IO loop rather than actually processing the request. After doing some testing, what I feared regarding MySQL turned out to be true: the execute/query statement itself is what blocks, so this solution really won't help in that case.

Kyle Johnson