Basically, I have a small aiohttp app that receives a list of Impala queries and sends them to Impala. However, some of the queries may take a long time to complete, so I decided to run them in an async/parallel way. I got one solution working with threads, but I would love to see whether the same speed can be achieved using asyncio/tornado only.

My code is as follows:

async def run(self, queries):
    # Here I validate queries
    query_list    = await self.build_query_list(split_queries)        # Format: [[queries for connection_1], [queries for connection_2], ...]

    start         = time.time()
    # Assign a group of queries to each connection and wait for the results
    result_queue = deque()
    await multi([self.execute_impala_query(connection.connection, query_list[index], result_queue) for index, connection in enumerate(connection_list)])

    # Close all connections
    [await self.impala_connect_advance_pool.release_connection(connection) for connection in connection_list]

    # Wait for Impala responses
    while len(result_queue) < connect_limit: 
        continue

    # Send results back


async def execute_impala_query(self, impala_connect, queries, queue):
    return await multi([self.impala_response_to_json_response(impala_connect.cursor(), query, queue) for query in queries])

async def impala_response_to_json_response(self, impala_cursor, query, queue):
    self.logger.info('execute query: {}'.format(query))
    print ('execute query: {}'.format(query))

    def get_results():
        impala_cursor.execute(query)
        results = as_pandas(impala_cursor)
        impala_cursor.close()
        self.logger.info('{} completed'.format(query))
        print ('{} completed'.format(query))
        queue.append(results.to_json(orient='records'))

    IOLoop.current().spawn_callback(get_results)

What happens is that once it runs, I can see the 'execute query: query' messages printed to stdout, and I'd have assumed that the queries are all fired and executing. However, it takes twice (or more) as long as the version with threads. Am I getting the whole async concept wrong, or have I made some silly mistake somewhere in the methods?

JDRussia

1 Answer

"whole async concept wrong": yes. Just invoking a function with spawn_callback won't make it async; the callback still runs on the event loop and blocks it for as long as the query takes. Your DB connector has to support async I/O. As far as I can see, it does: I'd advise you to take a look at the execute_async method. Then you need to write your own waiting function like Impyla's _wait_to_finish, but with tornado.gen.sleep instead of time.sleep().
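A minimal sketch of that waiting function, under a few assumptions: the cursor exposes execute_async() and is_executing() the way Impyla's HiveServer2 cursor does, and asyncio.sleep stands in for tornado.gen.sleep so the snippet runs on the standard library alone. FakeCursor and run_all are hypothetical helpers for the demo, not part of Impyla or of the question's code.

```python
import asyncio


async def run_query(cursor, query, poll_interval=0.1):
    """Start a query without blocking, then poll until it finishes."""
    # Assumption: execute_async returns right away and is_executing reports
    # whether the operation is still running, as on Impyla's cursor.
    cursor.execute_async(query)
    while cursor.is_executing():
        # Yield to the event loop so the other queries can be polled too.
        await asyncio.sleep(poll_interval)
    return cursor.fetchall()


class FakeCursor:
    """Hypothetical stand-in for a real cursor, used only for the demo."""

    def __init__(self, polls_needed):
        self.polls_left = polls_needed
        self.query = None

    def execute_async(self, query):
        self.query = query

    def is_executing(self):
        # Pretend the query is still running for `polls_needed` checks.
        self.polls_left -= 1
        return self.polls_left >= 0

    def fetchall(self):
        return [(self.query, "done")]


async def run_all(cursors_and_queries, poll_interval=0.1):
    # gather schedules every coroutine at once; they interleave at each sleep.
    return await asyncio.gather(
        *(run_query(c, q, poll_interval) for c, q in cursors_and_queries)
    )
```

With a real connection you would pass connection.cursor() instead of FakeCursor. Note that fetching the rows is still a blocking call in this sketch, so very large result sets would need to be handled separately.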

Fine
  • Oh, I see. I knew I need to do something like that, but hoped for the best :) Will give it a try then. Thanks for pointing that out for me :) – JDRussia Sep 04 '18 at 11:01