Basically, I have a small aiohttp app which receives a list of Impala queries and then sends them to Impala. However, some of the queries may take a long time to complete, so I decided to run them asynchronously/in parallel. I got one solution working with threads, but would love to see if it is possible to achieve the same speed using asyncio/tornado only.
My code is as follows:
async def run(self, queries):
    # Here I validate queries
    query_list = await self.build_query_list(split_queries)  # Format: [[queries for connection_1], [queries for connection_2], ...]
    start = time.time()
    # Assign a group of queries to each connection and wait for results
    result_queue = deque()
    await multi([self.execute_impala_query(connection.connection, query_list[index], result_queue)
                 for index, connection in enumerate(connection_list)])
    # Close all connections
    [await self.impala_connect_advance_pool.release_connection(connection) for connection in connection_list]
    # Wait for Impala responses
    while len(result_queue) < connect_limit:
        continue
    # Send results back

async def execute_impala_query(self, impala_connect, queries, queue):
    return await multi([self.impala_response_to_json_response(impala_connect.cursor(), query, queue)
                        for query in queries])

async def impala_response_to_json_response(self, impala_cursor, query, queue):
    self.logger.info('execute query: {}'.format(query))
    print('execute query: {}'.format(query))

    def get_results():
        impala_cursor.execute(query)
        results = as_pandas(impala_cursor)
        impala_cursor.close()
        self.logger.info('{} completed'.format(query))
        print('{} completed'.format(query))
        queue.append(results.to_json(orient='records'))

    IOLoop.current().spawn_callback(get_results)
What happens is that once it runs, I can see the 'execute query: ...' messages being printed to stdout, so I assumed the queries were all being fired and executing concurrently. However, it takes twice (or more) as long as the version with threads. Am I getting the whole async concept wrong, or is there a silly mistake somewhere in these methods?
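For reference, this is roughly the pattern my working threaded version boils down to, as a minimal self-contained sketch (the `time.sleep` stands in for impyla's blocking `cursor.execute()`/`as_pandas()` calls, and the query names are placeholders):

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_query(query):
    # Stand-in for the blocking impyla cursor.execute() + as_pandas() work.
    time.sleep(0.2)
    return '{} completed'.format(query)

async def run_all(queries):
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=len(queries)) as pool:
        # Offload each blocking call to a worker thread; the event loop
        # only awaits the futures and is never blocked itself.
        futures = [loop.run_in_executor(pool, blocking_query, q) for q in queries]
        return await asyncio.gather(*futures)

start = time.time()
results = asyncio.run(run_all(['q1', 'q2', 'q3']))
elapsed = time.time() - start
# Three 0.2 s blocking calls finish in ~0.2 s of wall time, not 0.6 s.
```

With this pattern the speedup comes from the thread pool, while the coroutine just coordinates; my question is whether the same effect is achievable without the pool.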