
I'm trying to apply a certain function to pairs of adjacent elements in a given data set. Please refer to the following example.

# I'll just make a simple function here.
# In my real case, I send a request to the database
# to get the result with the two arguments.

def get_data_from_db_with(arg1, arg2):
    # write a query with arg1 and arg2 named 'query_result'
    return query_result

data = [arg1, arg2, arg3, arg4]
result = []
for a, b in zip(data, data[1:]):
    result.append(get_data_from_db_with(a, b))

For example, if the length of data is 4, as in the case above, then I send 3 requests to the database. Each request takes about 0.3 seconds to retrieve the data, so 0.9 seconds (0.3 sec * 3 requests) in total. The problem is that as the number of requests increases, so does the overall time.

With the code above,

1) get_data_from_db_with(arg1, arg2)
2) get_data_from_db_with(arg2, arg3)
3) get_data_from_db_with(arg3, arg4)

will be processed consecutively.


What I want to do, if possible, is to send the requests all at once, not consecutively. Of course, the number of requests remains unchanged, but I assume the overall time will decrease.

I'm currently looking into async, multiprocessing, and so on. Any comment or feedback would be immensely helpful.

Thanks in advance.

higee

2 Answers


Threads are probably what you are looking for, assuming that most of the work get_data_from_db_with does is waiting on I/O, like calling the database.

import threading

def get_data_from_db_with(arg1, arg2):
    # write a query with arg1 and arg2 named 'query_result'
    # store the result on the thread object so the caller
    # can read it back after join()
    current_thread = threading.current_thread()
    current_thread.result = query_result

data = [arg1, arg2, arg3, arg4]
threads = []
for a, b in zip(data, data[1:]):
    t = threading.Thread(target=get_data_from_db_with, args=(a, b))
    t.start()
    threads.append(t)

results = []
for t in threads:
    t.join()
    results.append(t.result)

Note that this solution even preserves the order of the results list.
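
As a side note, the same pattern can be written more compactly with concurrent.futures from the standard library; a minimal sketch, assuming the same placeholder query function as above:

from concurrent.futures import ThreadPoolExecutor

def get_data_from_db_with(arg1, arg2):
    # write a query with arg1 and arg2 named 'query_result'
    return query_result

data = [arg1, arg2, arg3, arg4]

# executor.map runs the calls on worker threads and yields
# the results in input order, so no manual join() bookkeeping is needed
with ThreadPoolExecutor(max_workers=len(data) - 1) as executor:
    results = list(executor.map(get_data_from_db_with, data, data[1:]))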

freakish
  • Thanks for your advice! I do have a question on using `threading`. As far as I know, Python prefers multiprocessing over multithreading because of the GIL (global interpreter lock). I might be wrong, but I was just curious. – higee Apr 15 '17 at 08:15
  • @GeeYeolNahm That totally depends on what you are trying to do. The GIL is released on every I/O call, so as long as you spend the majority of the time doing I/O (instead of CPU-intensive tasks), threads are preferred over processes. – freakish Apr 15 '17 at 10:55
  • I tried multi-threading and it worked! It became, on average, 2~3 times faster. Yeah, you were right, multi-threading worked in this case in my working environment. Thanks again, freakish! – higee Apr 16 '17 at 16:59

An alternative to multiprocessing is to work on the query construction itself. Look for ways to combine the queries, something like (arg1 and arg2) or (arg2 and arg3)...; essentially, try to fetch all the required data in a single call.
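
For illustration, here is a minimal sketch of that idea against a SQL backend; some_table, col1, and col2 are made-up names, and the placeholder style follows DB-API 2.0:

data = ["arg1", "arg2", "arg3", "arg4"]
pairs = list(zip(data, data[1:]))

# Build one WHERE clause covering every adjacent pair
# (hypothetical table/column names, '%s' DB-API placeholders).
conditions = " OR ".join(["(col1 = %s AND col2 = %s)"] * len(pairs))
params = [value for pair in pairs for value in pair]
query = "SELECT col1, col2, result FROM some_table WHERE " + conditions

# One round trip instead of len(pairs) separate calls:
# cursor.execute(query, params)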

Gautham Kumaran
  • Thanks for sharing your thought. Yes, I did look into sending one request as you've mentioned. I'm still working on writing one single query and parsing the result. I've been using the [elasticsearch multi-search API](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-multi-search.html). Above all, I do think sending one request outperforms sending multiple requests simultaneously! – higee Apr 15 '17 at 08:18