Options for accelerating Python code through parallelizing/ multiprocessing

Question

Below, I've gathered 4 ways to complete the execution of code that involves sorting updating Pandas Dataframes.

I would like to apply the best methods to speed up the code execution. Am I using the best available practices?

Would someone please share some thoughts on the following ideas?

I'm looping over the Data Frame because the process to solve my problem appears to call for it. Would there be a big increase in speed from using Dask Dataframes?
Can the Dask Distributed version benefit from setting a particular number workers, processes, threads per worker? People point out that increasing the number of processes instead of threads (or vice versa) is best for some cases.
What would be the most powerful hardware infrastructure to use for this kind of code? The Multiprocessing version is even faster on an AWS instance with more physical CPU cores.
- Would a Kubernetes/AWS setup with Dask Distributed be much faster?
- Could this be easily adapted to run with the help of a GPU locally or on a multi-GPU AWS instance?

These are the completion times for reference:

Regular 'For' loop: 34 seconds
Dask Delayed: 21 seconds
Dask Distributed (local machine): 21 seconds
Multiprocessing: 10 seconds

from dask.distributed import Client
from multiprocessing import Pool
from dask import delayed
import pandas as pd
import numpy as np
client = Client()
import random
import dask

#Setting original input data that will be used in the functions
alist=['A','B','C','D','E','F','G','H','I']
set_table=pd.DataFrame({"A":alist,
                        "B":[i for i in range(1,10)],
                        "C":[i for i in range(11,20)],
                        "D":[0]*9})

#Assembled random list of combinations   
criteria_list=[]
for i in range(0,10000):
    criteria_list.append(random.sample(alist,6))

#Sorts and filters the original df
def one_filter_sorter(criteria):
    sorted_table=set_table[set_table['A'].isin(criteria)]
    sorted_table=sorted_table.sort_values(['B','C'],ascending=True)
    return sorted_table

#Exists to help the function below. Simplified for this example    
def helper_function(sorted_table,idx):
    if alist.index(sorted_table.loc[idx,'A'])>5:
        return True

#last function that retuns the gathered result    
def two_go_downrows(sorted_table):

    for idx, row in sorted_table.iterrows():
        if helper_function(sorted_table,idx)==True:
            sorted_table.loc[idx,'D'] = 100 - sorted_table.loc[idx,'C']

    res=sorted_table.loc[:,['A','D']].to_dict()
    return res

#--Loop version
result=[]    
for criteria in criteria_list:
    A=one_filter_sorter(criteria)
    B=two_go_downrows(A)
    result.append(B)

#--Multiprocessed version
result=[]    
if __name__ == '__main__':
    pool=Pool(processes=6)
    A=pool.map(one_filter_sorter, criteria)
    B=pool.map(two_go_downrows, A) 
    result.append(B)

#--Delayed version
result=[]    
for criteria in criteria_list:
    A=delayed(one_filter_sorter)(criteria) 
    B=delayed(two_go_downrows)(A) 
    result.append(B)
dask.compute(result)

#--Distributed version
A= client.map(one_filter_sorter,criteria_list) 
B= client.map(two_go_downrows,A)
client.gather(B)

Thank you

You haven't said why you're using Dask in the first place. Can the DF not fit in memory? — roganjosh, Feb 19 '19 at 21:43
All of these functions rely on a loop in `two_go_downrows`. The first port-of-call should be trying to vectorize that functionality — roganjosh, Feb 19 '19 at 21:47
Thank you @roganjosh. I saw Dask being used in cuDF examples. I was trying to to find a way to do the operations on an AWS instance that offered more threads/processes than my local machine. I'm still new to this... — Kdog, Feb 19 '19 at 21:53
That's fine. If my laptop didn't keep freezing I might have some idea what you're trying to do here so I can have a play :) — roganjosh, Feb 19 '19 at 21:54
@roganjosh One point that was made in Dask tutorials is that you could parallelize (as they put it) "for loopy" Python code. Can you please point me to an example of how the a function like `two_go_downrows` could be vectorized. The only thing that rings a bell is something like `cuda.jit` — Kdog, Feb 19 '19 at 21:59
`jit` is not vectorization. I suspect someone else will get to answering this before me at this rate but I am trying to run the code, my laptop just isn't playing ball — roganjosh, Feb 19 '19 at 22:02
To make it run easier, try reducing this: `for i in range(0,10000)`. Also, yes the `criteria_list` doesn't fit into memory sometimes. I was surprised. — Kdog, Feb 19 '19 at 22:02

Options for accelerating Python code through parallelizing/ multiprocessing

0 Answers0