I have a function that converts non-numerical data in a dataframe to numerical data.

import numpy as np
import pandas as pd
from concurrent import futures

def convert_to_num(df):
  ...  # conversion logic goes here
  return df

I want to use the concurrent.futures library to speed up this task. This is how I am using the library:

with futures.ThreadPoolExecutor() as executor:
    df_test = executor.map(convert_to_num,df_sample)

First, I do not see the variable df_test being created, and second, when I evaluate df_test I get this message:

<generator object Executor.map.<locals>.result_iterator at >

What am I doing wrong that prevents me from using the futures library? Can I only use this library to iterate values into a function, as opposed to passing an entire dataframe to be edited?

RustyShackleford

1 Answer

According to the documentation, the executor's map method has the signature map(func, *iterables, timeout=None, chunksize=1).
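As a small illustration of the *iterables parameter, the sketch below passes two lists to map; the function is then called with one element from each list per call, much like the built-in map. The input values are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor

bases = [2, 3]
exponents = [3, 2]

with ThreadPoolExecutor(max_workers=2) as executor:
    # Calls pow(2, 3) and pow(3, 2); results come back in input order.
    powers = list(executor.map(pow, bases, exponents))

print(powers)  # [8, 9]
```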

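In your example you pass a single DataFrame (df_sample) as the iterable. map then iterates over it, and iterating over a DataFrame yields its column labels, so convert_to_num is called once per column name rather than once with the whole DataFrame. Instead, you can pass a list of DataFrames as the iterables argument, and the function is applied to each one.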
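To see what the function actually receives when a single DataFrame is passed to map, here is a minimal sketch. The frame below is a made-up stand-in for df_sample, and show_arg is a hypothetical helper that only reports its argument.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

# Hypothetical stand-in for the asker's df_sample.
df_sample = pd.DataFrame({"a": ["x", "y"], "b": ["u", "v"]})


def show_arg(arg):
    # Report the type name and value of whatever map() hands to the function.
    return type(arg).__name__, arg


with ThreadPoolExecutor() as executor:
    # map() iterates over df_sample, and iterating a DataFrame yields column labels.
    calls = list(executor.map(show_arg, df_sample))

print(calls)  # [('str', 'a'), ('str', 'b')]
```

The function never sees a DataFrame here, only the strings "a" and "b".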

For example, let us create a list of DataFrames:

import concurrent.futures
import pandas as pd

df_samples = [pd.DataFrame({f"col{j}{i}": [j,i] for i in range(1,5)}) for j in range(1,5)]

df_samples is now a list of four DataFrames, each with two rows and four columns.

Next, define a function that adds a column to a DataFrame:

def add_x_column(df):
    df['col_x'] = ['a', 'b']
    return df

Now use the ThreadPoolExecutor to apply this function to each DataFrame in df_samples concurrently. executor.map returns a lazy generator, so you need to convert it to a list to access the modified DataFrames:

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(add_x_column, df_samples))

results is then the list of resulting DataFrames, each with the added col_x column.
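If you only have one large DataFrame, one option is to split it into row chunks, map the function over the chunks, and concatenate the results. The sketch below assumes the conversion works row by row, so chunks can be processed independently. The convert_to_num body is a placeholder built on pd.to_numeric, not your actual logic, and the sample data and chunk count are made up.

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd


def convert_to_num(df):
    # Placeholder conversion (an assumption): parse each column as numbers,
    # turning unparseable values into NaN.
    return df.apply(pd.to_numeric, errors="coerce")


# Made-up sample data standing in for df_sample.
df_sample = pd.DataFrame({
    "a": ["1", "2", "x", "4"],
    "b": ["1.5", "2.5", "3.5", "oops"],
})

n_chunks = 2
chunk_size = -(-len(df_sample) // n_chunks)  # ceiling division

# Slice the frame into consecutive row chunks; the original index is kept.
chunks = [
    df_sample.iloc[start:start + chunk_size]
    for start in range(0, len(df_sample), chunk_size)
]

with ThreadPoolExecutor(max_workers=n_chunks) as executor:
    # map() returns results in the same order as the input chunks.
    converted = list(executor.map(convert_to_num, chunks))

# Stitch the converted chunks back into a single DataFrame.
df_test = pd.concat(converted)
print(df_test)
```

Note that threads may not give much speed-up for CPU-bound pandas work because of the GIL; concurrent.futures.ProcessPoolExecutor exposes the same map interface and may be worth trying in that case.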

buddemat
Ireonus