I am running featuretools to create new features, having built the entityset from an existing dataframe.
The training dataframe has ~233K records and 81 columns; I split it into 3 entities and pass the resulting entityset to `ft.dfs`, which takes about 2.5 hours on the training data and 1.5 hours on the test data (~120K records, 80 columns).
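For reference, here is a minimal sketch of my setup (the entity and index names below are placeholders, not my actual schema):

```python
import featuretools as ft

# EntitySet built from the training dataframe; names are placeholders
es = ft.EntitySet(id="train")
es = es.entity_from_dataframe(entity_id="main",
                              dataframe=train_df,
                              index="record_id")
# ...two more entities are split out with es.normalize_entity(...)

# The slow step: ~2.5 hours on the training data
feature_matrix, feature_defs = ft.dfs(entityset=es,
                                      target_entity="main")
```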
How can I reduce the execution time? I am running the code in a Kaggle kernel, and the `ft.dfs` call alone consumes more than 4 of the 9 hours available in a session.
I have read the featuretools documentation on parallel processing and on improving performance, but it is not clear to me how to apply it when the entities are created from a dataframe, or maybe I am misunderstanding it; my attempt is sketched below.
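From my reading of the docs, parallelism is controlled by arguments to `ft.dfs` itself, regardless of how the entityset was built, so I tried something like this (the parameter values are guesses on my part):

```python
# Attempted parallel run, based on the parallel-processing docs
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_entity="main",
    max_depth=1,      # shallower feature stacking to cut runtime
    n_jobs=2,         # number of dask workers; Kaggle kernels expose few cores
    chunk_size=0.05,  # compute this fraction of rows per chunk
    verbose=True,
)
```

Is this the right approach, or does the entityset need to be handled differently for parallel calculation?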
Desired outcome: reduce the execution time to roughly a quarter of what it currently takes.