I'm using featuretools Deep Feature Synthesis (DFS) to build features for a dataset of 40k rows and 200 columns. I chose about 40 transform primitives, as you can see in the code below:
feature_matrix, feature_defs = ft.dfs(entityset=es, target_entity="df", n_jobs=6,
                                      trans_primitives=primitives.name.to_list(),
                                      verbose=True)
But when I run my code, it takes a long time to discover the features to build, and this phase doesn't run on multiple cores of my CPU; not even a single core reaches 100% usage. In other words, I'm waiting hours for a process that uses only minimal resources on my machine (memory is not a bottleneck either).
After featuretools discovers the features (and prints a log like "Built n features"), it creates the cluster and uses all the cores specified in the n_jobs parameter at 100% capacity. This second phase is really fast, just a few seconds, since all my resources are being used.
My question is: why is this happening? Is it possible to discover the features faster and reduce this time? I just don't understand how a process that barely uses any resources can take so long.