How do we decide the list of primitives to pass to Deep Feature Synthesis (DFS) in Featuretools?
1 Answer
There are two overarching ways to go about it:
- If you are computationally constrained, start with a few primitives and build towards a final primitive set.
- If the dataset is small relative to your available compute, you can start with many primitives and then lean on hyperparameter optimization and feature selection to prune the resulting feature matrix.
Building up
When datasets are big, DFS can take a long time to run on a personal computer. Every primitive you add will be applied to all valid columns across all valid relationships.
In that case, it’s helpful to add primitives more carefully than you otherwise might. It is especially important to check that primitives are creating meaningful and important features when every additional primitive adds a noticeable amount of time to the final calculation.
When building up, we roughly follow these steps (a minimal sketch follows the list):
- Only use DFS on a small subset of your entityset so you can see results quickly
- Visually inspect generated features to ensure they make sense and are calculating what you believe they are calculating
- Check that any specific features you would like are being generated
- Train a model on a feature matrix with more rows and score it on a validation set to see which features seem promising and which do not
- Add and remove primitives, repeat
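Here is a minimal sketch of the first few steps. It assumes an entityset `es_small` built from a sampled subset of your data and a Featuretools 1.x API (where the target is given with `target_dataframe_name`); the dataframe name and primitive list are placeholders, not a recommendation.

```python
import featuretools as ft

# Preview feature definitions without computing anything, to check that the
# chosen primitives generate the features you expect.
feature_defs = ft.dfs(
    entityset=es_small,                      # entityset built from a small sample of the data
    target_dataframe_name="customers",       # placeholder target dataframe name
    agg_primitives=["mean", "sum", "last"],  # start with a handful of primitives
    trans_primitives=["month"],
    max_depth=2,
    features_only=True,                      # return definitions only, skip computation
)
print(feature_defs)

# Once the definitions look right, compute the small matrix and eyeball it.
feature_matrix = ft.calculate_feature_matrix(features=feature_defs, entityset=es_small)
print(feature_matrix.head())
```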
You can see the traces of this in the predict-remaining-useful-life demo. We show only 3 primitives in each notebook, a set arrived at after several iterations. By the second notebook of that demonstration, we trade one of our 3 primitives ('last') for 'complexity' (from tsfresh) to generate 302 features. The 'complexity' primitive creates 3 of the 5 most important features for our final model, which is substantially more accurate than our original.
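Scoring a candidate feature matrix with a quick model (the fourth step above) can be as simple as the sketch below. It assumes you already have a `feature_matrix` from a DFS run with more rows, an aligned `labels` Series for a binary target, and that a random forest is an acceptable stand-in model; none of this comes from the demos themselves.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Keep only numeric columns and fill missing values for a rough first pass.
X = feature_matrix.select_dtypes("number").fillna(0)
X_train, X_val, y_train, y_val = train_test_split(X, labels, test_size=0.25, random_state=0)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
print("validation AUC:", roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1]))

# Rank features by importance to see which seem promising and which do not.
importances = pd.Series(clf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.head(10))
print(importances.tail(10))
```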
This method saves the cost of computing with unnecessary primitives on the full dataset. The downside is that the results are particularly subjective. At every level you’re making choices about which primitives and features you like. This leads to personal bias and anecdotal evidence limiting the success of your eventual model. In order to avoid that, we need to use more computational resources.
Going big and pruning
An alternate approach, when computational time permits, is to start with a large feature matrix and work from that. In this paradigm, we would include every primitive that we want. From there, we would:
- Build a full feature matrix with many primitives (a sketch follows this list)
- Test the results of various feature selection algorithms
- Visually inspect particularly good and particularly bad features to make sure nothing is wrong
- Add custom primitives based on the results
- Repeat
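A sketch of the first steps, assuming a full entityset `es`, a placeholder target dataframe name, and a recent Featuretools release where `featuretools.selection` provides these pruning helpers:

```python
import featuretools as ft
from featuretools.selection import (
    remove_highly_correlated_features,
    remove_highly_null_features,
    remove_low_information_features,
)

# Omitting agg_primitives / trans_primitives makes DFS use its default primitive set.
feature_matrix, feature_defs = ft.dfs(
    entityset=es,
    target_dataframe_name="customers",  # placeholder target dataframe name
    max_depth=2,
)

# Prune mechanically before any modeling.
feature_matrix, feature_defs = remove_highly_null_features(feature_matrix, features=feature_defs)
feature_matrix, feature_defs = remove_low_information_features(feature_matrix, features=feature_defs)
feature_matrix, feature_defs = remove_highly_correlated_features(feature_matrix, features=feature_defs)

print(f"{len(feature_defs)} features remain after pruning")
```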
You can see the results of this approach in the predict-next-purchase demo. There, we use the default primitive set to generate 161 features for a dataframe with 12 columns. From those features, we choose our favorite 20 to use with the full dataset.
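Keeping a shortlist and recomputing only those features on the complete data might look like the sketch below. Here `importances` is a model-based ranking like the one in the earlier sketch, `feature_defs` are the definitions returned by DFS, and `es_full` is an entityset built from the complete data; all three names are assumptions for illustration.

```python
import featuretools as ft

# Keep the 20 highest-ranked features by name.
top_names = set(importances.head(20).index)
top_features = [f for f in feature_defs if f.get_name() in top_names]

# Recompute just those features on the full entityset.
full_matrix = ft.calculate_feature_matrix(features=top_features, entityset=es_full)
```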
This takes more time but allows for more programmatic and reproducible exploration of the feature space. Because the number of features in the end result is so high, more emphasis is placed on the feature selection methodology and less care is needed when selecting primitives.
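If inspection suggests a calculation the built-in primitives don't cover (the "add custom primitives" step above), defining one is short. A minimal sketch assuming Featuretools 1.x, where primitives declare input and return types with Woodwork's `ColumnSchema`; the primitive itself is purely illustrative.

```python
import numpy as np
from featuretools.primitives import AggregationPrimitive
from woodwork.column_schema import ColumnSchema


class RangeOfValues(AggregationPrimitive):
    """Illustrative custom aggregation: max minus min of a numeric column."""

    name = "range_of_values"
    input_types = [ColumnSchema(semantic_tags={"numeric"})]
    return_type = ColumnSchema(semantic_tags={"numeric"})

    def get_function(self):
        def range_of_values(values):
            return np.max(values) - np.min(values)

        return range_of_values


# The class can then be passed alongside built-in primitives, e.g.:
# ft.dfs(..., agg_primitives=["mean", "sum", RangeOfValues])
```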
Finally, note that finding an optimal list of primitives to pass to Featuretools is one step removed from a very difficult question: "what is the best way to choose features?" Passed into deep feature synthesis, a set of aggregation and transform primitives will deterministically generate a set of features. If you were to ask for the best subset of those features, you would get different answers depending on who you asked. The answer would be constrained by (in no particular order):
- which algorithms are being used for selection,
- the desired interpretability of the features, and
- which metric and model you're trying to optimize.
