There are three ways to exclude features when calling `ft.dfs`.
Use `ignore_variables` to specify variables in an entity that should not be used to create features. It is a dictionary mapping an entity id to a list of variable names to ignore.
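To make the shape of `ignore_variables` concrete, here is a plain-Python sketch of the idea: for each entity id, the listed variable names are removed from the set of variables that feature primitives may use. It does not call featuretools, and the `join_date` variable and the `usable_variables` helper are hypothetical, used only for illustration.

```python
# Hypothetical illustration only: mimics what ignore_variables expresses,
# without calling featuretools.
entity_variables = {
    "transactions": ["amount", "quantity"],
    "customers": ["age", "gender", "date_of_birth", "join_date"],  # join_date is made up
}

# Same shape as the ignore_variables argument: entity id -> variables to skip.
ignore_variables = {
    "transactions": ["amount"],
    "customers": ["age", "gender", "date_of_birth"],
}


def usable_variables(entity_variables, ignore_variables):
    """Return, per entity, the variables left after removing ignored ones."""
    return {
        entity_id: [v for v in variables if v not in ignore_variables.get(entity_id, [])]
        for entity_id, variables in entity_variables.items()
    }


print(usable_variables(entity_variables, ignore_variables))
# {'transactions': ['quantity'], 'customers': ['join_date']}
```

Entities that are missing from the dictionary keep all of their variables, which is why the helper falls back to an empty list.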
Use `drop_contains` to drop features whose names contain any of the strings listed in this parameter.
Use `drop_exact` to drop features whose names exactly match any of the strings listed in this parameter.
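The difference between the two name-based filters can be sketched in plain Python: `drop_contains` is a substring test, while `drop_exact` is an equality test on the whole feature name. This does not call featuretools; the `filter_features` helper and the feature names below are hypothetical examples written in the featuretools naming style.

```python
# Hypothetical sketch of the two name-based filters; not featuretools code.
def filter_features(feature_names, drop_contains=(), drop_exact=()):
    """Keep feature names that match neither rule."""
    kept = []
    for name in feature_names:
        if name in drop_exact:
            continue  # whole name matches exactly
        if any(s in name for s in drop_contains):
            continue  # some listed string appears inside the name
        kept.append(name)
    return kept


features = [
    "customers.SUM(transactions.amount)",
    "customers.MEAN(transactions.amount)",
    "STD(transactions.quantity)",
    "MAX(transactions.quantity)",
]
kept = filter_features(
    features,
    drop_contains=["customers.SUM("],
    drop_exact=["STD(transactions.quantity)"],
)
print(kept)
# ['customers.MEAN(transactions.amount)', 'MAX(transactions.quantity)']
```

A substring rule like `"customers.SUM("` removes every SUM feature on that relationship at once, whereas an exact rule removes only one specific feature.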
Here is an example usage of all three in an `ft.dfs` call:
```python
import featuretools as ft

ft.dfs(
    target_entity="customers",
    ignore_variables={
        "transactions": ["amount"],
        "customers": ["age", "gender", "date_of_birth"],
    },  # ignore these variables
    drop_contains=["customers.SUM("],  # drop features that contain these strings
    drop_exact=["STD(transactions.quantity)"],  # drop features named exactly this
    # ... other arguments
)
```
All three parameters are documented in the `ft.dfs` API reference.
One last thing to check if you are getting features you don't want is the variable types of the variables in your entity set. If you are seeing the sum of an ID variable, featuretools is treating that ID variable as numeric. If you tell featuretools it is an ID, it will not apply numeric aggregations such as SUM to it.
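The reasoning about ID variables can be illustrated with a toy model in plain Python: numeric aggregations are generated only for variables whose type is numeric, so an integer-valued ID that gets inferred as numeric produces features like `SUM(transactions.product_id)`, while declaring it as an ID removes them. This is a simplified sketch, not how featuretools is implemented; `product_id`, the type labels, and `aggregation_features` are hypothetical. In featuretools versions that use entities, the type is usually set through the `variable_types` argument when the entity is created (for example with `ft.variable_types.Id`); check the documentation for your version, since newer releases use Woodwork logical types instead.

```python
# Toy model only: which aggregation features get generated depends on the
# declared variable type, not on the fact that the values are integers.
NUMERIC_AGGREGATIONS = ["SUM", "MEAN", "STD", "MAX"]


def aggregation_features(child_entity, variable_types):
    """List aggregation feature names for variables typed as numeric."""
    names = []
    for variable, vtype in variable_types.items():
        if vtype != "numeric":
            continue  # ids, categoricals, etc. get no numeric aggregations
        for agg in NUMERIC_AGGREGATIONS:
            names.append(f"{agg}({child_entity}.{variable})")
    return names


# product_id holds integers, so type inference may label it numeric.
inferred = {"amount": "numeric", "product_id": "numeric"}
# Telling the library it is an ID changes its type.
declared = {"amount": "numeric", "product_id": "id"}

print(aggregation_features("transactions", inferred))  # includes SUM(transactions.product_id)
print(aggregation_features("transactions", declared))  # only amount aggregations
```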