
For example, one of my entities has two sets of IDs: one that is continuous (which apparently is necessary to create the EntitySet), and one to use as a foreign key when merging with my other table.

This results in featuretools including the ID in the set of features to aggregate. SUM(ID) isn't a feature I am interested in though.

Is there a way to exclude certain features when running deep feature synthesis?

selib

1 Answer


There are three ways to exclude features when calling ft.dfs.

  • Use ignore_variables to specify variables in an entity that should not be used to create features. It is a dictionary mapping an entity id to a list of variable names to ignore.

  • Use drop_contains to drop features that contain any of the strings listed in this parameter.

  • Use drop_exact to drop features that exactly match any of the strings listed in this parameter.

Here is an example usage of all three in a ft.dfs call:

ft.dfs(target_entity="customers",
       ignore_variables={
           "transactions": ["amount"],
           "customers": ["age", "gender", "date_of_birth"]
       },  # ignore these variables
       drop_contains=["customers.SUM("],  # drop features that contain these strings
       drop_exact=["STD(transactions.quantity)"],  # drop features named exactly this
       ...
 )

These three parameters are all documented in the API reference for ft.dfs.
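To make the difference between drop_contains and drop_exact concrete, here is a plain-Python sketch of the two matching rules as described above. It does not call featuretools at all. The helper filter_features and the feature names are hypothetical, and the library's real matching may differ in detail.

```python
def filter_features(feature_names, drop_contains=(), drop_exact=()):
    """Hypothetical helper (not featuretools code): keep the names that
    match neither the substring rule nor the exact-name rule."""
    kept = []
    for name in feature_names:
        # drop_exact: the whole feature name must be equal to a listed string
        if name in drop_exact:
            continue
        # drop_contains: any listed string appearing anywhere in the name
        if any(fragment in name for fragment in drop_contains):
            continue
        kept.append(name)
    return kept


# Made-up feature names, for illustration only
features = [
    "SUM(transactions.ID)",
    "SUM(transactions.amount)",
    "STD(transactions.quantity)",
    "MEAN(transactions.quantity)",
]

kept = filter_features(
    features,
    drop_contains=["transactions.ID"],          # removes every feature built on the ID
    drop_exact=["STD(transactions.quantity)"],  # removes only this one feature
)
print(kept)
```

In this sketch the substring rule removes any aggregation of the ID column in one go, while the exact rule targets a single named feature and leaves MEAN(transactions.quantity) alone.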

The final thing to consider, if you are getting features you don't want, is the variable types of the variables in your entity set. If you are seeing the sum of an ID variable, featuretools must be treating that variable as numeric. If you tell featuretools it is an ID, it will not apply a numeric aggregation to it.

Max Kanter