
I've noticed that Featuretools created features from my dataframe's index. For example:

'LAST(transactions.payment_id)'

This is the index I set when creating the entity:

# make_index=True: Featuretools creates the payment_id column itself (integer values)
es = es.entity_from_dataframe(entity_id='transactions',
                              dataframe=transactions,
                              make_index=True,
                              index='payment_id',
                              time_index='local_date')

What is the point of creating features from an index? And if there is no use for this, how can it be disabled? I trained a model overnight and found that payment ID was a very important feature, which doesn't make sense.

SCool

By default, the index is used to generate features. You can avoid this with the `drop_contains` parameter, which drops every feature whose name contains any of the given strings. So, the DFS call would look something like this:

ft.dfs(
    ...
    drop_contains=['payment_id'],
)
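
To make the effect concrete, here is a minimal pure-Python sketch of the filtering that `drop_contains` performs as the `ft.dfs` documentation describes it: any feature whose name contains one of the listed strings is dropped. The helper `drop_features_containing` and the `amount` and `payment_id_type` columns are hypothetical, for illustration only; this is not Featuretools' own implementation.

```python
def drop_features_containing(feature_names, drop_contains):
    """Keep only the names that contain none of the given substrings.

    Hypothetical helper: mimics the documented effect of ft.dfs's
    drop_contains argument on a plain list of feature-name strings.
    """
    return [
        name for name in feature_names
        if not any(substring in name for substring in drop_contains)
    ]


# Example feature names; 'amount' and 'payment_id_type' are made-up columns.
features = [
    "LAST(transactions.payment_id)",
    "MEAN(transactions.amount)",
    "COUNT(transactions)",
    "NUM_UNIQUE(transactions.payment_id_type)",
]

kept = drop_features_containing(features, ["payment_id"])
print(kept)
# ['MEAN(transactions.amount)', 'COUNT(transactions)']
```

Note that the match is on substrings, so `'payment_id'` also removes features built from any other column whose name happens to contain it (the hypothetical `payment_id_type` above). A more specific string, such as `'transactions.payment_id)'`, may be safer if you have similarly named columns.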

Let me know if this helps.

Jeff Hernandez
  • Yes I figured out the `drop_contains` parameter. But I'd still like to know why the `id` column is used to create features. Is there any scenario where this is useful? – SCool Nov 27 '19 at 11:21
  • I haven't come across a specific scenario. I'd imagine they can be used to build more entities. – Jeff Hernandez Nov 27 '19 at 17:31