
Assume I have a dataset consisting of a single table; for instance, consider the Titanic dataset on Kaggle.

What is the proper way to use Featuretools to get the most benefit from it, given that Featuretools is designed for relational data?

By 'proper' I mean this: I know that when creating the entity set, the `index` parameter is simply the index of the dataset, but what should the new index be when normalizing the entity? Also, is it okay to use recursive feature elimination (RFE) blindly for feature selection?

1 Answer


You can get the most benefit from Featuretools by normalizing the entity set. The more normalized an entity set is, the more Deep Feature Synthesis (DFS) can leverage the relational structure to generate better features.

The objective of the normalization process is to eliminate redundant data. So the new index, along with any additional variables, should be a column whose values repeat across rows, such that moving it into its own entity removes that redundancy. This guide goes into more depth on creating an entity from a de-normalized table.

For feature selection, I think RFE can be used judiciously, with the objectives of improving accuracy and reducing the complexity of the model.
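One way to avoid applying RFE blindly is to let cross-validation decide how many features to keep. The sketch below uses scikit-learn's `RFECV` for that. The data is synthetic and merely stands in for a DFS feature matrix, and the logistic regression estimator and accuracy scoring are assumptions rather than recommendations for the Titanic problem.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a DFS feature matrix (not real Titanic data).
X, y = make_classification(
    n_samples=200,
    n_features=10,
    n_informative=3,
    n_redundant=2,
    random_state=0,
)

# Drop one feature per round and score each subset size with 5-fold CV,
# so the number of kept features is chosen by validation accuracy.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,
    cv=5,
    scoring="accuracy",
)
selector.fit(X, y)

print("features kept:", selector.n_features_)
print("selection mask:", selector.support_)
```

Comparing the cross-validated score of the reduced model against the full model is a simple check that the selection actually helped rather than only shrinking the feature set.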

Jeff Hernandez
  • Thanks for your answer. Should I normalize all the variables? For example, I made the entity set with PassengerId as the index; when normalizing, should I normalize again and again with every column, or select only one? If only one, which should it be? – Graphics Engineer Feb 22 '20 at 05:41
  • I would try normalizing more than one redundant variable like Ticket Class (`pclass`) and Port of Embarkation (`embarked`). This will require experimentation to see which columns yield the best results for your use case. – Jeff Hernandez Feb 24 '20 at 14:48