I want featuretools to create features based on time index and cutoff time that I have declared in my entity set.

I have a dataset with time variables as well as numerical and categorical variables. There is an ITEMID column, and each ITEMID has 2 to 12 rows of data.

The columns include a start date, a transaction date, and various numerical and categorical columns. The start date is the same across all rows of a given ITEMID, whereas the transaction date is different in each row.

Here is the code for the entity set:

import featuretools as ft
import pandas as pd

# create an entity set
entity_set = ft.EntitySet(id = 'rem_dur')

# add the dataframe as an entity
# unique_id is just the row number, from 1 to the number of rows in dataset
entity_set.entity_from_dataframe(entity_id = 'enh', dataframe = dataset, index = 'unique_id',
                                 variable_types = {'Start_Date': ft.variable_types.DatetimeTimeIndex})


entity_set.normalize_entity(base_entity_id='enh', new_entity_id= 'categorical_vars', index = 'ITEMID', 
                             additional_variables = ['cat_var_1', 'cat_var_2'])

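# cutoff times: one row per unique_id, using that row's transaction date
# (.copy() avoids assigning into a slice of dataset)
cutoff_df = dataset[["unique_id", "trans_date"]].copy()
cutoff_df["trans_date"] = pd.to_datetime(cutoff_df["trans_date"])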

# feature engineering
feature_matrix_2, feature_names_2 = ft.dfs(entityset=entity_set,
                                           target_entity='enh',
                                           max_depth=2,
                                           verbose=1,
                                           ignore_entities=['categorical_vars'],
                                           ignore_variables=ignore_features_dict,
                                           dask_kwargs={'cluster': cluster},
                                           cutoff_time=cutoff_df,
                                           cutoff_time_in_index=False)

It does not generate any time-based features. It just returns all the features except the ones that are ignored.
Vikrant

1 Answer

When you create the entity, you need to indicate the time index using the time_index argument rather than specifying the variable type.

It should look like this:

entity_set.entity_from_dataframe(entity_id='enh',
                                 dataframe=dataset,
                                 index='unique_id',
                                 time_index="Start_Date") 
Max Kanter
  • Still, it didn't create any features using the start date and the cutoff_df. I got only 4 features, one of which is DAY(Start_Date). – Vikrant Apr 04 '19 at 04:42
  • I'm not sure I understand the problem. The cutoff_df doesn't change the feature definitions, just the values you get when you calculate the features. If you are getting the features you list above, that means Featuretools is making features using the start date. If you can provide runnable code that shows the output you're getting versus the expected output, we may be able to help better. – Max Kanter Apr 05 '19 at 16:42
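
To illustrate the point in the last comment, that cutoff times change feature values rather than feature definitions, here is a minimal plain-Python sketch. It does not call Featuretools at all. The sample rows, the amount column, and the helper sum_amount_at_cutoff are hypothetical, and it assumes that rows timestamped exactly at the cutoff are included.

```python
from datetime import datetime

# Hypothetical rows: (unique_id, ITEMID, trans_date, amount)
rows = [
    (1, "A", datetime(2019, 1, 1), 10.0),
    (2, "A", datetime(2019, 1, 15), 20.0),
    (3, "A", datetime(2019, 2, 1), 30.0),
    (4, "B", datetime(2019, 1, 10), 5.0),
]


def sum_amount_at_cutoff(rows, item_id, cutoff):
    """One fixed feature definition: SUM(amount) per ITEMID.

    Only rows dated at or before `cutoff` are visible to the calculation
    (inclusive cutoff is an assumption of this sketch).
    """
    return sum(
        amount
        for _, item, when, amount in rows
        if item == item_id and when <= cutoff
    )


# One cutoff per row, mirroring cutoff_df: each row's own trans_date.
values = {
    uid: sum_amount_at_cutoff(rows, item, when)
    for uid, item, when, _ in rows
}

for uid, value in values.items():
    print(uid, value)
```

The same definition is evaluated for every row; only the data visible at each cutoff differs, so the values differ from row to row while the list of features stays the same.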