I am creating a featuretools feature matrix from 5 dataframe entities and one cutoff_time table. When I call ft.dfs(), I pass both agg_primitives and trans_primitives; however, none of the primitives in trans_primitives that operate on a datetime column generate any features.
The entity that includes the datetime column is called 'events'. The name of the column is 'event-timestamp'.
Since my trans_primitives list includes other primitives that do generate features ("IS_NULL" works), I assume the problem is not with how I pass trans_primitives as a whole, but only with the primitives that relate to time.
Some things that may assist:
I checked the dtype of the column 'event-timestamp' in 'events' and it is datetime64[ns]. The same is true of the 'cutoff_time' column in the cutoff table.
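For reference, here is a minimal sketch of that dtype check. The two small tables are made-up toy data, not my real tables, and the helper name is_datetime_column is only illustrative:

```python
import pandas as pd

# Toy stand-ins for the real 'events' and cutoff tables (values are invented).
events = pd.DataFrame({
    "contract": [1, 1, 2],
    "event-timestamp": pd.to_datetime(
        ["2019-01-01 10:00", "2019-01-02 11:30", "2019-01-03 09:15"]
    ),
})
cutoff = pd.DataFrame({
    "customer": [10, 11],
    "cutoff_time": pd.to_datetime(["2019-02-01", "2019-02-01"]),
})


def is_datetime_column(df, column):
    """Return True if the column holds a pandas datetime64 dtype."""
    return pd.api.types.is_datetime64_any_dtype(df[column])


# Both time columns should report a datetime dtype; 'contract' should not.
print(events["event-timestamp"].dtype, is_datetime_column(events, "event-timestamp"))
print(cutoff["cutoff_time"].dtype, is_datetime_column(cutoff, "cutoff_time"))
print(events["contract"].dtype, is_datetime_column(events, "contract"))
```

A column that was read from CSV without date parsing would show up as object here, which is the case this check is meant to rule out.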
Another detail is that the agg_primitives do generate new features from 'event-timestamp' (for example 'MIN(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))'), so I assume the column itself is OK.
I experimented with the es.entity_from_dataframe call for 'events':
- Used the argument: time_index='event-timestamp'
- Used the argument: variable_types={'event-timestamp': vtypes.Datetime}
- Used both of the above, and neither of them
Below is the function that I am using:
import featuretools as ft


def generate_feature_matrix(events, grns, contracts, om_table, matcher, customers):
    """
    Take a set of tables, create the featuretools entities and relationships,
    and return the feature definitions (features_only=True) for the final
    feature matrix.
    """
    # Make an empty entityset
    es = ft.EntitySet(id='contracts_customers')

    # Create entities
    # events
    es.entity_from_dataframe(entity_id='events', dataframe=events, index='index', make_index=True,
                             time_index='event-timestamp')  # also tried variable_types={'event-timestamp': vtypes.DatetimeTimeIndex}
    # contracts
    es.entity_from_dataframe(entity_id='contracts', dataframe=contracts, index='contract')
    # matcher
    es.entity_from_dataframe(entity_id='matcher', dataframe=matcher, index='contract',
                             make_index=False)
    # om_table
    es.entity_from_dataframe(entity_id='om_table', dataframe=om_table, index='index',
                             make_index=True)
    # customers
    es.entity_from_dataframe(entity_id='customers', dataframe=customers, index='customer')

    # Relationships (parent, child)
    r_devices_matcher = ft.Relationship(es['contracts']['contract'], es['matcher']['contract'])
    r_devices_events = ft.Relationship(es['contracts']['contract'], es['events']['contract'])
    r_devices_os = ft.Relationship(es['contracts']['contract'], es['om_table']['contract'])
    r_users_matcher = ft.Relationship(es['customers']['customer'], es['matcher']['customer'])
    es.add_relationships([r_devices_matcher, r_devices_events, r_users_matcher, r_devices_os])

    # Primitives
    agg_primitives = ["num_unique", "skew", "mean", "count", "median", "sum",
                      "time_since_last", "mode", "min"]
    trans_primitives = ['month', 'weekday', 'hour', "time_since", "time_since_previous",
                        'is_null']

    # Generate the feature definitions
    feature_defs = ft.dfs(entityset=es, target_entity='customers',
                          cutoff_time=grns,
                          agg_primitives=agg_primitives,
                          trans_primitives=trans_primitives,
                          max_depth=3, features_only=True,
                          chunk_size=len(grns),
                          )
    return feature_defs
The entityset looks like this:
os
Out[392]:
Entityset: contracts_customers
  Entities:
    events [Rows: 22, Columns: 3]
    contracts [Rows: 35, Columns: 2]
    matcher [Rows: 2663, Columns: 2]
    om_table [Rows: 965, Columns: 4]
    customers [Rows: 76, Columns: 2]
  Relationships:
    matcher.contract -> contracts.contract
    events.contract -> contracts.contract
    matcher.customer -> customers.customer
    om_table.contract -> contracts.contract
And the generated feature list:
new_features
Out[393]:
[<Feature: n_contracts>,
<Feature: NUM_UNIQUE(matcher.contract)>,
<Feature: MODE(matcher.contract)>,
<Feature: IS_NULL(customer)>,
<Feature: IS_NULL(n_contracts)>,
<Feature: SKEW(matcher.contracts.n_event)>,
<Feature: MEAN(matcher.contracts.n_event)>,
<Feature: MEDIAN(matcher.contracts.n_event)>,
<Feature: SUM(matcher.contracts.n_event)>,
<Feature: MIN(matcher.contracts.n_event)>,
<Feature: IS_NULL(NUM_UNIQUE(matcher.contract))>,
<Feature: IS_NULL(MODE(matcher.contract))>,
<Feature: NUM_UNIQUE(matcher.contracts.MODE(matcher.customer))>,
<Feature: NUM_UNIQUE(matcher.contracts.MODE(om_table.om_family))>,
<Feature: SKEW(matcher.contracts.COUNT(events))>,
<Feature: SKEW(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))>,
<Feature: SKEW(matcher.contracts.NUM_UNIQUE(matcher.customer))>,
<Feature: SKEW(matcher.contracts.NUM_UNIQUE(om_table.om_family))>,
<Feature: SKEW(matcher.contracts.SKEW(om_table.n_events))>,
<Feature: SKEW(matcher.contracts.MEAN(om_table.n_events))>,
<Feature: SKEW(matcher.contracts.COUNT(om_table))>,
<Feature: SKEW(matcher.contracts.MEDIAN(om_table.n_events))>,
<Feature: SKEW(matcher.contracts.SUM(om_table.n_events))>,
<Feature: SKEW(matcher.contracts.MIN(om_table.n_events))>,
<Feature: MEAN(matcher.contracts.COUNT(events))>,
<Feature: MEAN(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))>,
<Feature: MEAN(matcher.contracts.NUM_UNIQUE(matcher.customer))>,
<Feature: MEAN(matcher.contracts.NUM_UNIQUE(om_table.om_family))>,
<Feature: MEAN(matcher.contracts.SKEW(om_table.n_events))>,
<Feature: MEAN(matcher.contracts.MEAN(om_table.n_events))>,
<Feature: MEAN(matcher.contracts.COUNT(om_table))>,
<Feature: MEAN(matcher.contracts.MEDIAN(om_table.n_events))>,
<Feature: MEAN(matcher.contracts.SUM(om_table.n_events))>,
<Feature: MEAN(matcher.contracts.MIN(om_table.n_events))>,
<Feature: MEDIAN(matcher.contracts.COUNT(events))>,
<Feature: MEDIAN(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))>,
<Feature: MEDIAN(matcher.contracts.NUM_UNIQUE(matcher.customer))>,
<Feature: MEDIAN(matcher.contracts.NUM_UNIQUE(om_table.om_family))>,
<Feature: MEDIAN(matcher.contracts.SKEW(om_table.n_events))>,
<Feature: MEDIAN(matcher.contracts.MEAN(om_table.n_events))>,
<Feature: MEDIAN(matcher.contracts.COUNT(om_table))>,
<Feature: MEDIAN(matcher.contracts.MEDIAN(om_table.n_events))>,
<Feature: MEDIAN(matcher.contracts.SUM(om_table.n_events))>,
<Feature: MEDIAN(matcher.contracts.MIN(om_table.n_events))>,
<Feature: SUM(matcher.contracts.COUNT(events))>,
<Feature: SUM(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))>,
<Feature: SUM(matcher.contracts.NUM_UNIQUE(matcher.customer))>,
<Feature: SUM(matcher.contracts.NUM_UNIQUE(om_table.om_family))>,
<Feature: SUM(matcher.contracts.SKEW(om_table.n_events))>,
<Feature: SUM(matcher.contracts.MEAN(om_table.n_events))>,
<Feature: SUM(matcher.contracts.COUNT(om_table))>,
<Feature: SUM(matcher.contracts.MEDIAN(om_table.n_events))>,
<Feature: SUM(matcher.contracts.SUM(om_table.n_events))>,
<Feature: SUM(matcher.contracts.MIN(om_table.n_events))>,
<Feature: MODE(matcher.contracts.MODE(matcher.customer))>,
<Feature: MODE(matcher.contracts.MODE(om_table.om_family))>,
<Feature: MIN(matcher.contracts.COUNT(events))>,
<Feature: MIN(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))>,
<Feature: MIN(matcher.contracts.NUM_UNIQUE(matcher.customer))>,
<Feature: MIN(matcher.contracts.NUM_UNIQUE(om_table.om_family))>,
<Feature: MIN(matcher.contracts.SKEW(om_table.n_events))>,
<Feature: MIN(matcher.contracts.MEAN(om_table.n_events))>,
<Feature: MIN(matcher.contracts.COUNT(om_table))>,
<Feature: MIN(matcher.contracts.MEDIAN(om_table.n_events))>,
<Feature: MIN(matcher.contracts.SUM(om_table.n_events))>,
<Feature: MIN(matcher.contracts.MIN(om_table.n_events))>,
<Feature: IS_NULL(SKEW(matcher.contracts.n_event))>,
<Feature: IS_NULL(MEAN(matcher.contracts.n_event))>,
<Feature: IS_NULL(MEDIAN(matcher.contracts.n_event))>,
<Feature: IS_NULL(SUM(matcher.contracts.n_event))>,
<Feature: IS_NULL(MIN(matcher.contracts.n_event))>]
I expected to get new features from every primitive in the trans_primitives list above, including the time-related ones.
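To make the gap concrete, here is a small sketch that lists which requested primitives never appear in the generated feature names. It works on plain name strings; the helper names primitives_used and missing_primitives are illustrative and not part of featuretools, and I am assuming the names can be collected from the feature objects (for example with get_name()). The three sample names are copied from the output above.

```python
import re
from collections import Counter


def primitives_used(feature_names):
    """Count each primitive (the upper-case token before a '(') across all names."""
    counts = Counter()
    for name in feature_names:
        counts.update(p.lower() for p in re.findall(r"([A-Z_]+)\(", name))
    return counts


def missing_primitives(requested, feature_names):
    """Return the requested primitives that appear in none of the feature names."""
    used = primitives_used(feature_names)
    return [p for p in requested if p.lower() not in used]


# A few names taken from the generated feature list.
names = [
    "IS_NULL(customer)",
    "MIN(matcher.contracts.TIME_SINCE_LAST(events.event-timestamp))",
    "SKEW(matcher.contracts.COUNT(events))",
]
trans_primitives = ["month", "weekday", "hour", "time_since",
                    "time_since_previous", "is_null"]

print(missing_primitives(trans_primitives, names))
# → ['month', 'weekday', 'hour', 'time_since', 'time_since_previous']
```

Because the match is on the whole token before the parenthesis, TIME_SINCE_LAST (an aggregation) is not mistaken for the transform primitive time_since.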