Background:
While using the automated feature engineering library featuretools, I first added two datasets, member and order, to the entityset. I then built a relationship between them with ft.Relationship(es['member']['memberId'], es['order']['memberId']). The entityset now looks like this:
Entityset: featuretoolsTesting01
Entities:
member [Rows: 60115, Columns: 6]
order [Rows: 600, Columns: 7]
Relationships:
order.memberId -> member.memberId
So there are 60,115 members but only 600 order records (7 columns, with almost no null values). I want to generate features such as MODE(order.amount), so I ran dfs:
feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='member',
n_jobs = 4, verbose = 1, features_only = False,
max_depth = 1)
After running dfs I did get columns such as 'SUM(orders_es.spu_kind)', 'SUM(orders_es.spu_quantity)', 'STD(orders_es.spu_kind)', 'COUNT(orders_es)', 'NUM_UNIQUE(orders_es.source)', 'MODE(orders_es.source)', and so on.
Issue:
But to my surprise, all of these columns are null (all NaN), with not a single value > 0. For every such column i, len(feature_matrix[feature_matrix[i] > 0]) is 0.
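To make that check concrete, here is a minimal pandas sketch. The small feature_matrix below is a hypothetical stand-in with made-up column names and values, not my real data. It separates "entirely NaN" from "no value above zero", since a `> 0` test alone gives 0 for both NaN and zero.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real feature_matrix (assumed column names).
feature_matrix = pd.DataFrame({
    "SUM(order.amount)": [np.nan, np.nan, np.nan],
    "COUNT(order)": [0, 0, 0],
    "MODE(order.source)": [None, None, None],
    "member.age": [31, 45, 27],
})

# Columns in which every single value is missing.
all_nan = feature_matrix.columns[feature_matrix.isna().all()].tolist()

# Number of strictly positive values per numeric column.
positive_counts = feature_matrix.select_dtypes("number").gt(0).sum()

print(all_nan)
print(positive_counts)
```

In this toy frame, the COUNT column has no positive values but is not NaN, which is why the two checks are kept apart.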
What is the problem? A few months ago I used featuretools in a similar way to generate a set of features and got a well-populated feature_matrix, so why are the order-related columns all null this time?
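One sanity check that may be relevant, as a minimal sketch: counting how many order rows have a memberId that actually appears in the member table. Here member_df and order_df are hypothetical stand-ins for the two source tables, and the int-versus-string key mismatch is only an assumption for illustration; I have not confirmed that it applies to my data.

```python
import pandas as pd

# Hypothetical stand-ins for the real tables (assumed values and dtypes).
member_df = pd.DataFrame({"memberId": [1, 2, 3, 4]})
order_df = pd.DataFrame({"memberId": ["1", "2", "2"],
                         "amount": [10.0, 5.0, 7.5]})


def key_overlap(parent, child, key):
    """Count child rows whose key value also appears in the parent table."""
    return int(child[key].isin(parent[key]).sum())


# Compare the keys exactly as stored (int in one table, str in the other).
raw_matches = key_overlap(member_df, order_df, "memberId")

# Compare again after casting both keys to string.
str_matches = key_overlap(member_df.astype({"memberId": str}),
                          order_df.astype({"memberId": str}),
                          "memberId")

print(member_df["memberId"].dtype, order_df["memberId"].dtype)
print(raw_matches, str_matches)
```

If the first count is far below the number of order rows while the second is not, the key columns are probably stored with different types in the two tables.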