
Background:

While using the automated feature engineering library featuretools, I first added two datasets, member and order, to the entityset. I then built a relationship between them with ft.Relationship(es['member']['memberId'], es['order']['memberId']). The entityset looks like this:

Entityset: featuretoolsTesting01
  Entities:
    member [Rows: 60115, Columns: 6]
    order [Rows: 600, Columns: 7]
  Relationships:
    order.memberId -> member.memberId

We can see that about 60k members share 600 order records (7 columns with almost no null values). I want to generate features such as MODE(order.amount), so I ran dfs:

feature_matrix, feature_names = ft.dfs(entityset=es, target_entity='member',
                                       n_jobs = 4, verbose = 1, features_only = False,
                                       max_depth = 1)

After running dfs I did find columns such as 'SUM(orders_es.spu_kind)', 'SUM(orders_es.spu_quantity)', 'STD(orders_es.spu_kind)', 'COUNT(orders_es)', 'NUM_UNIQUE(orders_es.source)', 'MODE(orders_es.source)', and so on.

Issue:

Surprisingly, though, all of these columns are null (all NaN), with not a single value > 0: len(feature_matrix[feature_matrix[i] > 0]) is 0 for every such column i.
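When every aggregated column comes back empty like this, one thing worth checking is whether the keys of the child table actually appear in the parent table. The sketch below is not part of the original post: `key_overlap` is a hypothetical helper, and the key values are made-up toy data. It reports the share of child keys that have an exactly equal parent key.

```python
def key_overlap(parent_keys, child_keys):
    """Return the fraction of child keys that also occur among the parent keys."""
    parent = set(parent_keys)
    child = list(child_keys)
    if not child:
        return 0.0
    # Membership is exact: "1001" and "1001.0" count as different keys.
    matched = sum(1 for key in child if key in parent)
    return matched / len(child)


# Toy data (hypothetical): the same ids rendered as strings in two different ways.
member_ids = ["1001", "1002", "1003"]
order_ids = ["1001.0", "1003.0"]

overlap = key_overlap(member_ids, order_ids)
print(f"share of order keys found in member: {overlap:.0%}")
```

The helper accepts any iterable, so it could also be called on two DataFrame columns. An overlap of 0 would mean that no order row can be linked to a member, which is consistent with aggregation columns that carry no values.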

What's the problem? I recall that a few months ago, when I used featuretools in a similar way to generate a bunch of features, I got a good feature_matrix table. Why are the order-related columns all null this time?

Eric Yu

1 Answer


Oops, I found out where the problem is: the index of order and the index of member did not match. Both had been converted to string format, but converting an int and converting a float to string give different results (for example '1' versus '1.0'), so nothing matched.
After using member.memberId.astype('int').astype('str') and the coop
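To illustrate the mismatch, here is a minimal pandas sketch. The two frames and their values are hypothetical toy data, not the original tables, and the sketch assumes that the member ids were stored as float while the order ids were stored as int before both were cast to string.

```python
import pandas as pd

# Hypothetical toy frames: member ids stored as float, order ids stored as int.
member = pd.DataFrame({"memberId": [1001.0, 1002.0, 1003.0]})
order = pd.DataFrame({"memberId": [1001, 1003], "amount": [20.0, 35.5]})

# Casting a float column straight to str keeps the decimal part: "1001.0".
naive_member_ids = member["memberId"].astype("str")
order_ids = order["memberId"].astype("str")
naive_matches = order_ids.isin(naive_member_ids).sum()

# Casting to int first drops the ".0", so both sides render the same string.
fixed_member_ids = member["memberId"].astype("int").astype("str")
fixed_matches = order_ids.isin(fixed_member_ids).sum()

print(naive_matches, fixed_matches)
```

Note that astype('int') raises an error if the column contains missing values, so such rows would need to be handled before the cast.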

Eric Yu
    Sorry for the 'stupid' question here; others might run into a similar situation and might also find the 'astype' method useful, though... – Eric Yu Jul 31 '20 at 11:49