
I have a table with customers and transactions. Is there a way to get features filtered to the last 3/6/9/12 months? I would like to automatically generate features such as:

  • number of transactions in the last 3 months
  • ...
  • number of transactions in the last 12 months
  • average transactions in the last 3 months
  • ...
  • average transactions in the last 12 months
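For reference, here is what these features compute, written in plain pandas rather than Featuretools. This is a minimal sketch under several assumptions: the transactions table has hypothetical columns customer_id, transaction_time and amount; "average" means the mean transaction amount; a transaction belongs to the n-month window when it falls after (reference time minus n months) and at or before the reference time; and the rows are made-up toy data.

```python
import pandas as pd

# Toy transactions table (hypothetical columns and made-up values).
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "transaction_time": pd.to_datetime([
        "2014-12-20", "2014-08-15", "2014-02-01",
        "2014-11-30", "2014-05-10", "2013-12-31",
    ]),
    "amount": [10.0, 20.0, 30.0, 5.0, 15.0, 50.0],
})


def window_features(df, reference_time, months=(3, 6, 9, 12)):
    """Count and mean of `amount` per customer over trailing month windows.

    A row is in the n-month window if
    reference_time - n months < transaction_time <= reference_time.
    """
    reference_time = pd.Timestamp(reference_time)
    customers = pd.Index(df["customer_id"].unique(), name="customer_id")
    parts = []
    for n in months:
        start = reference_time - pd.DateOffset(months=n)
        in_window = (df["transaction_time"] > start) & (
            df["transaction_time"] <= reference_time
        )
        stats = (
            df[in_window]
            .groupby("customer_id")["amount"]
            .agg(["count", "mean"])
            .reindex(customers)  # keep customers with no rows in this window
        )
        # Customers without transactions get count 0; their mean stays NaN.
        stats["count"] = stats["count"].fillna(0).astype(int)
        parts.append(stats.add_suffix(f"_{n}m"))
    # One row per customer, columns count_3m, mean_3m, ..., mean_12m
    return pd.concat(parts, axis=1)


print(window_features(transactions, "2015-01-01"))
```

Each window is just a different filter on the same table, which is why, in Featuretools too, every window ends up as its own calculation.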

I've tried passing training_window=["1 month", "3 months"], but it does not seem to return a separate set of features for each window.

Example:

import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)

window_features = ft.dfs(entityset=es,
                         target_entity="customers",
                         training_window=["1 hour", "1 day"],
                         features_only=True)

window_features

Do I have to do individual windows separately and then merge the results?

Max Kanter
Tomas Greif

1 Answer


As you suspected, in Featuretools 0.2.1 you have to calculate a separate feature matrix for each training window and then merge the results. For your example, you would do that as follows:

import pandas as pd
import featuretools as ft
es = ft.demo.load_mock_customer(return_entityset=True)
cutoff_times = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5],
                             "time": pd.date_range('2014-01-01 01:41:50', periods=5, freq='25min')})
features = ft.dfs(entityset=es,
                  target_entity="customers",
                  agg_primitives=['count'],
                  trans_primitives=[],
                  features_only=True)
fm_1 = ft.calculate_feature_matrix(features, 
                                   entityset=es, 
                                   cutoff_time=cutoff_times,
                                   training_window='1h', 
                                   verbose=True)

fm_2 = ft.calculate_feature_matrix(features, 
                                   entityset=es, 
                                   cutoff_time=cutoff_times,
                                   training_window='1d', 
                                   verbose=True)
new_df = fm_1.reset_index()
new_df = new_df.merge(fm_2.reset_index(), on="customer_id", suffixes=("_1h", "_1d"))
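To cover more than two windows (for example the 3/6/9/12 months from the question), the same idea extends to a loop. The helper below, merge_window_matrices, is a hypothetical sketch and not part of Featuretools: it takes a dict mapping window labels to feature matrices indexed by customer_id and joins them with pandas, suffixing every column with its window label. The stand-in frames reuse the values for customers 1 to 3 from the merged output in this answer; the commented-out loop assumes the same calculate_feature_matrix call used above, and whether month-based strings such as "3 months" are accepted as training_window depends on your Featuretools version.

```python
import pandas as pd


def merge_window_matrices(matrices):
    """Join feature matrices computed for different training windows.

    matrices: dict mapping a window label (e.g. "1h") to a feature matrix
    indexed by customer_id, such as the result of
    ft.calculate_feature_matrix(..., training_window=label).
    Every column gets a "_<label>" suffix so the windows stay distinguishable.
    """
    # concat along axis=1 aligns rows on the shared customer_id index (outer join)
    return pd.concat(
        [fm.add_suffix(f"_{label}") for label, fm in matrices.items()],
        axis=1,
    )


# Stand-ins for fm_1 and fm_2 (values for customers 1-3 from the answer's output).
idx = pd.Index([1, 2, 3], name="customer_id")
fm_1h = pd.DataFrame({"COUNT(sessions)": [1, 3, 0],
                      "COUNT(transactions)": [17, 36, 0]}, index=idx)
fm_1d = pd.DataFrame({"COUNT(sessions)": [3, 3, 1],
                      "COUNT(transactions)": [43, 36, 25]}, index=idx)

# With Featuretools, the dict could instead be built in a loop, e.g.:
# matrices = {w: ft.calculate_feature_matrix(features, entityset=es,
#                                             cutoff_time=cutoff_times,
#                                             training_window=w)
#             for w in ["1h", "1d"]}
merged = merge_window_matrices({"1h": fm_1h, "1d": fm_1d})
print(merged)
```

Because pd.concat aligns on the index, there is no need for reset_index or repeated pairwise merges, and adding a window is just one more dict entry.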

Then, the new dataframe will look like:

customer_id  COUNT(sessions)_1h  COUNT(transactions)_1h  COUNT(sessions)_1d  COUNT(transactions)_1d
1            1                   17                      3                   43
2            3                   36                      3                   36
3            0                   0                       1                   25
4            0                   0                       0                   0
5            1                   15                      2                   29
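A quick sanity check on a merged matrix like this: both windows end at the same cutoff time, so the 1 hour window is contained in the 1 day window and no 1d count can be smaller than the matching 1h count. Subtracting the two columns gives the count for the band between roughly 1 hour and 1 day before the cutoff (whether the band edges are inclusive depends on how Featuretools applies the window). A small pandas sketch using the values from the table:

```python
import pandas as pd

# The merged feature matrix from the table above.
merged = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4, 5],
        "COUNT(sessions)_1h": [1, 3, 0, 0, 1],
        "COUNT(transactions)_1h": [17, 36, 0, 0, 15],
        "COUNT(sessions)_1d": [3, 3, 1, 0, 2],
        "COUNT(transactions)_1d": [43, 36, 25, 0, 29],
    }
).set_index("customer_id")

# Same cutoff time, so the 1h window is nested inside the 1d window.
for base in ["COUNT(sessions)", "COUNT(transactions)"]:
    assert (merged[f"{base}_1d"] >= merged[f"{base}_1h"]).all()

# Transactions older than about 1 hour but within about 1 day of the cutoff.
merged["COUNT(transactions)_1h_to_1d"] = (
    merged["COUNT(transactions)_1d"] - merged["COUNT(transactions)_1h"]
)
print(merged["COUNT(transactions)_1h_to_1d"].tolist())  # [26, 0, 25, 0, 14]
```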
Max Kanter