I'm trying to use FeatureTools to create a dataset for use in customer churn analysis. I have a raw dataset of orders that include fields like:

customer_id, order_id, order_month, order_datetime, order_cost

I'd like to create a dataset that returns one row per customer per month they've made an order and relevant information like AVG(order_cost) within that month. So far I've made entities including order (based on order_id) and customer (customer_id). I haven't been able to figure out how to create monthly features for each customer, however. I've tried creating a separate entity that is based on a custom ID of each customer_id + order_month. Is that the best approach? Is there a better tool for this?

Thanks!

kevin.w.johnson

1 Answer

Thanks for the question. If you want one row per customer per month, then creating a separate entity keyed on customer_id + order_month is currently the best approach. Otherwise, you can use where primitives to get one column per month for each customer. To do that, set interesting values on the order_month variable and pass where_primitives to DFS.

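import featuretools as ft

# `es` is your EntitySet and `df` is the raw orders DataFrame.
# Mark every observed month as an interesting value so DFS can build WHERE clauses on it.
es['orders']['order_month'].interesting_values = df.order_month.unique()

# where_primitives applies MEAN once per interesting value of order_month.
fm, fd = ft.dfs(
    target_entity='customers',
    entityset=es,
    trans_primitives=[],
    agg_primitives=['mean'],
    where_primitives=['mean'],
)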
             MEAN(orders.order_cost WHERE order_month = 1)  ...  MEAN(orders.order_cost WHERE order_month = 12)
customer_id                                                 ...                                                
2                                               659.426667  ...                                           3.000
1                                               478.490000  ...                                         435.270
3                                               468.316667  ...                                         319.975
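
For the one-row-per-customer-per-month layout, here is a minimal sketch of building the composite key in plain pandas before loading anything into an entity set. It assumes the column names from the question; the sample rows, the `monthly` frame, and the `customer_month_id`, `mean_order_cost` and `order_count` names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2],
    "order_id": [10, 11, 12, 13, 14],
    "order_month": [1, 1, 2, 1, 3],
    "order_cost": [100.0, 200.0, 50.0, 30.0, 70.0],
})

# One row per (customer_id, order_month) pair that has at least one order.
monthly = (
    orders.groupby(["customer_id", "order_month"], as_index=False)
    .agg(
        mean_order_cost=("order_cost", "mean"),
        order_count=("order_id", "count"),
    )
)

# Composite key (illustrative name) that could serve as the index
# of a separate customer-month entity.
monthly["customer_month_id"] = (
    monthly["customer_id"].astype(str) + "_" + monthly["order_month"].astype(str)
)

print(monthly)
```

Months in which a customer placed no orders do not appear in `monthly`, which matches the "per month they've made an order" requirement. The same `customer_month_id` column can be added to the orders table so that each order links to its customer-month row.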
Jeff Hernandez