1

My primary aim is a feature which considers more recent information of higher value.

So, the Idea is to calculate via a new primitive transformation "WeightTimeUntil" a weighing factor which afterwards could be used by the transformation primitive "MultiplyNumeric" to get weighted values.

I used the walkthrough walkthrough of Will Koehrsen as a starting point for data and the entity setup.

Thereby I ran into following problem:

  1. featuretools have not choosen the combination I intended to achieve (see below)
  2. it looks like featuretools did not choose the combination because of type miss match?!
  3. by changing the type of the value I wanted to be multiply by the weighting factor I managed to get the right combination but not for the right target
  4. for target equal client, featuretools have not choosen the combination I intended to get at all. Only when I use the target equal loans where the date and the value are columns of, featuretools used the right combination

here is the code for the "WeightTimeUntil" primitive

def weight_time_until(array, time):
    diff = pd.DatetimeIndex(array) - time
    s = np.floor(diff.days/365/0.5)
    aWidth = 9
    a = math.log(0.1) / ( -(aWidth -1) )

    w = np.exp(-a*s) 

    return w


    WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                 input_types=[Datetime],
                                 return_type=Numeric,
                                 uses_calc_time=True,
                                 description="Calculates weight time until the cutoff time",
                                 name="weight_time_until")

here is the DFS execution code:

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 trans_primitives = [WeightTimeUntil, MultiplyNumeric]) 

and here the list of features:

 <Feature: income>,
 <Feature: credit_score>,
 <Feature: join_month>,
 <Feature: log_income>,
 <Feature: SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate)>,
 <Feature: SUM(payments.payment_amount)>,
 <Feature: WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * log_income>,
 <Feature: income * log_income>,
 <Feature: income * join_month>,
 <Feature: credit_score * join_month>,
 <Feature: credit_score * log_income>,
 <Feature: credit_score * income>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_start))>,
 <Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_end))>,
 <Feature: SUM(loans.loan_amount * rate)>,
 <Feature: income * SUM(loans.loan_amount)>,
 <Feature: credit_score * SUM(loans.loan_amount)>,
 <Feature: log_income * SUM(payments.payment_amount)>,
 <Feature: log_income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * SUM(payments.payment_amount)>,
 <Feature: join_month * SUM(loans.rate)>,
 <Feature: income * SUM(loans.rate)>,
 <Feature: join_month * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.rate) * SUM(payments.payment_amount)>,
 <Feature: credit_score * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.rate) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: income * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: log_income * SUM(loans.loan_amount)>,
 <Feature: SUM(loans.loan_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(payments.payment_amount)>,
 <Feature: credit_score * SUM(loans.rate)>,
 <Feature: log_income * SUM(loans.rate)>,
 <Feature: credit_score * SUM(payments.payment_amount)>,
 <Feature: SUM(payments.payment_amount) * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: join_month * WEIGHT_TIME_UNTIL(joined)>,
 <Feature: SUM(loans.loan_amount) * SUM(loans.rate)>,
 <Feature: join_month * SUM(payments.payment_amount)>

I expected something like this:

SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))>
Serge Ballesta
  • 143,923
  • 11
  • 122
  • 252

1 Answers1

1

The issue here is that SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))> is a depth 3 feature since you are stacking Sum, MultiplyNumeric, and WeightTimeUntil. You can read more about depth in the documentation here.

You can fix this by increasing the allowed depth in your call to dfs like this

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 max_depth=3,
                                 trans_primitives = [WeightTimeUntil, MultiplyNumeric]) 

The alternative way to do it, is to provide your feature as a seed feature, which don't get counted towards the max depth. You can do that like this

seed_features=[ft.Feature(es["loans"]["loan_start"], primitive=WeightTimeUntil)]

features, feature_names = ft.dfs(entityset = es, target_entity = 'clients', 
                                 agg_primitives = ['sum'],
                                 seed_features=seed_features,
                                 trans_primitives = [MultiplyNumeric])

The second approach my be preferable since it will create the feature you want, but fewer features overall.

Max Kanter
  • 2,006
  • 6
  • 16