My primary aim is a feature that gives more recent information a higher weight.
So the idea is to compute a weighting factor with a new transform primitive, "WeightTimeUntil", which the transform primitive "MultiplyNumeric" can then use to produce weighted values.
I used Will Koehrsen's walkthrough as a starting point for the data and the entity setup.
In doing so I ran into the following problems:
- featuretools did not choose the combination I intended (see below)
- it looks like featuretools skipped the combination because of a type mismatch?!
- by changing the type of the value I wanted to multiply by the weighting factor, I managed to get the right combination, but not for the right target
- with clients as the target, featuretools did not produce the intended combination at all. Only when I used loans as the target, the entity that holds both the date and the value columns, did featuretools produce the right combination
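Here is the code for the "WeightTimeUntil" primitive:
import math
import numpy as np
import pandas as pd
import featuretools as ft
from featuretools.primitives import make_trans_primitive, MultiplyNumeric
from featuretools.variable_types import Datetime, Numeric  # pre-1.0 featuretools API

def weight_time_until(array, time):
    # Time elapsed from each timestamp until the cutoff (positive for past events)
    diff = time - pd.DatetimeIndex(array)
    # Number of completed half-year steps
    s = np.floor(diff.days / 365 / 0.5)
    aWidth = 9
    # Decay rate chosen so the weight falls to 0.1 after (aWidth - 1) steps
    a = math.log(0.1) / (-(aWidth - 1))
    w = np.exp(-a * s)
    return w

WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                       input_types=[Datetime],
                                       return_type=Numeric,
                                       uses_calc_time=True,
                                       description="Calculates weight time until the cutoff time",
                                       name="weight_time_until")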
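The weighting scheme itself can be checked without featuretools. The following is only a minimal, dependency-free sketch: it assumes the age is measured as cutoff time minus event time (so older events get a larger step count), and the name `half_year_weight` and the dates are made up for illustration.

```python
import math
from datetime import datetime

A_WIDTH = 9  # same constant as aWidth in weight_time_until
DECAY = math.log(0.1) / -(A_WIDTH - 1)  # equals ln(10) / 8

def half_year_weight(event_time, cutoff_time):
    """Weight in (0, 1] that shrinks as the event gets older (hypothetical helper)."""
    age_days = (cutoff_time - event_time).days
    steps = math.floor(age_days / 365 / 0.5)  # completed half-year steps
    return math.exp(-DECAY * steps)

cutoff = datetime(2018, 1, 1)  # hypothetical cutoff time
w_recent = half_year_weight(datetime(2017, 12, 1), cutoff)  # 0 steps old
w_mid = half_year_weight(datetime(2016, 1, 1), cutoff)      # 4 steps old
w_old = half_year_weight(datetime(2014, 1, 1), cutoff)      # 8 steps old
print(w_recent, w_mid, w_old)
```

With these constants the weight is 1 for anything younger than half a year and drops to about 0.1 after eight half-year steps, i.e. it behaves like 10 ** (-steps / 8).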
Here is the DFS execution code:
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 trans_primitives=[WeightTimeUntil, MultiplyNumeric])
And here is the list of features:
<Feature: income>,
<Feature: credit_score>,
<Feature: join_month>,
<Feature: log_income>,
<Feature: SUM(loans.loan_amount)>,
<Feature: SUM(loans.rate)>,
<Feature: SUM(payments.payment_amount)>,
<Feature: WEIGHT_TIME_UNTIL(joined)>,
<Feature: join_month * log_income>,
<Feature: income * log_income>,
<Feature: income * join_month>,
<Feature: credit_score * join_month>,
<Feature: credit_score * log_income>,
<Feature: credit_score * income>,
<Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_start))>,
<Feature: SUM(loans.WEIGHT_TIME_UNTIL(loan_end))>,
<Feature: SUM(loans.loan_amount * rate)>,
<Feature: income * SUM(loans.loan_amount)>,
<Feature: credit_score * SUM(loans.loan_amount)>,
<Feature: log_income * SUM(payments.payment_amount)>,
<Feature: log_income * WEIGHT_TIME_UNTIL(joined)>,
<Feature: income * SUM(payments.payment_amount)>,
<Feature: join_month * SUM(loans.rate)>,
<Feature: income * SUM(loans.rate)>,
<Feature: join_month * SUM(loans.loan_amount)>,
<Feature: SUM(loans.rate) * SUM(payments.payment_amount)>,
<Feature: credit_score * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.rate) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: income * WEIGHT_TIME_UNTIL(joined)>,
<Feature: log_income * SUM(loans.loan_amount)>,
<Feature: SUM(loans.loan_amount) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.loan_amount) * SUM(payments.payment_amount)>,
<Feature: credit_score * SUM(loans.rate)>,
<Feature: log_income * SUM(loans.rate)>,
<Feature: credit_score * SUM(payments.payment_amount)>,
<Feature: SUM(payments.payment_amount) * WEIGHT_TIME_UNTIL(joined)>,
<Feature: join_month * WEIGHT_TIME_UNTIL(joined)>,
<Feature: SUM(loans.loan_amount) * SUM(loans.rate)>,
<Feature: join_month * SUM(payments.payment_amount)>
I expected something like this:
<Feature: SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))>
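To make the expected feature concrete, here is a rough sketch of the value it should produce per client, computed by hand in plain Python rather than by featuretools. The loan rows, the cutoff date and the helper name `weight` are invented for illustration; only the column meanings (client id, loan_amount, loan_start) follow the setup above, and the age is again assumed to be cutoff minus loan_start.

```python
import math
from collections import defaultdict
from datetime import datetime

def weight(event_time, cutoff_time, a_width=9):
    # Same decay as weight_time_until: 0.1 ** (steps / (a_width - 1))
    steps = math.floor((cutoff_time - event_time).days / 365 / 0.5)
    return math.exp(math.log(0.1) / (a_width - 1) * steps)

# Hypothetical rows of the loans entity: (client_id, loan_amount, loan_start)
loans = [
    (1, 1000.0, datetime(2017, 11, 1)),
    (1, 2000.0, datetime(2014, 1, 1)),
    (2, 500.0, datetime(2017, 12, 15)),
]
cutoff = datetime(2018, 1, 1)  # hypothetical cutoff time

# SUM over each client's loans of loan_amount * weight(loan_start)
weighted_sum = defaultdict(float)
for client_id, amount, start in loans:
    weighted_sum[client_id] += amount * weight(start, cutoff)

print(dict(weighted_sum))
```

In this toy data the four-year-old loan of client 1 contributes only about a tenth of its amount, while the recent loans count in full, which is the behaviour the stacked feature is meant to capture.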