I'm trying to figure out how to implement a weighted cumulative-sum primitive for Featuretools. The weighting should depend on `time_since_last`, like this:
$$\text{cum\_sum}(\text{amount}) = \sum_{i} e^{-a_{i}} \cdot \text{amount}_{i}$$

where $i$ indexes rolling 6-month periods.
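For reference, the target feature can be computed directly in pandas, which is handy for checking whatever DFS produces. This is a minimal sketch, assuming a loans table with columns `client_id`, `loan_start` and `loan_amount` (names assumed from the referenced example data), a single cutoff time, and a decay rate of `a = ln(10) / 8` to match `aWidth = 9` in the code below. Here the elapsed periods are counted from each loan start back to the cutoff, so they are non-negative and older loans get smaller weights.

```python
import math

import numpy as np
import pandas as pd

# Decay rate chosen so that the weight falls to 0.1 after 8 half-year periods
# (same value as a = log(0.1) / -(aWidth - 1) with aWidth = 9).
A = math.log(10) / 8


def weighted_loan_sum(loans: pd.DataFrame, cutoff: pd.Timestamp) -> pd.Series:
    """Per-client sum of exp(-A * s) * loan_amount, where s is the number of
    whole 6-month periods between loan_start and the cutoff time."""
    # Only loans that started at or before the cutoff are visible
    visible = loans[loans["loan_start"] <= cutoff]
    elapsed_days = (cutoff - visible["loan_start"]).dt.days
    periods = np.floor(elapsed_days / 365 / 0.5)
    weights = np.exp(-A * periods)
    return (weights * visible["loan_amount"]).groupby(visible["client_id"]).sum()


# Hypothetical example data
loans = pd.DataFrame({
    "client_id": [1, 1, 2],
    "loan_start": pd.to_datetime(["2019-12-01", "2018-06-01", "2019-01-01"]),
    "loan_amount": [1000.0, 1000.0, 500.0],
})
print(weighted_loan_sum(loans, pd.Timestamp("2020-01-01")))
```

Client 1's recent loan keeps its full weight, while the loan from 2018-06-01 (three whole periods back) is weighted by 10^(-3/8), roughly 0.42.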
Above is my original question. After some trial and error I came up with the following code, using the data and the initial entity and relationship setup from here:
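```python
import math
import numpy as np
import pandas as pd

def weight_time_until(array, time):
    # Offset of each date from the cutoff time (negative for dates before it)
    diff = pd.DatetimeIndex(array) - time
    # Whole 6-month periods (np.floor rounds negative values away from zero)
    s = np.floor(diff.days / 365 / 0.5)
    # Decay rate chosen so that exp(-a * (aWidth - 1)) == 0.1
    aWidth = 9
    a = math.log(0.1) / (-(aWidth - 1))
    w = np.exp(-a * s)
    return w
```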
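To see which values the primitive function actually returns, it can be called outside Featuretools on a few dates with a fixed cutoff. The copy below keeps the same logic; the dates and cutoff are made up. Because `array - time` is negative for dates before the cutoff, `s` is negative, so the weights come out greater than 1 and grow with age rather than decaying. Using `time - pd.DatetimeIndex(array)` instead would be one way to get decaying weights, if that is the intent.

```python
import math

import numpy as np
import pandas as pd


def weight_time_until(array, time):
    # Same logic as the primitive function above
    diff = pd.DatetimeIndex(array) - time
    s = np.floor(diff.days / 365 / 0.5)
    aWidth = 9
    a = math.log(0.1) / (-(aWidth - 1))
    return np.exp(-a * s)


cutoff = pd.Timestamp("2020-01-01")
dates = pd.to_datetime(["2020-01-01", "2019-06-01", "2018-01-01"])
print(weight_time_until(dates, cutoff))
```

The loan from 2019-06-01 is 214 days before the cutoff, so `s` is floor(-1.17) = -2 and its weight is about 1.78; the loan from 2018-01-01 (730 days earlier, `s` = -4) gets about 3.16.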
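```python
import featuretools as ft
from featuretools.primitives import MultiplyNumeric, make_trans_primitive
from featuretools.variable_types import Datetime, Numeric

WeightTimeUntil = make_trans_primitive(function=weight_time_until,
                                       input_types=[Datetime],
                                       return_type=Numeric,
                                       uses_calc_time=True,
                                       description="Calc weight using time until the cutoff time",
                                       name="weight_time_until")

# es: the EntitySet with clients and loans, built as in the referenced setup
features, feature_names = ft.dfs(entityset=es, target_entity='clients',
                                 agg_primitives=['sum'],
                                 trans_primitives=[WeightTimeUntil, MultiplyNumeric])
```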
When I run the above, I get close to the feature I want, but not quite, and I don't understand why. I get the feature
SUM(loans.WEIGHT_TIME_UNTIL(loan_start))
but not
SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))
What am I missing here?
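One thing worth checking (an assumption, not verified against this exact setup): `ft.dfs` uses `max_depth=2` by default. `SUM(loans.WEIGHT_TIME_UNTIL(loan_start))` has depth 2, but `SUM(loans.loan_amount * loans.WEIGHT_TIME_UNTIL(loan_start))` stacks three primitives (WEIGHT_TIME_UNTIL, MultiplyNumeric, SUM) and therefore needs depth 3. With `target_entity='loans'`, the product alone is only depth 2, which would be consistent with what happens below. Passing `max_depth=3` is one option. Another is to fold the multiplication into the primitive so it takes both the amount and the date. The function below is a hypothetical `weighted_amount` sketch of such a combined transform in plain pandas/NumPy, using decaying weights. Wrapping it with `make_trans_primitive(..., input_types=[Numeric, Datetime], uses_calc_time=True)` is an untested assumption.

```python
import math

import numpy as np
import pandas as pd

A = math.log(10) / 8  # weight drops to 0.1 after 8 half-year periods


def weighted_amount(amount, dates, time):
    """Hypothetical combined transform: amount * exp(-A * s), where s is the
    number of whole 6-month periods from each date back to `time`."""
    elapsed_days = (time - pd.DatetimeIndex(dates)).days
    periods = np.floor(np.asarray(elapsed_days) / 365 / 0.5)
    return np.asarray(amount, dtype=float) * np.exp(-A * periods)


amounts = pd.Series([1000, 1000])
starts = pd.Series(pd.to_datetime(["2019-12-01", "2018-06-01"]))
print(weighted_amount(amounts, starts, pd.Timestamp("2020-01-01")))
```

Aggregating this single transform with SUM would need only depth 2 on `clients`.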
I experimented further. My guess was a type mismatch, but the variable types are the same. Anyway, I tried the following:
1) `es["loans"].convert_variable_type("loan_amount", ft.variable_types.Numeric)`
2) `loans["loan_amount_"] = loans["loan_amount"] * 1.0`
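For what it's worth, workaround (2) changes the pandas dtype of the copied column: multiplying an integer column by `1.0` yields a `float64` column. A small illustration with a hypothetical `loans` frame and made-up values:

```python
import pandas as pd

loans = pd.DataFrame({"loan_amount": [13672, 9794, 12734]})  # made-up values
loans["loan_amount_"] = loans["loan_amount"] * 1.0

# loan_amount stays integer, loan_amount_ becomes float64
print(loans.dtypes)
```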
With either (1) or (2) I get the more promising features:
loan_amount_ * WEIGHT_TIME_UNTIL(loan_start)
and also
loan_amount * WEIGHT_TIME_UNTIL(loan_start)
but only when I set `target_entity='loans'` instead of `'clients'`, which was not my intention.