I have a Dataframe like this
data = {'Customer':['C1', 'C1', 'C1', 'C2', 'C2', 'C2', 'C3', 'C3', 'C3'],
'NumOfItems':[3, 2, 4, 5, 5, 6, 10, 6, 14],
'PurchaseTime':["2014-01-01", "2014-01-02", "2014-01-03","2014-01-01", "2014-01-02", "2014-01-03","2014-01-01", "2014-01-02", "2014-01-03"]
}
df = pd.DataFrame(data)
df
I want to create a Feature which is for example the max value for each customer up to this point:
'MaxPerID(NumOfItems)':[3, 3, 4, 5, 5, 6, 10, 10, 14] #the output i want
So I set up the EntitySet and normalize it …
es = ft.EntitySet(id="customer_data")
es = es.entity_from_dataframe(entity_id="customer",
dataframe=df,
index='index',
time_index="PurchaseTime",
make_index=True)
es = es.normalize_entity(base_entity_id="customer",
new_entity_id="sessions",
index="Customer")
But creating the feature matrix doesn't produce the results i want.
feature_matrix, features = ft.dfs(entityset=es,
target_entity="customer",
agg_primitives = ["max"],
max_depth = 3
)
feature_matrix.head
sessions.MAX(customer.NumOfItems)
index
0 4
3 6
6 14
1 4
4 6
7 14
2 4
5 6
8 14
The returned feature is the max value per day from all customers (sorted by time), however if I run the same code without the time_index = "PurchaseTime"
the result is the max value just for the specific customer
sessions.MAX(customer.NumOfItems) \
index
0 4
1 4
2 4
3 6
4 6
5 6
6 14
7 14
8 14
I want a combination of these two: the max value for the specific customer up to this point.
Is this possible? I tried to work with es['customer']['Customer'].interesting_values =['C1', 'C2', 'C3']
but it didn't get me anywhere. I also tried modifying the new normalized entity and writing my own primitive for this.
I'm new to featuretools so any help would be greatly appreciated.