I asked a similar, yet simpler, question previously but realized later that this does not solve my problem. I feel that the required edits to the question are too severe for a simple edit of the question, especially with two valid answers already, so I will let it stay up and instead ask a new one:
I have the following DataFrame
of interactions (products viewed and/or bought) by different customers, at different times:
import pandas as pd
rng = list(pd.date_range('2019-02-24', periods=5, freq='T')) + list(pd.date_range('2019-03-13', periods=2, freq='T')) + list(pd.date_range('2019-02-27', periods=1, freq='T'))
customers = ["c12987"]*5 + ["c89563"]*2 + ["c56733"]
articles = ["a8473", "a7631", "a1264", "a8473", "a5641", "a9813", "a7631", "a1132"]
action_type = ["viewed", "purchased", "viewed", "purchased", "viewed", "viewed", "purchased", "viewed"]
interaction_history = pd.DataFrame({'Customer_no': customers, 'Date': rng, 'Article_no': articles, "Interaction": action_type})
interaction_history
Output:
Customer_no Date Article_no Interaction
0 c12987 2019-02-24 00:00:00 a8473 viewed
1 c12987 2019-02-24 00:01:00 a7631 purchased
2 c12987 2019-02-24 00:02:00 a1264 viewed
3 c12987 2019-02-24 00:03:00 a8473 purchased
4 c12987 2019-02-24 00:04:00 a5641 viewed
5 c89563 2019-03-13 00:00:00 a9813 viewed
6 c89563 2019-03-13 00:01:00 a7631 purchased
7 c56733 2019-02-27 00:00:00 a1132 viewed
I would like to, for each customer and row, get the previous articles viewed as well as the previous articles bought.
Expected output:
Customer_no Date Article_no Interaction Prev_viewed Prev_purchased
0 c12987 2019-02-24 00:00:00 a8473 viewed [] []
1 c12987 2019-02-24 00:01:00 a7631 purchased [a8473] []
2 c12987 2019-02-24 00:02:00 a1264 viewed [a8473] [a7631]
3 c12987 2019-02-24 00:03:00 a8473 purchased [a8473, a1264] [a7631]
4 c12987 2019-02-24 00:04:00 a5641 viewed [a8473, a1264] [a7631, a8473]
5 c89563 2019-03-13 00:00:00 a9813 viewed [] []
6 c89563 2019-03-13 00:01:00 a7631 purchased [a9813] []
7 c56733 2019-02-27 00:00:00 a1132 viewed [] []
I realize that I could iterate over each row with a custom function like interaction_history.apply(lambda x: my_custom_function(x), axis=1)
where my_custom_function(x)
would, for each row, filter through the entire interaction_history
to find the matching Customer_no, Interaction and appropriate dates. I also realize this solution would be highly inefficient and very complex, thus hoping that someone has any other ideas!