Amazon Personalize builds machine learning (ML) recommenders from your data. It's designed to predict what a user is likely to be interested in now, based on their past behavior (interactions) and the attributes of the items/products they've interacted with. Some considerations:
Data-driven approach
Although you can import purchase history into Personalize, it's not designed as a datastore that you can query. If you simply want a list of the products a user purchases most often, a database is likely the better tool. For example, to get the top 10 most purchased products for user `someuser` from a relational database where orders are stored in `Order` and `OrderDetail` tables, the SQL might look something like this:
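```sql
-- Top 10 most purchased items for one user.
-- "Order" is quoted because ORDER is a reserved word in SQL
-- (use [Order] in SQL Server or `Order` in MySQL).
SELECT od.ItemID, COUNT(*) AS PurchaseCount
FROM "Order" AS o
JOIN OrderDetail AS od ON o.OrderID = od.OrderID
WHERE o.UserID = 'someuser'
GROUP BY od.ItemID
ORDER BY PurchaseCount DESC
LIMIT 10;
```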
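To sanity-check the query, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The table layout (`Order` with `OrderID`/`UserID`, `OrderDetail` with `OrderID`/`ItemID`), the sample rows, and the helper name `top_purchased_items` are hypothetical; the extra `od.ItemID` sort key is an addition that only makes ties come back in a stable order.

```python
import sqlite3

def top_purchased_items(conn, user_id, limit=10):
    """Return (ItemID, PurchaseCount) rows for a user's most purchased items."""
    # "Order" is quoted because ORDER is a reserved word; ItemID is a
    # tiebreaker so rows with equal counts come back in a stable order.
    query = """
        SELECT od.ItemID, COUNT(*) AS PurchaseCount
        FROM "Order" AS o
        JOIN OrderDetail AS od ON o.OrderID = od.OrderID
        WHERE o.UserID = ?
        GROUP BY od.ItemID
        ORDER BY PurchaseCount DESC, od.ItemID
        LIMIT ?
    """
    return conn.execute(query, (user_id, limit)).fetchall()

# Hypothetical schema and sample rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE "Order" (OrderID INTEGER PRIMARY KEY, UserID TEXT);
    CREATE TABLE OrderDetail (OrderID INTEGER, ItemID TEXT);
    INSERT INTO "Order" VALUES (1, 'someuser'), (2, 'someuser'), (3, 'otheruser');
    INSERT INTO OrderDetail VALUES (1, 'coffee'), (1, 'filters'), (2, 'coffee'), (3, 'tea');
""")

print(top_purchased_items(conn, "someuser"))  # [('coffee', 2), ('filters', 1)]
```

Note that `COUNT(*)` counts order lines; if your `OrderDetail` table has a quantity column, summing it may be a better measure of "most purchased".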
Recommender-driven approach
If your use case is more like "given a user's recent purchases, which of those products are they likely to purchase again?", Personalize can be used. Here are the general steps:
- Create a Personalize custom dataset group.
- Create an interactions dataset in the dataset group. At minimum, your interactions dataset schema should have `USER_ID`, `ITEM_ID`, `TIMESTAMP`, and `EVENT_TYPE` columns.
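- Create a CSV containing the purchase history for all users, using the columns defined in the schema above. The product ID goes in `ITEM_ID`, the purchase date in `TIMESTAMP` (as Unix epoch time in seconds), and a value such as "Purchase" in `EVENT_TYPE`. Upload the CSV to Amazon S3 and import it into the interactions dataset with a dataset import job.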
- Create a solution using the `aws-user-personalization` recipe.
- Create a solution version for the solution.
- Create a Personalize campaign for the solution version.
- Create a filter that only includes products the user has recently purchased. For example: `INCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("Purchase")`.
- Call the `GetRecommendations` API for the campaign, passing the filter created above.
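As a minimal sketch of the CSV step, assuming purchases are available as `(user_id, item_id, purchased_at)` tuples (for example, pulled from the `Order`/`OrderDetail` tables above), the interactions file could be written with Python's `csv` module. Each purchase datetime is converted to Unix epoch seconds for `TIMESTAMP`; the function name `write_interactions_csv` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# Column order matches the minimal interactions schema.
FIELDS = ["USER_ID", "ITEM_ID", "TIMESTAMP", "EVENT_TYPE"]

def write_interactions_csv(purchases, out):
    """Write (user_id, item_id, purchased_at) tuples as an interactions CSV.

    purchased_at must be a timezone-aware datetime; it is written as
    Unix epoch seconds in the TIMESTAMP column.
    """
    writer = csv.writer(out)
    writer.writerow(FIELDS)
    for user_id, item_id, purchased_at in purchases:
        if purchased_at.tzinfo is None:
            raise ValueError("purchased_at must be timezone-aware")
        epoch_seconds = int(purchased_at.timestamp())
        writer.writerow([user_id, item_id, epoch_seconds, "Purchase"])

# Hypothetical purchase history.
purchases = [
    ("someuser", "coffee", datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)),
    ("someuser", "filters", datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)),
]

buf = io.StringIO()
write_interactions_csv(purchases, buf)
print(buf.getvalue())
```

When writing to a real file instead of a `StringIO`, open it with `newline=""` (as the `csv` module documentation recommends) before uploading it to S3 for the import job. Rejecting naive datetimes avoids silently interpreting purchase times in the machine's local time zone.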
One important caveat: when filtering on interaction history, Personalize currently considers only each user's most recent 200 historical interactions (from the bulk import) and most recent 100 real-time events (sent via the `PutEvents` API).
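The recommender-driven steps could be scripted with the AWS SDK for Python (Boto3). This is a hedged sketch, not a production script: it assumes the interactions CSV is already in S3 at a URI you supply and that an IAM role Personalize can assume to read it exists; the resource names built from `prefix` and the helper names (`wait_until_active`, `build_repurchase_campaign`, `recommend_repurchases`) are hypothetical. Each `create_*` call is asynchronous, so the sketch polls the matching `describe_*` call until the resource is `ACTIVE` before moving on.

```python
import json
import time

# Minimal interactions schema (Avro JSON) with the four required columns.
INTERACTIONS_SCHEMA = {
    "type": "record",
    "name": "Interactions",
    "namespace": "com.amazonaws.personalize.schema",
    "fields": [
        {"name": "USER_ID", "type": "string"},
        {"name": "ITEM_ID", "type": "string"},
        {"name": "TIMESTAMP", "type": "long"},
        {"name": "EVENT_TYPE", "type": "string"},
    ],
    "version": "1.0",
}
PURCHASE_FILTER = 'INCLUDE ItemID WHERE Interactions.EVENT_TYPE IN ("Purchase")'
RECIPE_ARN = "arn:aws:personalize:::recipe/aws-user-personalization"


def wait_until_active(describe, key, poll_seconds=60):
    """Poll a describe_* call until response[key]["status"] is ACTIVE."""
    while True:
        status = describe()[key]["status"]
        if status == "ACTIVE":
            return
        if "FAILED" in status:
            raise RuntimeError(f"{key} ended in status {status}")
        time.sleep(poll_seconds)


def build_repurchase_campaign(p, data_location, role_arn, prefix="purchases"):
    """Create the resources from the steps above; returns (campaign_arn, filter_arn).

    p: a boto3 "personalize" client. data_location: s3:// URI of the
    interactions CSV. role_arn: IAM role Personalize can assume to read it.
    """
    dsg = p.create_dataset_group(name=f"{prefix}-dsg")["datasetGroupArn"]
    wait_until_active(lambda: p.describe_dataset_group(datasetGroupArn=dsg), "datasetGroup")

    schema = p.create_schema(name=f"{prefix}-schema",
                             schema=json.dumps(INTERACTIONS_SCHEMA))["schemaArn"]
    dataset = p.create_dataset(name=f"{prefix}-interactions", schemaArn=schema,
                               datasetGroupArn=dsg, datasetType="Interactions")["datasetArn"]
    wait_until_active(lambda: p.describe_dataset(datasetArn=dataset), "dataset")

    job = p.create_dataset_import_job(jobName=f"{prefix}-import", datasetArn=dataset,
                                      dataSource={"dataLocation": data_location},
                                      roleArn=role_arn)["datasetImportJobArn"]
    wait_until_active(lambda: p.describe_dataset_import_job(datasetImportJobArn=job),
                      "datasetImportJob")

    solution = p.create_solution(name=f"{prefix}-solution", datasetGroupArn=dsg,
                                 recipeArn=RECIPE_ARN)["solutionArn"]
    wait_until_active(lambda: p.describe_solution(solutionArn=solution), "solution")

    # Model training happens here and can take a long time.
    version = p.create_solution_version(solutionArn=solution)["solutionVersionArn"]
    wait_until_active(lambda: p.describe_solution_version(solutionVersionArn=version),
                      "solutionVersion")

    campaign = p.create_campaign(name=f"{prefix}-campaign", solutionVersionArn=version,
                                 minProvisionedTPS=1)["campaignArn"]
    wait_until_active(lambda: p.describe_campaign(campaignArn=campaign), "campaign")

    filter_arn = p.create_filter(name=f"{prefix}-filter", datasetGroupArn=dsg,
                                 filterExpression=PURCHASE_FILTER)["filterArn"]
    wait_until_active(lambda: p.describe_filter(filterArn=filter_arn), "filter")
    return campaign, filter_arn


def recommend_repurchases(runtime, campaign_arn, filter_arn, user_id, num_results=10):
    """Call GetRecommendations with the purchase filter; returns item IDs."""
    response = runtime.get_recommendations(campaignArn=campaign_arn, userId=user_id,
                                           numResults=num_results, filterArn=filter_arn)
    return [item["itemId"] for item in response["itemList"]]
```

The clients are passed in rather than created inside the functions, so in practice you would call `build_repurchase_campaign(boto3.client("personalize"), ...)` and `recommend_repurchases(boto3.client("personalize-runtime"), campaign_arn, filter_arn, "someuser")`. This also keeps the sketch testable without AWS credentials.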