37

I am developing e-shop where I will sell food. I want to have a suggestion box where I would suggest what else my user could buy based on what he's already have in cart. If he has beer, I want him to suggest chips and other things by descending precentage of probability that he'll buy it too. But I want that my algorithm would learn to suggest groceries based on the all users' previous purchases. Where should I start? I have groceries table user_id, item_id, date and similar. How can I make a suggestion box without brute-forcing which is impossible.

Andreas
  • 2,821
  • 25
  • 30
good_evening
  • 21,085
  • 65
  • 193
  • 298
  • 4
    Keep a log. Create a knowledge base. Keep it updated. Job done. – hakre Aug 27 '12 at 22:10
  • 1
    The [netflix recommendation prize](http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html) might be a good thing to read up on – n00begon Aug 27 '12 at 22:14
  • 1
    You might find [_Programming Collective Intelligence_](http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/) worth a look. – matt Aug 29 '12 at 23:11
  • It's data mining. Try reading Frequent Pattern-trees and Apriori methods. Basically need a FP-tree to support the suggestion box. Big research field here. – Alvin K. Sep 05 '12 at 07:30

12 Answers12

44

The thing you're describing is a recommendation engine; more specifically collaborative filtering. It's the heart of Amazon's "people who bought x also bought y" feature, and Netflix's recommendation engine.

It's a non-trivial undertaking. As in, to get anything that's even remotely useful could easily take more than building the ecommerce site in the first place.

For instance:

  • you don't want to recommend items that are already in the basket.
  • you don't want to recommend cheaper versions of the things that are already in the basket.
  • you don't want to recommend items that are out of stock.
  • you don't want to recommend items that are statistically valid, but make no sense ("hey, you bought nappies, why not buy beer?" - there is a story that in supermarkets, there is a statistical correlation because dads go out at night to buy nappies and pick up a six pack at the same time).
  • you do want to recommend items that are in a promotion right now
  • you don't want to recommend items that are similar to items in a promotion right now

When I tried a similar project, it was very hard to explain to non-technical people that the computer simply didn't understand that recommending beer alongside nappies wasn't appropriate. Once we got the basic solution working, building the exclusion and edge case logic took at least as long.

Realistically, I think these are your options:

  • manually maintain the related products. Time consuming, but unlikely to lead to weirdness.
  • use an off-the-shelf solution - either SaaS or include a library like R which supports this.
  • recommend (semi)random products. Have a set of products you want to recommend, and pick one at random - for instance, products on promotion, products which are in the "best seller" list, products which cost less than x. Exclude categories that could be problematic.

All those options are achievable in reasonable time; the problem with building a proper solution from scratch is that everyone will measure it against Amazon, and they've got a bit of a head start on you...

Neville Kuyt
  • 29,247
  • 1
  • 37
  • 52
  • 4
    +1. Definitively worth reading ! Even if it gives more questions than answers to OP. :) – Orabîg Aug 30 '12 at 10:19
  • 4
    actually, the whole point of datamining is to find out that beer goes with diapers. that's something you wouldn't find on your own, but that makes sense and makes the sales of beer go up. – njzk2 Sep 03 '12 at 16:43
  • Of course - I'm not arguing about the value of data mining; it's just that recommending beer and nappies as something that should go together would upset the buyer of nappies. – Neville Kuyt Sep 04 '12 at 09:26
  • 8
    (and upset the buyer of beer) – 00jt Sep 04 '12 at 22:00
  • @njzk2 as Neville said, it is statistically valid. But from an end-user's point of view, wouldn't it just seem inappropriate? – Nathan MacInnes Sep 05 '12 at 09:37
  • 2
    no.suggesting beer to the right group while they by diappers actually is an accurate suggestion and does increase the beer sales – njzk2 Sep 05 '12 at 09:43
  • You would probably want to rule out some products for recommendations (like beer or condoms) – Nin Sep 05 '12 at 11:36
9

This is a common problem solved by Apriori Algorithm in Data Mining. You may need to create another table which maintains this statistics and then suggest based on the preferred combination

Balaji Paulrajan
  • 613
  • 5
  • 15
Cid
  • 1,453
  • 1
  • 18
  • 37
6

Humm... you are looking for a product recommendation engine then... Well, they come, basically, in three flavours:

  • Collaborative filtering
  • Content-based filtering
  • Hybrid recommender systems

   The first one gathers and stores data on your users' activities, preferences, behavior, etc... This data is then sent into an engine that separates it into user channels. Each channel has certain characteristic likes and dislikes. So, when you have a new visitor he or she will be classified and be assiged an specific user profile. Then items will be displayed based on this profile's likes/dislikes.

   Now, content-based filtering uses a different approach - a less social one - by taking into account ONLY your user's previous browsing history, his preferences and activities. Essentially, this will create recommendations based on what this user has previously liked/purchased.

   But why choose just one of them, right? Hybrid recommender systems uses a bit of both to provide a personalized yet social recommendation. These are usually more accurate when it comes to providing recommendations.

   I think that the collaborative filtering is a great option when you have a big influx of users - it's kinda hard to build good channels with only 42 users/month accessing your website. The second option, based on content, is better for a small site with plenty of products - however, IMHO, the third one is the one for you - build something that will get users going from the start and gather all that data they generate to, in the future, be able to offer a amazon-like recommendation experience!

   Building one of these is no easy task as I'm sure you already know... but I strongly recommend this book (using a personal-history filtering!) which has really came through for me in the past: http://www.amazon.com/Algorithms-Intelligent-Web-Haralambos-Marmanis/dp/1933988665

Good luck and good learning!

lleite
  • 166
  • 4
5

I think the best approach is to categorize your items and use that information to make the choice.

I did this on a grocery website and the results worked quite well. The idea is to cross group items into a number of categories.

For example, lets take a banana. It's a fruit, but it is also commonly used with cornflakes or cereal for breakfast. Cereals are also a breakfast food but certain ones might be considered health foods while others are sugary treats.

With this sort of approach, you can quickly start making a table like this:

Item         | Category
-------------+------------
Banana       | Breakfast
Banana       | Quick
Banana       | Fruit
Banana       | Healthy
Museli       | Breakfast
Museli       | Healthy
Sugar Puffs  | Breakfast
Sugar Puffs  | Treat
Kiwi Fruit   | Fruit
Kiwi Fruit   | Healtyh
Kiwi Fruit   | Dessert
Milk         | Breakfast

With a simple lookup like this, you can easily find good items to suggest based on these groupings.

Lets say someone's basket contains a Banana, Museli and Sugar Puffs.

That's three breakfast items, two healthy, one not so much.

Suggest Milk as it matches all three. No impulse buy? Try again, throw in a Kiwi Fruit. and so on and so on.

The idea here is to match items across many different categories (especially ones that may not be directly apparent) and use these counts to suggest the best items for your customer.

Fluffeh
  • 33,228
  • 16
  • 67
  • 80
5

Make a crossell based on the shopping purchasing habits of other customers that alse bought that item. Let's say you have this purchase history in your database (orders table):

  1. Beer, chips, soda
  2. Beer, soda
  3. Soda, cake
  4. Chips, beer
  5. Cake, chips, beer

Then, if your customer has Beer on his cart, based on your customer's shopping habbits you can easily make a query and see that beer-related items are:

  1. Chips (3 times)
  2. Soda (2 times)
  3. Cake (1 time)

Then you can suggest chips and soda probably... The bigger your purchasing history the more accurate suggestions the system will make.

Ricardo Gamba
  • 154
  • 1
  • 2
  • 3
    In such approach, the suggestion list will also contain popular products not related with purchased. I suggest using some sort of [TF/IDF](http://en.wikipedia.org/wiki/Tf*idf) ranking. – iimos Aug 29 '12 at 22:53
1

You will probably like the Non-negative Matrix Factorization Algorithm, it can do exactly what you are looking for (besides the stuff that Neville K mentioned). The database table with bought groceries will be the matrix to factorize. One factor will be a matrix that contains stuff that people bought together. This matrix will be much smaller than a matrix where you compare each grocery to all others. It would automatically find "groups" of groceries that go well together, like the Categories that Fluffeh suggestet, you would find those automatically. Steps to execute:

  1. Every Day or Week: run the factorization on the bought grocerys table to find new "trends". Save the factor matrix.
  2. If a new shopping cart arrives: call a solver with the cart as parameter, you will get a cart enriched with products that fit well. Suggest the stuff that is not in the cart yet.

Someone already mentioned the Book Programming Collective Intelligence. Thats a good start.

Puckl
  • 731
  • 5
  • 19
0

You could use an artificial neural network which learns to combine different products based on previous purchases.

Here are two ressources on the topic:

http://en.wikipedia.org/wiki/Artificial_neural_network

http://www.ai-junkie.com/ann/evolved/nnt1.html

Stefan Neubert
  • 1,053
  • 8
  • 22
0

There are two basic ways to do it:

  1. Manually associate items in the database with each other (time-consuming but flexible).
  2. Automatically determine which item(s) other people purchased based on their past purchases.

It looks like you're leaning towards the latter. I have written something like this for a site that sells various items and suggests related items based on other customers' past purchases. Here's the query I use:

SELECT items.*, COUNT( cartitems.itemid ) AS c FROM 
items
LEFT JOIN cartitems ON ( cartitems.itemid = items.id )
LEFT JOIN carts ON ( carts.id = cartitems.cartid )
WHERE (
    carts.id IN (
        /* Every cart with this item: */
        SELECT cartitems.cartid
        FROM cartitems
        WHERE ( cartitems.itemid = 123456 )
    )
        AND
    ( cartitems.itemid != 123456 ) /* Items other than this one */
        AND
    carts.checkedout = TRUE /* Carts that have checked out */
)
GROUP BY cartitems.itemid
ORDER BY c DESC
LIMIT 5

This example assumes the item they're looking at has an id of 123456. The "carts" table contains past purchases. The "cartitems" table contains individual items that were purchased in the past.

kmoser
  • 8,780
  • 3
  • 24
  • 40
0

Searching for a meaningful answer to your question, I came across this document:

Topic Tracking Model for Analyzing Consumer Purchase Behavior

I have read only part of the document, but it looks like it may be a theoretical answer for your question. I hope it helps.

LightBulb
  • 964
  • 1
  • 11
  • 27
0

1 - categorize each product as 3 layered categorization (Type/function/price) as example so when a specific product selected, u can ignore all other categories this will save too much time and effort, then u can select random products from the same (Type/function/price) to throw in ur suggestions box.

this is if u don't want do dive in the hassle of theoretical machine intelligence or complex algorithms to code.

have a nice day :)

Ahmed
  • 37
  • 10
0

I think that the best way to do that is with "tags pattern". For example:

products Table:
=================================
product_id
product_name

tags Table:
=================================
tag_id
tag_name

tags_products Table:
=================================
id_product
id_tag

products registry example:
=================================
1 | Beer
2 | Chips
3 | Cake

tags registry example:
=================================
1 | beer
2 | chips
3 | cake

tags_products registry example:
=================================
1 | 2
1 | 3
2 | 1
2 | 3
3 | 1

Then, you can relate all you want and do a query easy :)

Be happy.

Grettings.

Estefano Salazar
  • 419
  • 4
  • 15
0

Like everyone above says, the key to making this work is to implement the

'x users' also bought 'y item'

Basically what you need to do is experiment with more table rows and columns in the already existing database, or link new that would keep a statistical data about the products people view. One very important column you need is rating or like (not facebook like)

You would need new tables like:

  • Friends table (would link user_ID's together to form 'friends')
  • Like table (would link Friends table and Product table for 'matching' products)
  • Statistic table (would be linked to Like table and Product table)

You would also need to update existing tables with extra columns like:

  • Product_table with average_rating(0-5/0-10/0-100 or like 0/1) per product

If user x and user y are friends, they would have their ID's matched in Friends table. The Like table would take a product in which two users are friends and like a product z with either: (rating 0-5/0-10/0-100; like 0/1) you decide which method.

When a product is being liked/rated it would have its ID with a specific column name product rating being updated with +X or -X depending if its rating or likes. You would also need to decide on the average positive if the product is being rated or liked. An example would be 50% for rating and 100 likes for like.

With all this done, when a user x shops for products, you can match to see if:

  • user has friends
  • user's friends also shopped for product y
  • what is friend's rating/like for the product y
  • which other products user x's friends rated high or liked
  • which other products user x's friends rated poor or didn't like

You can do lot more than suggest products. With just a little effort, you can make hot deals to people and their friends. New product emerges on the market, and its like product z, only better. If X people and all their friends like product z, they would possibly like and buy the new product.

David D.
  • 367
  • 2
  • 13