Sklearn clustering similar orders

Question

I have a dataset of restaurant invoices, containing the products ordered by each client.

I've already processed the data and have the following matrix in a CSV file:

InvoiceID, Product 1, Product 2, Product 3, Product 4, Product 5.....
123,       0,         1,         0,         1,         0,       .....
124,       0,         1,         1,         1,         0,       .....
...

Each invoice is one row in the CSV, with a 1 in a product's column if the client ordered that product and a 0 if not.

How do I process this data with sklearn so I can cluster the invoices and get the centroids so I can see what products are in each cluster center?

Thank you!

EDIT: I have 957 products, and a lot of them were never ordered, so I can reduce the matrix (I don't know the best way to do it).

score 1 · Answer 1 · answered Mar 26 '15 at 01:49

Are you sure clustering is what you need?

It sounds as if market basket analysis (and frequent itemset mining) are the way to go.

Most clustering algorithms will assign every customer to exactly one type, whereas frequent itemset mining (FIM) will also detect subsets and overlapping patterns.
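To make the contrast with clustering concrete, here is a minimal sketch of frequent itemset counting in plain Python. It is a brute-force count of single products and product pairs, not an optimized algorithm such as Apriori or FP-growth, and the invoices and the `min_support` threshold are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical invoices: each one is the set of products ordered.
invoices = [
    {"Product 2", "Product 4"},
    {"Product 2", "Product 3", "Product 4"},
    {"Product 1"},
    {"Product 1", "Product 3"},
]

# Assumed threshold: an itemset must occur in at least 2 invoices.
min_support = 2

# Count in how many invoices each single product and each product pair occurs.
counts = Counter()
for items in invoices:
    for size in (1, 2):
        counts.update(combinations(sorted(items), size))

# Keep only the itemsets that meet the support threshold.
frequent = {itemset: n for itemset, n in counts.items() if n >= min_support}

for itemset, n in sorted(frequent.items()):
    print(itemset, n)
```

Note that a single invoice can support several itemsets at once, which is the kind of overlap a one-cluster-per-invoice assignment cannot express. Brute-force counting like this will not scale to large itemsets over 957 products, so a dedicated implementation would be needed for real data.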

Has QUIT--Anony-Mousse

I agree, frequent itemset mining is probably more the way to go. Do you know a good library for that? – Andreas Mueller Mar 26 '15 at 16:55

score 0 · Answer 2 · answered Mar 26 '15 at 00:36

You can use any of the clustering algorithms in scikit-learn. Take care not to pass in the ID column. You can mask the always-zero columns using numpy or pandas. A good introduction to the clustering methods in scikit-learn can be found in the user guide.
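As a rough sketch of those steps, assuming pandas for loading and `KMeans` as the clustering algorithm (any other scikit-learn clusterer could be swapped in), something like the following would work. The inline CSV is a made-up stand-in for the real file, and `n_clusters=2` is an arbitrary choice for this toy data.

```python
import io

import pandas as pd
from sklearn.cluster import KMeans

# Tiny made-up stand-in for the real invoice CSV (hypothetical data).
csv_text = """InvoiceID,Product 1,Product 2,Product 3,Product 4,Product 5
123,0,1,0,1,0
124,0,1,1,1,0
125,1,0,0,0,0
126,1,0,1,0,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep the ID out of the feature matrix; it is an identifier, not a feature.
X = df.drop(columns=["InvoiceID"])

# Mask products that were never ordered (columns that are all zero).
X = X.loc[:, X.sum(axis=0) > 0]

# n_clusters=2 is an assumption that suits this toy data only.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# For 0/1 data, each centroid entry is the fraction of invoices
# in that cluster that contain the product.
centroids = pd.DataFrame(km.cluster_centers_, columns=X.columns)
labels = km.labels_

print(centroids.round(2))
print(dict(zip(df["InvoiceID"], labels)))
```

Products with a centroid value close to 1 are ordered on nearly every invoice in that cluster, which is one way to read off what each cluster center contains.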

Andreas Mueller
