I have a dataset of restaurant invoices containing the products ordered by each client.
I've already processed the data, and I have the following matrix in a CSV file:
InvoiceID, Product 1, product 2, product 3, product 4, product 5.....
123, 0, 1, 0, 1, 0, .....
124, 0, 1, 1, 1, 0, .....
...
For each invoice, the CSV has one row with a 0 or 1 in each product column indicating whether the client ordered that product (0 = not ordered, 1 = ordered).
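A minimal sketch of loading this layout with pandas, assuming the file looks like the excerpt above (a space after each comma, `InvoiceID` as the first column). The two-row sample string is made up to stand in for the real file; with the real data you would pass a path such as the hypothetical `"invoices.csv"` instead of the `StringIO` object.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the CSV in the question
# (the real file has 957 product columns; only five are shown here).
sample_csv = """InvoiceID, Product 1, product 2, product 3, product 4, product 5
123, 0, 1, 0, 1, 0
124, 0, 1, 1, 1, 0
"""

# skipinitialspace=True strips the blank after each comma, so column
# names and values are parsed cleanly; InvoiceID becomes the row index.
df = pd.read_csv(io.StringIO(sample_csv), skipinitialspace=True, index_col="InvoiceID")

# X is the invoice-by-product 0/1 matrix that scikit-learn expects:
# one row per invoice, one column per product.
X = df.to_numpy()
print(df.shape)  # (2, 5)
```

Keeping `InvoiceID` as the index rather than as a feature column matters: otherwise the ID numbers would be treated as a product and dominate any distance computation.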
How can I process this data with sklearn to cluster the invoices and get the centroids, so I can see which products are in each cluster center?
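One common approach is scikit-learn's `KMeans`. The sketch below assumes the matrix is already in a pandas DataFrame (a small made-up DataFrame with hypothetical product names stands in for it) and assumes two clusters. Because the features are 0/1, each centroid coordinate is the fraction of that cluster's invoices that contain the product, which makes the centers easy to read.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data standing in for the real invoice matrix:
# rows are invoices, columns are products, 1 = ordered.
df = pd.DataFrame(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 1, 1],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1],
    ],
    index=[101, 102, 103, 104, 105, 106],
    columns=["bread", "butter", "coffee", "pasta", "wine"],
)

# n_clusters=2 is an assumption; on real data, try several values and
# compare them (for example with an elbow plot or silhouette score).
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(df.to_numpy())

# Each centroid coordinate is the mean of a 0/1 column over the invoices
# in that cluster, i.e. the share of those invoices that include the product.
centroids = pd.DataFrame(kmeans.cluster_centers_, columns=df.columns)

for k, row in centroids.iterrows():
    # Treat a product as part of the cluster "center" if at least half of
    # the cluster's invoices ordered it (the 0.5 cutoff is arbitrary).
    typical = row[row >= 0.5].index.tolist()
    print(f"cluster {k}: {typical}")
```

K-means uses Euclidean distance, which is only a rough fit for binary data; methods built around a binary similarity such as Jaccard may group the invoices differently.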
Thank you!
EDIT: I have 957 products, and many of them were never ordered, so I could reduce the matrix (I don't know the best way to do it).
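A simple way to shrink the matrix is to drop columns whose sum is zero, since a product that appears on no invoice carries no information for clustering. This sketch uses a made-up DataFrame; the `min_orders` cutoff for also removing very rare products is a hypothetical value to tune on the real data.

```python
import pandas as pd

# Hypothetical invoice matrix in which "product 3" and "product 5" were
# never ordered (their columns are all zeros).
df = pd.DataFrame(
    {
        "product 1": [1, 0, 1],
        "product 2": [0, 1, 1],
        "product 3": [0, 0, 0],
        "product 4": [1, 0, 0],
        "product 5": [0, 0, 0],
    },
    index=[123, 124, 125],
)

# Column sums count how many invoices contain each product; keep only
# the products that appear on at least one invoice.
order_counts = df.sum(axis=0)
reduced = df.loc[:, order_counts > 0]

# Optionally also drop very rare products; min_orders=2 is an arbitrary,
# hypothetical cutoff.
min_orders = 2
frequent = df.loc[:, order_counts >= min_orders]

print(reduced.columns.tolist())   # ['product 1', 'product 2', 'product 4']
print(frequent.columns.tolist())  # ['product 1', 'product 2']
```

scikit-learn's `VarianceThreshold()` with its default threshold of 0 removes the same all-zero columns (and any all-one columns as well), if you prefer to keep this step inside a scikit-learn pipeline.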