I have a dataset on which I am trying to determine association rules. The data after the merging and mapping is as follows:
Following this reference, Market Basket Analysis in Python, I see that I can use the groupby
method to group the data by order ID with this command:
basket = df_order_mapped.groupby(['order_id']).sum().unstack()
I am able to group everything by order_id, with no spaces between the individual products bought. However, I am clueless from here on as to how to perform one hot encoding as done in the reference. The reference uses the command:
basket = (df[df['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))
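As far as I can tell, this is what the reference command does step by step. Below is a minimal sketch on a toy frame: the data values are made up for illustration, the country filter is left out, and the final `basket_sets` step is my reading of how the one hot encoding is meant to follow.

```python
import pandas as pd

# Toy stand-in for the reference data; the values are made up for illustration.
df = pd.DataFrame({
    "InvoiceNo": [1, 1, 2, 2, 2],
    "Description": ["apple", "bread", "apple", "apple", "milk"],
    "Quantity": [2, 1, 1, 3, 5],
})

basket = (
    df.groupby(["InvoiceNo", "Description"])["Quantity"]
    .sum()          # total quantity per (invoice, product) pair
    .unstack()      # products become columns; missing pairs become NaN
    .reset_index()
    .fillna(0)      # NaN means "not bought", so replace it with 0
    .set_index("InvoiceNo")
)

# One hot encoding: any positive quantity becomes 1, everything else 0.
basket_sets = (basket > 0).astype(int)
print(basket_sets)
```

Each row is one invoice and each column one product, so the table has (number of invoices × number of products) cells.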
I have tried to understand each individual command one by one, but I can't seem to get my head around it. Just as a test, I tried to use groupby
with both the order_id and product_id, but I get the error:
IndexError: index 838323453 is out of bounds for axis 0 with size 838322411
The number of rows is 3 million and the total number of potential products is 25,000.
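With that many products, a dense table with one column per product would need (number of orders × 25,000) cells, which I suspect is related to the failure. One direction I am considering is a sparse matrix instead of `unstack`. This is only a sketch on toy data, assuming the frame has `order_id` and `product_id` columns and that SciPy is installed; I have not confirmed it works at full scale.

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix

# Toy stand-in for df_order_mapped; the column names are assumed.
df_order_mapped = pd.DataFrame({
    "order_id": [10, 10, 11, 12, 12, 12],
    "product_id": [7, 3, 7, 3, 5, 3],
})

# Map the raw ids to compact 0..n-1 row and column positions.
order_codes, order_ids = pd.factorize(df_order_mapped["order_id"])
product_codes, product_ids = pd.factorize(df_order_mapped["product_id"])

# Build an orders x products count matrix; repeated (order, product)
# pairs are summed, and only non-zero cells are stored.
counts = csr_matrix(
    (np.ones(len(df_order_mapped), dtype=np.int64), (order_codes, product_codes)),
    shape=(len(order_ids), len(product_ids)),
)

# One hot encoding: 1 where the product appears in the order, else 0.
onehot = (counts > 0).astype(np.int8)

# Optional: wrap it in a sparse-backed DataFrame with readable labels.
basket_sparse = pd.DataFrame.sparse.from_spmatrix(
    onehot, index=order_ids, columns=product_ids
)
print(basket_sparse)
```

The memory used here grows with the number of distinct (order, product) pairs rather than with orders × products.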
I would be grateful if someone can help me with this.
Thanks in advance.