I have a dictionary that maps product names to the unique emails of the customers who purchased each product. It looks like this:
customer_emails = {
    'Backpack': ['customer1@gmail.com', 'customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
    'Baseball Bat': ['customer1@gmail.com', 'customer3@yahoo.com', 'customer5@gmail.com'],
    'Gloves': ['customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
}
I am trying to iterate over the values of each key and determine how many of its emails also appear under each of the other keys. I converted the dictionary to a DataFrame and got the answer I wanted for a single pair of columns using something like this:
customers['Baseball Bat'].dropna().isin(customers['Gloves']).sum()
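For anyone reproducing the single-pair check, here is a minimal sketch. The conversion step is not shown above, so it assumes `customers` was built by wrapping each list in a `pd.Series`, which pads the shorter columns with NaN; the variable names `bat_emails` and `shared` are only illustrative.

```python
import pandas as pd

customer_emails = {
    'Backpack': ['customer1@gmail.com', 'customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
    'Baseball Bat': ['customer1@gmail.com', 'customer3@yahoo.com', 'customer5@gmail.com'],
    'Gloves': ['customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
}

# Assumed conversion: lists have different lengths, so each one is wrapped
# in a Series and pandas pads the shorter columns with NaN.
customers = pd.DataFrame(
    {product: pd.Series(emails) for product, emails in customer_emails.items()}
)

# Drop the NaN padding, then count how many 'Baseball Bat' emails
# also appear in the 'Gloves' column.
bat_emails = customers['Baseball Bat'].dropna()
shared = bat_emails.isin(customers['Gloves']).sum()
print(shared)  # 1 (only customer3@yahoo.com is in both lists)
```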
What I'm trying to accomplish is to create a DataFrame that looks essentially like this, so that I can easily use it for correlation charts:
              Backpack  Baseball Bat  Gloves
Backpack             4             2       3
Baseball Bat         2             3       1
Gloves               3             1       3
I'm thinking the way to do it is to iterate over the customer_emails dictionary, but I'm not sure how to pick out a single key, compare its values to those of every other key, and then store the results.
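One possible shape for that iteration, as a minimal sketch rather than a definitive answer: it assumes each product's list can be turned into a set, so that the count for any pair of products is just the size of the set intersection. The names `email_sets`, `products`, and `overlap` are hypothetical.

```python
import pandas as pd

customer_emails = {
    'Backpack': ['customer1@gmail.com', 'customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
    'Baseball Bat': ['customer1@gmail.com', 'customer3@yahoo.com', 'customer5@gmail.com'],
    'Gloves': ['customer2@gmail.com', 'customer3@yahoo.com', 'customer4@msn.com'],
}

# Convert each list to a set once so that intersections are cheap.
email_sets = {product: set(emails) for product, emails in customer_emails.items()}
products = list(email_sets)

# For every (row, column) pair of products, count the emails they share.
# The diagonal compares a product with itself, giving its own customer count.
overlap = pd.DataFrame(
    [[len(email_sets[a] & email_sets[b]) for b in products] for a in products],
    index=products,
    columns=products,
)
print(overlap)
```

With the sample data, this produces the same counts as the table above. Working from the dictionary directly also avoids the NaN padding that the DataFrame conversion introduces for lists of different lengths.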