I currently have a DataFrame like this:

enter image description here

I want to slice the DataFrame so that it keeps only the rows whose itemset contains exactly two items. For example, I want only rows such as (whole milk, soda) or (soda, Curd).

I tried iterating over the DataFrame, but that does not seem to be an appropriate way to handle it.

import pandas as pd

two_itemsets = []

# Collect [support, itemset] pairs for itemsets with exactly two items
for support, itemset in zip(sorted_itemsets["support"], sorted_itemsets["itemsets"]):
    if len(itemset) == 2:
        two_itemsets.append([support, itemset])

# Keep the first 20 pairs and build a new DataFrame from them
top_itemsets = two_itemsets[:20]
top_df = pd.DataFrame(top_itemsets, columns=["support", "itemsets"])
top_df

enter image description here

rules_ap = mlx.frequent_patterns.association_rules(top_df, metric="confidence", min_threshold=0.5)
"frozenset({'whole milk'})You are likely getting this error because the DataFrame is missing  antecedent and/or consequent  information. You can try using the  `support_only=True` option"

Also, passing this DataFrame to association_rules does not work and raises the error above. Am I missing something when I create the DataFrame?

I tried support_only=True, but it prints nothing.

Ynjxsjmh
Park Bo
  • Will `(whole milk, soda)` or `(soda, Curd)` be in that order, or can they alternate? – Marcelo Paco Apr 14 '23 at 01:52
  • @Marcelo Paco, they are just examples of how the DataFrame should be formed. Order does not matter. The second table is what I want to get from the first table. – Park Bo Apr 14 '23 at 01:55
  • Does this answer your question? [Pandas: Filtering multiple conditions](https://stackoverflow.com/questions/48978550/pandas-filtering-multiple-conditions) – Marcelo Paco Apr 14 '23 at 02:07
  • @Marcelo Paco, the values in the itemsets column are frozensets, not just simple values. – Park Bo Apr 14 '23 at 03:37

1 Answer

Use .str.len() with boolean indexing:

out = df.loc[df["itemsets"].str.len() == 2]  # optionally: .reset_index(drop=True)

Output:

print(out)

    support                   itemsets
5  0.010066     (sausage, frankfurter)
7  0.010066         (curd, rolls/buns)
8  0.010066  (napkins, tropical fruit)
9  0.010066  (hard cheese, whole milk)
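
For anyone who wants to try the filter without the original data, here is a minimal self-contained sketch. The sample DataFrame is hypothetical (its supports and itemsets are made up for illustration), and `mask_str`, `mask_map`, and `two_itemsets` are names chosen here, not taken from the question. It also shows `Series.map(len)` as an equivalent way to build the same mask.

```python
import pandas as pd

# Hypothetical sample data; the values are illustrative, not from the question.
df = pd.DataFrame(
    {
        "support": [0.25, 0.20, 0.10, 0.05],
        "itemsets": [
            frozenset({"whole milk"}),
            frozenset({"soda"}),
            frozenset({"whole milk", "soda"}),
            frozenset({"soda", "curd", "rolls/buns"}),
        ],
    }
)

# .str.len() calls len() on each element, so it also works for frozensets.
mask_str = df["itemsets"].str.len() == 2

# Equivalent mask built by mapping the built-in len over the column.
mask_map = df["itemsets"].map(len) == 2

# Keep only the two-item rows and renumber the index from 0.
two_itemsets = df.loc[mask_map].reset_index(drop=True)
print(two_itemsets)
```

Both masks select the same rows. Note that a DataFrame filtered this way no longer contains the single-item rows, so it is likely incomplete as input for association_rules, which is what the error message in the question hints at.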
Timeless