The transaction numbers associated with the frequent itemsets are not kept after running the apriori method in mlxtend; they are dropped.

How can I keep the transaction numbers?

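import pandas as pd
from mlxtend.frequent_patterns import apriori

df = pd.read_csv('association_rule_items_fullmech.csv')

# Count how often each text occurs per transaction (size() rather than summing
# the transaction ids, which would give 0 for every item of transaction 0)
basket = (df.groupby(['transaction_doc', 'text'])
            .size()
            .unstack()
            .reset_index()
            .fillna(0)
            .set_index('transaction_doc'))

# one-hot encoding
def encode_units(x):
    if x <= 0:
        return 0
    elif x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)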

The basket_sets DataFrame essentially looks like this (a small, arbitrary example with the same structure); transaction_doc is the index and each text value is a column:

transaction_doc  string1  string2  string3
0                1        0        1
1                1        1        0
2                0        0        1

I then apply the apriori function:

frequent_itemsets = apriori(basket_sets, min_support=0.001, use_colnames=True)

However, after the apriori function runs, transaction_doc, which indicates which document each text comes from, disappears from the index. I get a reset index alongside the frequent itemsets. I want to retain the transaction_doc values for each itemset after apriori is applied.
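To make the goal concrete, here is a minimal sketch of one possible way to map each frequent itemset back to the transactions that contain it. It is not an mlxtend feature: the helper `transactions_containing` and the `transaction_docs` column are hypothetical names, and `frequent_itemsets` is built by hand as a stand-in for the apriori output (which has `support` and `itemsets` columns, with frozensets of column names when `use_colnames=True`) so the example runs without mlxtend.

```python
import pandas as pd

# Toy one-hot basket mirroring the example above; the index is transaction_doc.
basket_sets = pd.DataFrame(
    {"string1": [1, 1, 0], "string2": [0, 1, 0], "string3": [1, 0, 1]},
    index=pd.Index([0, 1, 2], name="transaction_doc"),
)

# Hand-built stand-in for apriori(basket_sets, ..., use_colnames=True).
frequent_itemsets = pd.DataFrame(
    {
        "support": [2 / 3, 2 / 3, 1 / 3],
        "itemsets": [
            frozenset({"string1"}),
            frozenset({"string3"}),
            frozenset({"string1", "string3"}),
        ],
    }
)


def transactions_containing(itemset, one_hot):
    """Return the index labels of rows in which every item of `itemset` is set."""
    mask = one_hot[list(itemset)].astype(bool).all(axis=1)
    return one_hot.index[mask].tolist()


# Hypothetical extra column: the transaction_doc values supporting each itemset.
frequent_itemsets["transaction_docs"] = frequent_itemsets["itemsets"].apply(
    lambda items: transactions_containing(items, basket_sets)
)
print(frequent_itemsets)
```

This scans basket_sets once per itemset, so with a low min_support such as 0.001 and many itemsets it may be slow.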

3awny