In the arules package in R, how could I go about efficiently finding closed association rules, i.e. rules with a closed LHS itemset?

An itemset is closed iff adding any item reduces support.
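
To make the definition concrete, here is a minimal base R sketch on a made-up set of transactions. The data and the helper names `support_count` and `is_closed_itemset` are hypothetical and only illustrate the definition; they are not part of arules.

```r
# Toy transactions (hypothetical data, only for illustration)
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "b", "d"),
  c("c", "d")
)

# Absolute support: number of transactions that contain every item of the itemset
support_count <- function(itemset, transactions) {
  sum(vapply(transactions, function(tr) all(itemset %in% tr), logical(1)))
}

# Closed: every one-item extension has strictly lower support than the itemset
is_closed_itemset <- function(itemset, transactions) {
  candidates <- setdiff(unique(unlist(transactions)), itemset)
  base_support <- support_count(itemset, transactions)
  all(vapply(
    candidates,
    function(item) support_count(c(itemset, item), transactions) < base_support,
    logical(1)
  ))
}

is_closed_itemset("a", transactions)          # FALSE: {a} and {a, b} both have support 3
is_closed_itemset(c("a", "b"), transactions)  # TRUE: adding c or d drops support from 3 to 1
```

This brute-force check is only meant to pin down the definition; it would not scale to the data sizes mentioned below.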

The package provides the following mining options:

target: a character string indicating the type of association mined. One of

  • "frequent itemsets"
  • "maximally frequent itemsets"
  • "closed frequent itemsets"
  • "rules" (only available for Apriori)
  • "hyperedgesets" (only available for Apriori; see references for the definition of association hyperedgesets)

There doesn't seem to be a "closed rules" option. There are two obvious work-arounds:

  1. Mine rules and apply filter for closed itemsets

    rules <- apriori(data, parameter = list(target = "rules"))
    rules <- rules[is.closed(generatingItemsets(rules))]
    

This can be quite slow. For example, on 5k transactions with 10k items, apriori() generated 8M rules in 10s. The closure filter then took ~20 minutes and left ~3k closed rules.

  2. Mine closed frequent itemsets and apply a filter for associations (confidence, lift, etc.)

I have not implemented this yet, but it seems like a roundabout way of achieving something much simpler.

If anyone is aware of other implementations (other R packages or even something outside R) that can do this, pointers would be very helpful. For example, the SPMF library seems to support it; I'm wondering whether anyone has experience using it.

user997943
  • The SPMF library offers a fast implementation of closed association rule mining in Java, as well as many other algorithms for association rule mining. You could check the wrapper for calling the SPMF library from R (https://github.com/pommedeterresautee/spmf). – Phil Aug 18 '16 at 02:12

2 Answers

The function ruleInduction() can be used to create closed rules defined by Pei et al. (2000) as rules X -> Y where both X and Y are closed frequent itemsets. The following is taken from the manual page (slightly enhanced):

    library(arules)
    data("Adult")

    ## find all closed frequent itemsets
    closed <- apriori(Adult,
       parameter = list(target = "closed", support = 0.4))

    ## use rule induction to produce all closed association rules
    closed_rules <- ruleInduction(closed, Adult)

    ## the generating itemsets (X and Y together) are already closed;
    ## keep only the rules whose LHS X is also closed
    closed_rules <- closed_rules[is.element(lhs(closed_rules), items(closed))]

    ## inspect the resulting closed rules
    summary(closed_rules)
    inspect(head(closed_rules, by = "lift"))

Michael Hahsler

In the arules package, the support of a rule is the fraction of transactions that contain all items in the rule's combined LHS and RHS, in other words the union of the items on the LHS and RHS.
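
As a small illustration of that definition, here is a base R sketch on made-up transactions. The data and the helper name `rule_support` are hypothetical; in practice arules computes this value for you.

```r
# Toy transactions (hypothetical data, only for illustration)
transactions <- list(
  c("a", "b", "c"),
  c("a", "b"),
  c("a", "d"),
  c("b", "c")
)

# Support of the rule lhs -> rhs: share of transactions that contain
# every item in the union of lhs and rhs
rule_support <- function(lhs, rhs, transactions) {
  items <- union(lhs, rhs)
  mean(vapply(transactions, function(tr) all(items %in% tr), logical(1)))
}

rule_support("a", "b", transactions)  # 0.5: two of the four transactions contain both a and b
```

Because only the union matters, the rules a -> b and b -> a have the same support.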

This means that using the parameter target = 'closed frequent itemsets' in your apriori() call will resolve your first question and only produce closed itemsets/rules based on closed itemsets.

Similarly, to answer part two of your question, parameters exist for filtering by confidence and lift, prior to generating rules. This way, you will not have to filter out rules after the rules are generated, and you will get the same results you sound like you are looking for.

Additionally, filtering for closed itemsets, confidence, lift, etc. in your apriori() call will speed up the mining process.

Tony B