
I have the following data frame, CTVU:

MMG5_ID    EMAIL
2341       1@email.x
50         1@email.x
311        1@email.x
2341       2@email.x
2387       2@email.x
57         2@email.x
2329       2@email.x
2026       3@email.x
650        3@email.x
2369       3@email.x

I want to turn the rules created below back into a data frame with two new columns: the first holding the item with the highest confidence, the second holding that confidence.

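library(arules)
library(arulesViz)

# Read the raw data, keep columns 2 and 5, and drop duplicate rows
CTVU <- read.csv("CTVU.csv", header = TRUE)
CTVU <- unique(CTVU[, c(2, 5)])
# Convert to transactions: one basket of MMG5_ID items per EMAIL
CTVU <- as(split(CTVU[, "MMG5_ID"], CTVU[, "EMAIL"]), "transactions")
itemFrequencyPlot(CTVU, topN = 20, type = "absolute")

# First pass: mine rules with loose thresholds and look at a few
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.1))
options(digits = 2)
inspect(rules[1:5])
rules <- sort(rules, by = "confidence", decreasing = TRUE)

# Second pass: stricter confidence, rules of at most 3 items
rules <- apriori(CTVU, parameter = list(supp = 0.001, conf = 0.8, maxlen = 3))

# Third pass: only rules with item "289" on the left-hand side
rules <- apriori(data = CTVU, parameter = list(supp = 0.001, conf = 0.01, minlen = 2),
                 appearance = list(default = "rhs", lhs = "289"),
                 control = list(verbose = FALSE))
rules <- sort(rules, decreasing = TRUE, by = "confidence")
inspect(rules[1:5])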

In the end, I want a data frame that looks like this:

EMAIL      MMG5_rule   Confidence
1@email.x  50          0.5
2@email.x  2341        0.2
3@email.x  2026        0.6

I did some research but wasn't able to find a solution. Can someone help me figure out how to do this?

Davis
  • @rcs - thanks. This creates the rules as a `data.frame`. Do you have any suggestions for how I can apply the rules to the data.frame to predict, for example, what a customer is likely to buy next? – Davis Aug 27 '16 at 11:25
  • You might want to look at package recommenderlab. It has an association rules-based recommender (using package arules). – Michael Hahsler Aug 28 '16 at 02:25
  • @MichaelHahsler thanks I will take a look at recommenderlab. Might be an easier solution than what I'm trying to do at the moment. – Davis Aug 29 '16 at 17:48

1 Answer


You don't need to turn your arules output into a data.frame. If you have a new customer with a list of bought items, you can find relevant association rules with arules::subset:

newCustomer <- c("toothbrush", "chocolate", "gummibears")
arules::subset(aprioriResults, subset = lhs %in% newCustomer)

More info on that in the subset help:

subset works on the rows/itemsets/rules of x. The expression given in subset will be evaluated using x, so the items (lhs/rhs/items) and the columns in the quality data.frame can be directly referred to by their names.

Important operators to select itemsets containing items specified by their labels are %in% (select itemsets matching any given item), %ain% (select only itemsets matching all given item) and %pin% (%in% with partial matching).
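As a rough sketch of how these operators behave, the snippet below mines rules from a handful of made-up baskets and then filters them for a new customer. The item names, the thresholds, and the variable names (baskets, anyMatch, allMatch, partial, topDf) are illustrative assumptions, not the asker's data. The last step also shows that a filtered rule set can still be coerced to a data.frame if one is needed.

```r
library(arules)

# Toy baskets, one character vector per customer (made-up data)
baskets <- list(
  c("toothbrush", "toothpaste"),
  c("toothbrush", "toothpaste", "floss"),
  c("chocolate", "gummibears"),
  c("chocolate", "gummibears", "toothbrush"),
  c("chocolate", "toothpaste")
)
trans <- as(baskets, "transactions")

# Mine rules with at least two items; thresholds are arbitrary here
aprioriResults <- apriori(trans,
                          parameter = list(supp = 0.2, conf = 0.5, minlen = 2),
                          control = list(verbose = FALSE))

newCustomer <- c("toothbrush", "chocolate")

# %in%: rules whose lhs contains ANY of the customer's items
anyMatch <- subset(aprioriResults, subset = lhs %in% newCustomer)

# %ain%: rules whose lhs contains ALL of the customer's items
allMatch <- subset(aprioriResults, subset = lhs %ain% newCustomer)

# %pin%: partial matching on the item label
partial <- subset(aprioriResults, subset = lhs %pin% "tooth")

# Keep the most confident matches and coerce them to a data.frame
top <- head(sort(anyMatch, by = "confidence"), n = 3)
topDf <- as(top, "data.frame")
print(topDf)
```

Note that %in% also returns rules whose left-hand side includes items the customer has not bought, so the result may need further filtering before it is read as a recommendation.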

However, the question of what a customer is likely to buy next is, in my view, better answered with sequence mining. Luckily, the arulesSequences package does exactly that, and it's by the same authors, so little extra work is required.
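A minimal sketch of that approach, assuming the small zaki example data set that ships with arulesSequences rather than the asker's data: sequence mining needs an ordering of purchases per customer (a sequenceID and an eventID for each transaction), which the CTVU data in the question does not show. The support and confidence thresholds and the variable names are arbitrary choices for illustration.

```r
library(arulesSequences)

# 'zaki' is example data shipped with arulesSequences; each transaction
# carries a sequenceID (the customer) and an eventID (the time step)
data(zaki)

# Mine sequences that occur for at least 40% of the customers
seqs <- cspade(zaki,
               parameter = list(support = 0.4),
               control = list(verbose = FALSE))

# Frequent sequences and their support as a data.frame
seqDf <- as(seqs, "data.frame")

# Derive sequential rules, then rank them by confidence in base R
seqRules <- ruleInduction(seqs, confidence = 0.5)
ruleDf <- as(seqRules, "data.frame")
ruleDf <- ruleDf[order(-ruleDf$confidence), ]
print(head(ruleDf, 5))
```

To use this on real purchase data, each row would need a timestamp or order number that can serve as the eventID.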

sebastianmm