I am trying to build an association rules algorithm using sparklyr and have been following this blog post, which explains the process really well.

However, just after fitting the FPGrowth algorithm, the author extracts the rules from the returned "FPGrowthModel object", and I am not able to reproduce that step to extract my own rules.

The section where I am struggling is this piece of code:

rules = FPGmodel %>% invoke("associationRules")

Could someone please explain where FPGmodel comes from?

My code is shown below. I do not see an FPGmodel object that I can extract my rules from. Any help would be greatly appreciated.

# CACHE HIVE TABLE INTO SPARK
tbl_cache(sc, 'claims', force = TRUE)
med_tbl <- tbl(sc, 'claims')

# SELECT VARIABLES OF INTEREST
med_tbl <- med_tbl %>% select(proc_desc,alt_claim_id)

# REMOVE DUPLICATED ROWS
med_tbl <- dplyr::distinct(med_tbl)

med_tbl <- med_tbl %>% group_by(alt_claim_id)

# AGGREGATING CLAIMS BY CLAIM ID
med_agg <- med_tbl %>% 
  group_by(alt_claim_id) %>% 
  summarise(procedures = collect_list(proc_desc))

# CREATE UNIQUE STRING TO IDENTIFY THE MACHINE LEARNING ESTIMATOR
uid = sparklyr:::random_string("fpgrowth_")

# INVOKE THE FPGrowth JAVA CLASS 
jobj = invoke_new(sc, "org.apache.spark.ml.fpm.FPGrowth", uid) 


jobj %>% 
  invoke("setItemsCol", "procedures") %>% 
  invoke("setMinConfidence", 0.03) %>% 
  invoke("setMinSupport", 0.01) %>% 
  invoke("fit", spark_dataframe(med_agg))

1 Answer

The blog post you've linked has been obsolete for almost two years. Since commit 2b0994c, sparklyr has provided a native wrapper for o.a.s.ml.fpm.FPGrowth:

df <- copy_to(sc, tibble(items = c("a b c", "a b", "c f g", "b c"))) %>%
  mutate(items = split(items, "\\\\s+"))

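fp_growth_model <- ml_fpgrowth(df)

The association rules can then be extracted from the fitted model with ml_association_rules:

ml_association_rules(fp_growth_model)

  antecedent consequent confidence  lift
  <list>     <list>          <dbl> <dbl>
1 <list [1]> <list [1]>          1  1.33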
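
As for the literal question about the older invoke-based approach: FPGmodel is presumably just the value returned by the chain ending in invoke("fit", ...), which the code in the question never assigns to a variable. A minimal sketch of that idea follows. It assumes that `sc`, `jobj` and `med_agg` already exist exactly as in the question, and that the Spark version ships org.apache.spark.ml.fpm.FPGrowth; the table name "association_rules" is an arbitrary choice for illustration.

```r
library(sparklyr)
library(dplyr)

# Assumes `sc`, `jobj` and `med_agg` were created as in the question.
# Assign the result of the pipeline so the fitted model is kept.
FPGmodel <- jobj %>%
  invoke("setItemsCol", "procedures") %>%
  invoke("setMinConfidence", 0.03) %>%
  invoke("setMinSupport", 0.01) %>%
  invoke("fit", spark_dataframe(med_agg))

# associationRules returns a reference to a Spark DataFrame (a jobj);
# sdf_register() exposes it as a tbl_spark that dplyr verbs can use.
# "association_rules" is a hypothetical table name.
rules <- FPGmodel %>%
  invoke("associationRules") %>%
  sdf_register("association_rules")
```

The native ml_fpgrowth wrapper shown above is the simpler route; this sketch only shows where the missing object would come from if you stay with the raw invoke calls.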