I'm working with a large binary data matrix, 4547 x 5415, for association rule mining. As usual, each row is a transaction and each column is an item. Whenever I call apriori() from the arules package on it, I get an arcane error message referencing the trio library. Does anyone have experience with this type of error?

i[1:10,1:10]
     101402 101403 101404 101405 101406 101411 101412 101413 101414 101415
 [1,]      0      0      0      1      0      0      1      0      0      0
 [2,]      0      1      0      0      0      0      1      0      0      0
 [3,]      0      0      0      0      0      0      1      0      0      0
 [4,]      0      0      0      1      0      0      0      0      0      1
 [5,]      0      0      0      1      0      0      0      0      0      1
 [6,]      0      1      0      0      0      1      0      0      0      0
 [7,]      0      0      0      0      0      0      1      0      0      0
 [8,]      0      0      1      0      0      0      0      0      0      1
 [9,]      0      0      0      0      0      1      0      0      0      0
[10,]      0      0      0      0      1      0      1      0      0      0



rules <- apriori(i, parameter=list(support=0.001, confidence=0.5))

    parameter specification:
     confidence minval smax arem  aval originalSupport support minlen maxlen target
            0.5    0.1    1 none FALSE            TRUE   0.001      1     10  rules
       ext
     FALSE

    algorithmic control:
     filter tree heap memopt load sort verbose
        0.1 TRUE TRUE  FALSE TRUE    2    TRUE

    apriori - find association rules with the apriori algorithm
    version 4.21 (2004.05.09)        (c) 1996-2004   Christian Borgelt
    set item appearances ...[0 item(s)] done [0.00s].
    set transactions ...[5415 item(s), 4547 transaction(s)] done [0.47s].
    sorting and recoding items ... [4908 item(s)] done [0.18s].
    creating transaction tree ... done [0.01s].
    checking subsets of size 1 2Error in apriori(i, parameter = list(support = 0.001, confidence = 0.5)) : 
      internal error in trio library

Reproducible example:

library(arules)

# Random 0/1 matrix with the same dimensions as the real data
y <- matrix(sample(c(0, 1), 4547 * 5415, replace = TRUE), nrow = 4547, ncol = 5415)
rules <- apriori(y, parameter = list(support = 0.001, confidence = 0.5))
user1636475

1 Answer


The problem is a bug in the error handling of the arules package. You run out of memory, and when the apriori code tries to create the appropriate error message, it instead makes an invalid call to printf, which on Windows is handled by the trio library. In short, you should have gotten an out-of-memory error.
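
As a rough illustration of why memory runs out, the sketch below counts how many itemsets can be formed from the 4908 items that survive the support filter in the question's log. The variable names are made up for this example. The counts for sizes 3 and 4 are upper bounds, since apriori prunes candidates, and the last part assumes the independent 50/50 random matrix from the reproducible example, where a k-itemset has an expected support of about 0.5^k.

```r
# Number of items that survived the support filter in the question's log.
n_frequent <- 4908

# Itemsets of size k that can be formed from n_frequent items.
# Size 2 is the number of candidate pairs; sizes 3 and 4 are upper bounds.
candidates <- sapply(2:4, function(k) choose(n_frequent, k))
names(candidates) <- paste0("size_", 2:4)
print(format(candidates, big.mark = ",", scientific = FALSE))

# Assumption: independent 0/1 entries with probability 0.5 (the random matrix).
# Largest k for which the expected support 0.5^k still reaches the threshold.
min_support <- 0.001
max_k <- floor(log(min_support) / log(0.5))
print(max_k)
```

With a support threshold this low, even the pairs alone number in the millions, and on the random matrix itemsets of much larger size are still expected to be frequent.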

This problem will be resolved in arules version 1.1-4.

To avoid running out of memory, you need to increase support and/or restrict the number of items in the itemsets (maxlen in the parameter list).
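
A minimal sketch of that advice, assuming a deliberately small random matrix so that it runs quickly. The data, the item labels, and the threshold values are illustrative assumptions, not recommendations for the original data set.

```r
library(arules)

set.seed(1)

# Hypothetical small 0/1 matrix: 200 transactions, 50 items.
y <- matrix(sample(c(0, 1), 200 * 50, replace = TRUE), nrow = 200, ncol = 50)
colnames(y) <- paste0("item", seq_len(ncol(y)))

# Convert to the transactions class explicitly (TRUE means item present).
trans <- as(y == 1, "transactions")

# Higher support and a cap on itemset size keep the search space small.
rules <- apriori(
  trans,
  parameter = list(support = 0.2, confidence = 0.5, maxlen = 3),
  control = list(verbose = FALSE)
)

print(length(rules))
```

On the real data, raising support or lowering maxlen step by step until the run fits in memory is one way to find workable values.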

-Michael

Michael Hahsler