3

I have never used R, but now I need to import a sparse matrix to mine association rules in R.

My input data is a sparse matrix in triplet form, like this:

       i   j   x
1       2   3   1
2       3   5   1
3       3   1   1
4       2   5   1
.       .   .   .
.       .   .   .
200000000  .   .   .

The triplet table is 200,000,000 x 3 (one row per non-zero entry), and the matrix itself is 200,000 x 100,000. (Is this big data?)

I want to use this data to mine association rules in R.
Should I use the itemMatrix-class and tidLists-class from the arules package, or something else?

And how do I do it?

I tried this, but it does not work:

library(RODBC)
library(Matrix)
library(arules)

channel <- odbcConnect("test")
data <- sqlQuery(channel, "select i,j,x from table")  # works
(args <- data.frame(data))                    # works, prints the triplets
#    i j x
#1   2 3 1
#2   3 5 1
#3   3 1 1
#4   2 5 1
# ....
(Aa <- do.call(sparseMatrix, args))           # works, prints a sparse Matrix of class "dgCMatrix"
# 200000 X 100000 sparse Matrix of class "dgCMatrix"
#      1 2 3 4 5....
# [1,] . . . . .
# [2,] . . | . |
# [3,] | . . . |
# ....
rules <- apriori(Aa)                          # fails with the error below

Error in as(data, "transactions") : 
no method or default for coercing “dgCMatrix” to “transactions”

Can the apriori function take a sparse matrix?
Am I using the wrong package?
Do I need to go sparse matrix -> matrix -> association rules,
or can I go sparse matrix -> association rules directly?
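
On the dense-matrix route: a quick back-of-the-envelope check, using only the sizes quoted above and assuming 8 bytes per dense cell (R's double), suggests that expanding to an ordinary matrix is not practical. The variable names are just for illustration.

```r
# Sizes taken from the question.
n_rows <- 200000      # rows of the matrix
n_cols <- 100000      # columns of the matrix
nnz    <- 200000000   # rows in the triplet table (non-zero cells)

# Fraction of cells that are non-zero.
density <- nnz / (n_rows * n_cols)

# Memory for a dense double matrix, in GB (assumes 8 bytes per cell).
dense_gb <- n_rows * n_cols * 8 / 1e9

print(density)
print(dense_gb)
```

So only about 1% of the cells are filled, while a dense copy would need roughly 160 GB, which is why staying in a sparse format matters here.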

2 Answers

1

Import only i and j, then split the items by transaction ID:

library(RODBC)
library(arules)
channel <- odbcConnect("DB", uid="XXXX", pwd="XXXX")
# one row per distinct (transaction, item) pair
data <- sqlQuery(channel, "select distinct i as TID,j as item from table")
# list of item vectors, one per TID, coerced to transactions
trans <- as(split(data[,"item"], data[,"TID"]), "transactions")
rules <- apriori(trans)
  • 1
    The "split" operation is unbelievably slow, and seems inefficient. Since a "transactions" object is internally a sparse matrix, it seems that there should be a straightforward way to convert a "Matrix" object to a "transactions" object. – Zach Nov 20 '13 at 19:24
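
Following up on the comment above, one possible way to skip split() is to build the pattern matrix directly from the (TID, item) pairs. This is only a sketch under assumptions: TID and item are taken to be positive integer ids (as i and j are in the question), the toy data frame stands in for the query result, and the "t"/"i" labels are made up for illustration. It relies on sparseMatrix() returning an ngCMatrix when x is omitted, and on arules reading rows as items and columns as transactions.

```r
library(Matrix)
library(arules)

# Toy stand-in for the query result: one row per (transaction, item) pair.
data <- data.frame(TID  = c(2, 3, 3, 2),
                   item = c(3, 5, 1, 5))

n_items <- max(data$item)
n_trans <- max(data$TID)

# No x argument, so the result is a pattern matrix (class "ngCMatrix").
# Rows are items and columns are transactions.
m <- sparseMatrix(i = data$item, j = data$TID,
                  dims = c(n_items, n_trans),
                  dimnames = list(paste0("i", seq_len(n_items)),
                                  paste0("t", seq_len(n_trans))))

trans <- as(m, "transactions")
print(dim(trans))  # transactions x items
```

TID 1 never occurs in the toy data, so it shows up as an empty transaction. Whether that is acceptable depends on how the ids are assigned in the real table.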
0

Internally, arules used to use dgCMatrix, but switched to the more efficient ngCMatrix (a binary pattern matrix). If we convert to that, it works.

library(tidyverse)
library(arules)

data <- data.frame(ID = sample(LETTERS[1:3], 20, TRUE), item = sample(letters[1:5], 20, TRUE), stringsAsFactors = FALSE)

data %>%
  unique %>%
  xtabs(~ item + ID, data = ., sparse = T) ->
  m

head(m)
#> 3 x 5 sparse Matrix of class "dgCMatrix"
#>   a b c d e
#> A . 1 1 1 1
#> B 1 . 1 1 1
#> C . 1 1 1 .

apriori(m)
#> Error in as(data, "transactions"): no method or default for coercing "dgCMatrix" to "transactions"

That's the error we expect. But if we convert to the other sparse matrix class (which is quite fast):

m1 <- as(m, "ngCMatrix")

apriori(m1)
#> Apriori
#> 
#> Parameter specification:
#>  confidence minval smax arem  aval originalSupport maxtime support minlen
#>         0.8    0.1    1 none FALSE            TRUE       5     0.1      1
#>  maxlen target   ext
#>      10  rules FALSE
#> 
#> Algorithmic control:
#>  filter tree heap memopt load sort verbose
#>     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
#> 
#> Absolute minimum support count: 0 
#> 
#> set item appearances ...[0 item(s)] done [0.00s].
#> set transactions ...[3 item(s), 5 transaction(s)] done [0.00s].
#> sorting and recoding items ... [3 item(s)] done [0.00s].
#> creating transaction tree ... done [0.00s].
#> checking subsets of size 1 2 3 done [0.00s].
#> writing ... [4 rule(s)] done [0.00s].
#> creating S4 object  ... done [0.00s].
#> set of 4 rules

It all works.
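
One thing to watch is orientation. The log above reports "3 item(s), 5 transaction(s)" for a 3 x 5 matrix, so rows are read as items and columns as transactions. A matrix laid out like Aa in the question (assuming its rows are the transactions) would therefore need a transpose first. Below is a minimal sketch with a toy matrix. The labels are invented, and as(x, "nMatrix") is used on the assumption that it gives the same ngCMatrix as the direct coercion, since newer Matrix releases may warn about as(x, "ngCMatrix").

```r
library(Matrix)
library(arules)

# Toy matrix shaped like Aa: rows = transactions, columns = items.
Aa <- sparseMatrix(i = c(2, 3, 3, 2), j = c(3, 5, 1, 5), x = 1,
                   dims = c(3, 5),
                   dimnames = list(paste0("t", 1:3), paste0("item", 1:5)))

# Transpose so items become rows, then drop the values to get a pattern matrix.
m1 <- as(t(Aa), "nMatrix")

trans <- as(m1, "transactions")
print(dim(trans))  # transactions x items
```

From here, apriori(trans) can be called as in the answer, with support and confidence set to suit the data.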

Michael Griffiths