I am experimenting with the apriori algorithm in the arules package.

This is what I've done: I loaded a view from SQL Server into R. Since that data is not in transactions form, which apriori needs, I had to convert it:

data <- sapply(orders, as.factor)

Then I entered the apriori function:

apriori(data, parameter = list (support=0.005, confidence=0.5))

I get this error:

Error in t(as(from, "ngCMatrix")) : error in evaluating the argument 'x' in selecting a method for function 't': Error in asMethod(object) : cannot coerce 'NA's to "nsparseMatrix"

I checked with a query, and none of the attributes are NULL/NA.

I don't understand what the error means. Does someone know what the problem is and how to solve this?

Kim
  • Perhaps some values are coerced during `apriori()` into integer or double, creating `NA`s that you would not find beforehand? Did you try `sum(is.na(data))`, and what was the output? – Gullydwarf Dec 19 '14 at 12:19
  • @Gullydwarf Thanks. It also gives 0 as output unfortunately. – Kim Dec 19 '14 at 12:43
  • That is good :) and `sum(!is.finite(data))`? – Gullydwarf Dec 19 '14 at 14:51
  • @Gullydwarf Oh haha, I'm a noob. Alright, the output is 88393256. By the way, data contains over 12 million records. What does this mean? That 88393256 of 12 million are not finite? – Kim Dec 19 '14 at 15:19
  • Sorry, I should have been clearer about that command. When I realized I was withholding information, I could no longer edit my comment. `is.finite()` returns `TRUE` for finite numbers and `FALSE` for characters, `NA`, `NaN` and `Inf`. You have probably read your data in and it is still in character format. If your data is supposed to be numeric, try again with `sum(!is.finite(as.numeric(data)))`. This will tell you how many fields contain non-numbers. – Gullydwarf Dec 19 '14 at 15:31
  • @Gullydwarf Thanks for your explanation! The data consists of a lot of nvarchar and some int attributes. The dataset is like a "sales" table with product names etc. (I should have stated that in the question). And `is.character` does give `TRUE`. Is it not possible for the apriori function to process a dataset in character format? – Kim Dec 19 '14 at 15:43
  • @Gullydwarf I figured out that the columns containing only numbers are causing the error (there were no NAs at all), even though `typeof()` on those columns says `character` (because I used `as.factor` to change them). When I omit those columns, I don't have to use `as.factor` first and the apriori function works fine. On other datasets with numbers it already works, but I didn't have to use `as.factor` on those datasets, so I guess that has something to do with it. Do you have any idea how I can include the numeric columns anyway? – Kim Dec 23 '14 at 10:08
  • So why are you using `as.factor()` in the first place? :) – Gullydwarf Dec 24 '14 at 10:46
  • @Gullydwarf Because if I don't coerce them to factors, I get this error when I use the apriori function: `Error in asMethod(object) : column(s) 1, 5 not logical or a factor. Use as.factor or categorize first` – Kim Dec 24 '14 at 11:21
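The observations in the comments (no `NA`s, yet a very large count of non-finite values, and `typeof()` reporting `character`) can be reproduced on a tiny made-up table. A minimal base-R sketch; the `orders` contents below are hypothetical, not the asker's data:

```r
# Hypothetical stand-in for the `orders` view: a text column and an integer column
orders <- data.frame(
  product  = c("bread", "milk", "bread"),
  quantity = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# sapply() simplifies the list of factors into a matrix; the factor
# levels are flattened to plain character strings along the way
data <- sapply(orders, as.factor)

is.matrix(data)          # TRUE
typeof(data)             # "character"
sum(is.na(data))         # 0: there really are no NAs
sum(!is.finite(data))    # 6: every cell, since is.finite() is FALSE for strings
```

So the object passed to `apriori()` is a character matrix, not a data.frame of factors.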

2 Answers

I ran into the same kind of error recently. What I learned is that your data has to be coerced to transactions before mining itemsets or rules. This code should help:

transaction_data <- as(data, "transactions")
rules <- apriori(transaction_data, parameter = list(minlen = 2, supp = 0.2, conf = 0.5))
Eva
  • I got the same error as above doing what you suggested. But what Segmented suggested, putting the values in a data.frame, worked. So what worked for me is: `data = data.frame(as(data, "transactions")); rules = apriori(data)` – Codious-JR Nov 09 '15 at 20:02
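Both answers come down to the columns being factors before the coercion to transactions. A minimal end-to-end sketch with arules, assuming a small hypothetical `orders` table in place of the SQL Server view. arules turns each row of a factor data.frame into one transaction, with each column=value pair as an item:

```r
library(arules)

# Hypothetical stand-in for the SQL Server view (not the asker's data);
# quantity is an integer column, like the number columns in the question
orders <- data.frame(
  product  = c("bread", "milk", "bread"),
  quantity = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# lapply() returns a list; assigning it into orders[] keeps the data.frame
# and turns every column, numeric ones included, into a factor
orders[] <- lapply(orders, as.factor)

# Each row becomes one transaction; each column=value pair becomes an item,
# e.g. "product=bread" or "quantity=1"
trans <- as(orders, "transactions")

# Same thresholds as in the question; verbose = FALSE silences progress output
rules <- apriori(trans,
                 parameter = list(support = 0.005, confidence = 0.5),
                 control   = list(verbose = FALSE))
inspect(rules)
```

Converting column by column with `lapply()` avoids the matrix detour entirely, so numeric columns become factor levels such as `quantity=1` instead of being lost to a character matrix.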

The main problem arises when you use R's apply-family functions: most of them do not return a data.frame. In your case, you used sapply, which simplifies its result to a matrix (here a character matrix) rather than a data.frame. Make sure you take care of the appropriate conversions:

data = data.frame(sapply(orders, as.factor), stringsAsFactors = TRUE)

And then follow association rule building:

apriori(data, parameter = list (support=0.005, confidence=0.5))

This works as expected (tested).
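This answer predates R 4.0.0, which changed the default of `stringsAsFactors` in `data.frame()` from `TRUE` to `FALSE`. On current R, the character matrix from `sapply()` therefore comes back as character columns unless factors are requested explicitly. A small base-R sketch (the `orders` data is made up) comparing the implicit form `data.frame(sapply(orders, as.factor))` with two explicit ones:

```r
# Hypothetical stand-in for the `orders` view
orders <- data.frame(
  product  = c("bread", "milk", "bread"),
  quantity = c(1L, 2L, 1L),
  stringsAsFactors = FALSE
)

# Implicit form: sapply() hands data.frame() a character matrix, so the
# column types depend on the stringsAsFactors default of the running R version
implicit <- data.frame(sapply(orders, as.factor))

# Two forms that yield factor columns on any R version
explicit <- data.frame(sapply(orders, as.factor), stringsAsFactors = TRUE)
via_list <- data.frame(lapply(orders, as.factor))

sapply(implicit, class)            # "character" columns on R >= 4.0.0
all(sapply(explicit, is.factor))   # TRUE
all(sapply(via_list, is.factor))   # TRUE
```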

Segmented