In the arules package you can read in transaction data such as the example groceries dataset:

groceries <- read.transactions("groceries.csv", sep = ",", rm.duplicates = TRUE)

If you then inspect the transactions you get:

inspect(groceries[1:3])

items                
1 {,                   
   citrus fruit,       
   margarine,          
   ready soups,        
   semi-finished bread}
2 {,                   
   coffee,             
   tropical fruit,     
   yogurt}             
3 {,                   
   whole milk} 

As you can see, it treats the first item in each transaction as a blank. It should look like this:

 items                
1 {citrus fruit,       
   margarine,          
   ready soups,        
   semi-finished bread}
2 {coffee,             
   tropical fruit,     
   yogurt}             
3 {whole milk} 

I'm not sure whether something has changed in the latest version of R, since examples that use the exact code above don't suffer from this problem.

This is what the raw csv file looks like in an editor (first 2 rows):

citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,,,,,,,,,,,,,,,,,,,,,,
tropical fruit,yogurt,coffee,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

The trailing commas are there for a reason: they show that this row (transaction) has fewer items than the transaction with the most items. But it is these commas that are causing the problem.

How can I read in this csv file without the arules package thinking those blanks are items?

Cybernetic
  • Not sure if there is a native way to do this in R, but I have manually munged the data in Notepad++ to find & replace `,,` with `,`. Once this has been done enough, each line will end with a comma. Your final find should be `,\r\n`, and replace with `\r\n`. Should work for the dataset you have above, but not an answer as it doesn't scale to much larger data sets well. – Chris Apr 20 '15 at 23:21
  • The code in the upcoming release (1.1-10) will resolve the issues in `read.transactions()` with leading and trailing whitespace and trailing commas in the csv file. – Michael Hahsler Aug 21 '15 at 17:18
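
For arules versions that predate the fix mentioned in the comment above, the manual find-and-replace could probably be done in base R instead of an editor. The following is only a sketch under a few assumptions, not the package's own fix: the two sample rows are copied from the question with the comma runs shortened, `clean_file` is a hypothetical temporary path, and the arules step runs only if the package is installed.

```r
# Two sample rows from the question (trailing comma runs shortened).
# For the real file, use instead: raw_lines <- readLines("groceries.csv")
raw_lines <- c(
  "citrus fruit,semi-finished bread,margarine,ready soups,,,,",
  "tropical fruit,yogurt,coffee,,,,,"
)

# Remove the run of commas at the end of every line.
clean_lines <- sub(",+$", "", raw_lines)

# Sanity check: split each row into items and drop any empty strings,
# so we can confirm that no blank items are left.
items <- lapply(
  strsplit(clean_lines, ",", fixed = TRUE),
  function(x) x[nzchar(x)]
)

# Write the cleaned rows to a temporary file (hypothetical path).
clean_file <- tempfile(fileext = ".csv")
writeLines(clean_lines, clean_file)

# Read the cleaned file exactly as in the question, if arules is available.
if (requireNamespace("arules", quietly = TRUE)) {
  groceries <- arules::read.transactions(clean_file, sep = ",",
                                         rm.duplicates = TRUE)
  arules::inspect(groceries)
}
```

Because the cleanup works line by line with a single regular expression, it should not need the repeated `,,` to `,` passes described for Notepad++, and it leaves commas between real items untouched.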
