In the arules package you can read in transaction data such as the example groceries dataset:
groceries <- read.transactions("groceries.csv", sep = ",", rm.duplicates = TRUE)
If you then inspect the transactions you get:
inspect(groceries[1:3])
items
1 {,
citrus fruit,
margarine,
ready soups,
semi-finished bread}
2 {,
coffee,
tropical fruit,
yogurt}
3 {,
whole milk}
As you can see, it treats the first item in each transaction as a blank. It should look like this:
items
1 {citrus fruit,
margarine,
ready soups,
semi-finished bread}
2 {coffee,
tropical fruit,
yogurt}
3 {whole milk}
I'm not sure whether something has changed in the latest version of R, since examples that use the exact code above don't suffer from this problem.
This is what the raw csv file looks like in an editor (first 2 rows):
citrus fruit,semi-finished bread,margarine,ready soups,,,,,,,,,,,,,,,,,,,,,,,,,,,,
tropical fruit,yogurt,coffee,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
The trailing commas are there for a reason: they show that this row (transaction) has fewer items than the transaction with the most items. But it is these commas that are causing the problem.
How can I read in this CSV file without the arules package treating those blanks as items?
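One possible workaround, shown only as a minimal sketch: strip the trailing commas with base R before handing the file to arules. The helper name strip_trailing_commas and the shortened sample rows are made up for illustration, and the sketch assumes blanks only ever appear at the end of a row, as in the two rows shown above.

```r
# Hypothetical helper: remove any run of commas at the end of each line.
# It does not touch empty fields in the middle of a row.
strip_trailing_commas <- function(lines) {
  sub(",+$", "", lines)
}

# Two sample rows in the same shape as the raw file (padding shortened here).
raw <- c(
  "citrus fruit,semi-finished bread,margarine,ready soups,,,,",
  "tropical fruit,yogurt,coffee,,,,,"
)
cleaned <- strip_trailing_commas(raw)

# Write the cleaned rows to a temporary file so the original stays untouched.
tmp <- tempfile(fileext = ".csv")
writeLines(cleaned, tmp)

# Read the cleaned file with arules, if the package is installed.
if (requireNamespace("arules", quietly = TRUE)) {
  groceries <- arules::read.transactions(tmp, sep = ",", rm.duplicates = TRUE)
  arules::inspect(groceries[1:2])
}
```

For the real file, raw would come from readLines("groceries.csv") instead of the inline sample. This only sidesteps the problem by pre-processing; it does not explain why the same call behaves differently in other people's examples.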