
I have a dataset of the format:

txn_id  prod_name
  223      milk 
  223      eggs  
  235      eggs
  235      bread
  235      butter

I am trying to use this data to find associations between products (market basket analysis). To run the Apriori algorithm in R, the data needs to be in the following format, with one row per transaction:

| prod_name | prod_name | prod_name |
  milk         eggs
  eggs         bread      butter

How can I achieve this?

zx8754
user2280975
  • You are looking for the "dcast" function from the "reshape2" package, which you will need to install first. Look it up, it should help. You can group the columns according to your needs by supplying your own formula. – Rockbar Jun 20 '16 at 07:03
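A minimal sketch of what the comment suggests, assuming the data frame is called `txns` (a name made up here) and that a helper column `pos`, also made up, numbers the items within each transaction. The reshape2 step only runs if that package is installed.

```r
# The question's data, retyped by hand for illustration
txns <- data.frame(
  txn_id    = c(223, 223, 235, 235, 235),
  prod_name = c("milk", "eggs", "eggs", "bread", "butter"),
  stringsAsFactors = FALSE
)

# Number the items within each transaction: 1, 2, ... per txn_id
txns$pos <- ave(seq_along(txns$txn_id), txns$txn_id, FUN = seq_along)

# Cast to wide format: one row per txn_id, one column per item position.
# Positions a transaction does not reach are left as NA.
if (requireNamespace("reshape2", quietly = TRUE)) {
  wide <- reshape2::dcast(txns, txn_id ~ pos, value.var = "prod_name")
  print(wide)
}
```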

2 Answers


The arules package has this functionality. If you look in the documentation, under transactions-class, you find:

## example 4: creating transactions from a data.frame with transaction IDs and items
a_df3 <- data.frame(
    TID = c(1,1,2,2,2,3),
    item=c("a","b","a","b","c", "b")
)
trans4 <- as(split(a_df3[,"item"], a_df3[,"TID"]), "transactions")

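split groups the items by TID and returns a list with one element per transaction, each holding all the items of that transaction. as(..., "transactions") then coerces that list into the transactions object that arules works with.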
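As a rough sketch, the same split-and-coerce step applied to the data from the question could look like this. The data frame `txns` is retyped by hand, the support and confidence thresholds are arbitrary values chosen only for illustration, and the arules part runs only if that package is installed.

```r
# The question's data, retyped by hand for illustration
txns <- data.frame(
  txn_id    = c(223, 223, 235, 235, 235),
  prod_name = c("milk", "eggs", "eggs", "bread", "butter"),
  stringsAsFactors = FALSE
)

# split() returns a named list: one character vector of items per txn_id
baskets <- split(txns$prod_name, txns$txn_id)
str(baskets)

if (requireNamespace("arules", quietly = TRUE)) {
  suppressPackageStartupMessages(library(arules))

  # Coerce the list of baskets to a transactions object
  trans <- as(baskets, "transactions")
  inspect(trans)

  # Mine rules; the thresholds are illustrative assumptions, not recommendations
  rules <- apriori(trans, parameter = list(support = 0.5, confidence = 0.5))
  inspect(rules)
}
```

Note that arules takes the list of baskets directly, so the wide one-row-per-transaction table from the question is not needed as an intermediate step.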

sebastianmm

You could use dplyr and tidyr.

library(dplyr)
library(tidyr)

adf <- read.table(header = TRUE, stringsAsFactors = FALSE, text = '
txn_id  prod_name
223      milk 
223      eggs  
235      eggs
235      bread
235      butter') %>% as_tibble()  # tbl_df() is deprecated in current dplyr

### For each transaction, a 'prod_name_key' is created for each 'prod_name'
adf %>%
  group_by(txn_id) %>%
  mutate(prod_name_key = paste0('prod_name_', 1:n())) %>%  # Creates key
  spread(prod_name_key, prod_name, fill = '')              # Reshapes data

## Source: local data frame [2 x 4]
## 
##   txn_id prod_name_1 prod_name_2 prod_name_3
##    (int)       (chr)       (chr)       (chr)
## 1    223        milk        eggs            
## 2    235        eggs       bread      butter

There may be a more concise way to do this, but this appears to do what you are asking.
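For comparison, here is one possible base-R variant of the same reshaping. It is a different technique from the dplyr/tidyr pipeline above, not a rewrite of it: it pads each transaction's items with empty strings and binds them into a character matrix. The names `txns`, `baskets`, `width` and `wide` are made up for this sketch.

```r
# The question's data, retyped by hand for illustration
txns <- data.frame(
  txn_id    = c(223, 223, 235, 235, 235),
  prod_name = c("milk", "eggs", "eggs", "bread", "butter"),
  stringsAsFactors = FALSE
)

# One character vector of items per transaction
baskets <- split(txns$prod_name, txns$txn_id)

# Width of the widest transaction
width <- max(lengths(baskets))

# Pad every basket with "" up to that width, then stack the rows.
# Row names are the txn_id values.
wide <- do.call(rbind, lapply(baskets, function(items) {
  c(items, rep("", width - length(items)))
}))
colnames(wide) <- paste0("prod_name_", seq_len(width))

print(wide)
```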

steveb