1

I have read many of the threads and do not think my question has been asked before. I have a data.frame in R related to advertisements shown to customers as such:.. I have many customers, 8 different products.. so this is just a sample

mydf <- data.frame(Cust = c(1, 1), age = c(24, 24), 
    state = c("NJ", "NJ"), Product = c(1, 1), cost = c(400, 410), 
    Time = c(35, 25), Purchased = c("N", "Y"))
mydf
#   Cust age state Product cost Time Purchased
# 1    1  24    NJ       1  400   35         N
# 2    1  24    NJ       1  410   23         Y

And I want to transform it to look as such ...

Cust | age | state | Product | cost.1 | time.1 | purch.1 | cost.2 | time.2 | purch.2
   1 |  24 |    NJ |       1 |    400 |     35 |       N |    410 |     23 |       Y

How can I do this? There are a few static variables for each customer such as age, state and a few others... and then there are the details associated with each offer that was presented to a given customer, the product # in the offer, the cost, the time, and if they purchased it... I want to get all of this onto 1 line for each customer to perform analysis.

It is worth noting that the number of products maxes out at 7, but for some customers it ranges from 1 to 7.

I have no sample code to really show. I have tried using the aggregate function, but I do not want to aggregate, or do any SUMs. I just want to do some joins. Research suggests the cbind, and tapply functions may be useful.

Thank you for your help. I am very new to R.

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
blast00
  • 559
  • 2
  • 8
  • 18
  • perhaps `dcast` ex: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/ – hrbrmstr Mar 29 '14 at 23:36
  • Looking now. Thank you for your quick response, and helping me format the question so the example dataframe is easy to read. – blast00 Mar 29 '14 at 23:41
  • Unfortunately, it appears as though the dcast function will not work here. It appears to be purposed for taking values and turning them into column headers. E.g., it would make a new column for each cost. – blast00 Mar 29 '14 at 23:47
  • Most people try to avoid the thing that you are trying to do; having a long data table makes things much easier for a lot of things in R. What functions or analysis do you foresee, that will make it easier to have everything on one line? – Andy Clifton Mar 29 '14 at 23:53
  • I'm trying to run a regression, to figure out what factors drive someone to purchase a given product. Is it the number of offers, is it certain conditions associated with the last offer they saw, does it have to do with the time inbetween the offer they bought & the previous offer? I want to compute a variety of statistics about the different offers they saw & I also want to identify relationships between the offers for each customer. In my mind, it is easiest to do this analysis by first condensing into one row. – blast00 Mar 29 '14 at 23:55

1 Answers1

0

You are essentially asking to do a "long" to "wide" reshape of your data.

It looks to me like you're using "Cust", "age", "state", and "Product" as your ID variables. You don't have a an actual "time" variable though ("time" as in the sequential count of records by the IDs mentioned above). However, such a variable is easy to create:

mydf$timevar <- with(mydf, 
                     ave(rep(1, nrow(mydf)), 
                         Cust, age, state, Product, FUN = seq_along))
mydf
#   Cust age state Product cost Time Purchased timevar
# 1    1  24    NJ       1  400   35         N       1
# 2    1  24    NJ       1  410   23         Y       2

From there, this is pretty straightforward with the reshape function in base R.

reshape(mydf, direction = "wide", 
        idvar=c("Cust", "age", "state", "Product"),
        timevar = "timevar")
#   Cust age state Product cost.1 Time.1 Purchased.1 cost.2 Time.2 Purchased.2
# 1    1  24    NJ       1    400     35           N    410     23           Y
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485