I'm having trouble manipulating some data in R. I have a data frame containing information about customer transactions. I extract the minimum date for each id as follows:

hold <- lapply(with(train_train, split(date, id)), min)  # minimum date per id

This gives me the following list:

head(hold)

#$`15994113`
#[1] "2012-03-02"
#
#$`16203579`
#[1] "2012-03-02"
#
#$`17472223`
#[1] "2012-03-22"

What I then want to do is take the date returned for each id and merge it back into a data frame containing other relevant variables for each id. I attempted to do it as follows:

hold <- as.data.frame(unlist(hold))
hold <- as.data.frame(cbind(row.names(hold),hold[,1]))
names(hold) <- c('id', 'mindate')
transactions.temp <- merge(x = transactions.pro, y = hold, by = 'id')

However, the cbind destroys the date format, and I can't work out how to get a data structure with columns 'id' and 'mindate' that I can merge onto my main dataset, which looks like this:

> head(transactions.pro)
           id totaltransactions totalspend        meanspend
1:  100007447              1096    6644.88 6.06284671532847
2:  100017875               348     992.29 2.85140804597701
3:  100051423               646    2771.43 4.29013931888545
4: 1000714152              2370   10509.08 4.43421097046414
5: 1002116097              1233    4158.51 3.37267639902676
6: 1004404618               754    2978.15 3.94980106100796

Any advice you provide will be hugely appreciated. Thanks!

aspaceo

2 Answers

You could try a different approach with dplyr: instead of first converting to a list, keep the minimum dates in a data.frame and then left_join it (the equivalent of merge with all.x = TRUE) to the transactions.pro data.frame. Since there is no reproducible example, I haven't tested it.

library(dplyr)

train_train %>%
  mutate(date = as.Date(as.character(date))) %>%
  group_by(id) %>%
  summarize(mindate = min(date)) %>%
  left_join(transactions.pro, ., by = "id")
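
For readers without dplyr, the same idea (summarise to one row per id, then left join) can be sketched in base R. This is only an illustration: the toy values below are made up, the column names are taken from the question, and ids are assumed to be stored as character in both tables.

```r
# Toy stand-ins for the question's data; all values are invented
train_train <- data.frame(
  id   = c("a", "a", "b", "b", "c"),
  date = c("2012-03-05", "2012-03-02", "2012-03-22", "2012-04-01", "2012-03-09"),
  stringsAsFactors = FALSE
)
transactions.pro <- data.frame(
  id = c("a", "b", "c", "d"),
  totaltransactions = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Work with real Date values rather than strings
train_train$date <- as.Date(train_train$date)

# One row per id holding the earliest date
mindates <- aggregate(train_train["date"], by = train_train["id"], FUN = min)
names(mindates)[names(mindates) == "date"] <- "mindate"

# all.x = TRUE keeps every row of transactions.pro, as left_join() does;
# ids with no transactions in train_train get NA for mindate
result <- merge(transactions.pro, mindates, by = "id", all.x = TRUE)
```

If the id columns differ in type between the two tables (for example numeric in one and character in the other), converting them to a common type before merging is probably the safer route.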
talat

Your cbind implicitly converts the dates to character: called on plain vectors it builds a matrix, which can hold only one type, and row.names(hold) is character. Use the data.frame method for cbind to avoid this. Essentially, replace:

as.data.frame(cbind(row.names(hold),hold[,1]))

with

cbind.data.frame(row.names(hold), hold[,1])
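
The difference can be seen on a small made-up example. The sketch below assumes the date column really is of class Date; the ids are borrowed from the question's output, and the names as_matrix and kept are purely illustrative.

```r
# Hypothetical stand-in for the question's 'hold': ids as row names, one Date column
hold <- data.frame(
  mindate   = as.Date(c("2012-03-02", "2012-03-22")),
  row.names = c("15994113", "17472223")
)

# cbind() on plain vectors returns a matrix; a matrix holds a single type,
# so the Date class is dropped and the values are coerced to character
as_matrix <- cbind(row.names(hold), hold[, 1])
is.character(as_matrix)

# The data.frame method keeps each column as its own vector, so the
# Date class survives
kept <- cbind.data.frame(id = row.names(hold), mindate = hold[, 1])
class(kept$mindate)
```

Calling data.frame(id = row.names(hold), mindate = hold[, 1]) directly should behave the same way, since the data.frame method of cbind hands its arguments to data.frame().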
asb