I checked similar entries in SO, none answers my question exactly.
My problem is this: Let's say, User1 has 6 purchases, User2 has 2. Purchase data is something like this:
set.seed(1234)
purchase <- data.frame(id = c(rep("User1", 6), rep("User2", 2)),
purchaseid = sample(seq(1, 100, 1), 8),
purchaseDate = seq(Sys.Date(), Sys.Date() + 7, 1),
price = sample(seq(30, 200, 10), 8))
#
users <- data.frame(id = c("User1","User2"),
uname = c("name1", "name2"),
uaddress = c("add1", "add2"))
> purchase
id purchaseid purchaseDate price
1 User1 12 2019-09-27 140
2 User1 62 2019-09-28 110
3 User1 60 2019-09-29 200
4 User1 61 2019-09-30 190
5 User1 83 2019-10-01 60
6 User1 97 2019-10-02 150
7 User2 1 2019-10-03 160
8 User2 22 2019-10-04 120
End data required includes 1 row for each user, that keeps the user name, address, etc. Then comes next columns for 20 purchases. The purchase data needs to be placed one after another in the same row. This is the rule: only one row for each user. If the user does not have 20 purchases, the remaining fields should be empty.
End data should therefore look like this:
id uname uaddr p1id p1date p1price p2id p2date p2price p3id p3date p3price p4id
1 User1 name1 add1 12 2019-09-27 140 62 2019-09-28 110 60 2019-09-29 200 61
2 User2 name2 add2 1 2019-10-03 160 22 2019-10-04 120 NA <NA> NA NA
p4date p4price
1 2019-09-30 190
2 <NA> NA
enddata <- data.frame(id = c("User1", "User2"),
uname = c("name1", "name2"),
uaddr = c("add1", "add2"),
p1id = c(12,1),
p1date = c("2019-09-27","2019-10-03"),
p1price = c(140, 160),
p2id = c(62, 22),
p2date = c("2019-09-28", "2019-10-04"),
p2price = c(110, 120),
p3id = c(60, NA),
p3date = c("2019-09-29", NA),
p3price = c(200, NA),
p4id = c(61, NA),
p4date = c("2019-09-30", NA),
p4price = c(190, NA))
I used reshape to get the data for each user into the wide format. The idea was doing it in a loop for each user id. Then I used rbindlist
with the fill option TRUE, but this time I am having problem with column names. After reshape, each gets different column names. Without fixed number of columns, you cannot set names either.
Any elegant solution to this?