I have a dataset (df) similar to this one:
df <- data.frame("ID"=c(1, 1, 1, 2, 2),
"Method of payment"=c("cash","liabilities",
"shares", "cash", NA),
"USD"=c(110, 130, 200, 100, NA),
"var3"=c(1500, NA, NA, "ab", "bc"))
df
# ID Method.of.payment USD var3
# 1 1 cash 110 1500
# 2 1 liabilities 130 <NA>
# 3 1 shares 200 <NA>
# 4 2 cash 100 ab
# 5 2 <NA> NA bc
And I would like to have:
data.frame("ID"=c(1, 2),
"Method of payment"=c("cash, liabilities, shares", "cash"),
"Cash"=c(110, 100),
"Liabilities"=c(130, 0),
"Shares"=c(200, 0),
"var3"=c(1500, "ab,bc"))
# ID Method.of.payment Cash Liabilities Shares var3
# 1 1 cash, liabilities, shares 110 130 200 1500
# 2 2 cash 100 0 0 ab,bc
Therefore, I would like to
- collapse the variables "method of payment" and "var3" in order to have only one row per ID, having all the levels/values linked to an ID pasted in one row;
- generate new variables from existing levels of the factor variable method of payment ("cash", "liabilities", "shares") whose values should be the corresponding values of the variable USD or 0 if there is no corresponding value.
I am working with a very large dataset, so I am looking for something that works also on large data. I hope it is clear.