I have a data frame (date, stock, price, size) for which I want to aggregate the number of instances, sum the size and get min and max of price. That part works, however, the final output contains multiple rows of the same ticker. For example, DGAZ appears twice in the cand data frame, and it appears twice in the results_all section. In results_all, I would like one row per ticker.
DGAZ in cand:
18 2017-01-18 DGAZ 3.74 836000
19 2017-01-18 DGAZ 3.76 500000
DGAZ in result_all:
date symbol print_count tot_shares min_price max_price
1 2017-01-18 DGAZ 2 1336000 3.74 3.76
2 2017-01-18 DGAZ 2 1336000 3.74 3.76
Do you know why that is?
Please note, I will ultimately do this for multiple dates, that is why there are 2 do.call statements.
Reproducible code:
cand <- structure(list(date = structure(c(17184, 17184, 17184, 17184,
17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184, 17184,
17184, 17184, 17184, 17184, 17184, 17184), class = "Date"), stock = c("AAPL",
"ABB", "ABEV", "AMTD", "AXTA", "BNDX", "BWX", "BNDX", "BNPQY",
"BNPQY", "BPESF", "BTG", "BWX", "CLCD", "CMCSA", "CX", "DANDY",
"DGAZ", "DGAZ"), price = c(120, 22.41, 5.31, 46, 28.1, 54.06,
26.23, 54.08, 31.79, 31.79, 1.04, 2.86, 26.28, 46.3, 72.7, 8.28,
12.72, 3.74, 3.76), size = c(2350000L, 500000L, 500000L, 631400L,
553525L, 748655L, 1347888L, 711454L, 881744L, 881744L, 745808L,
700000L, 1296627L, 612347L, 840000L, 500000L, 650000L, 836000L,
500000L)), .Names = c("date", "stock", "price", "size"), row.names = c(NA,
19L), class = "data.frame")
result_all <- do.call(rbind, lapply(unique(cand$date), function(ind_date){
temp <- cand[cand$date == ind_date,]
result_day <- do.call(rbind, lapply(unique(temp$stock), function(ind_stock){
temp2 <- temp[temp$stock == ind_stock,]
print(data.frame(date=temp2$date, symbol=temp2$stock, print_count=nrow(temp2),
tot_shares = sum(temp2$size), min_price=min(temp2$price),
max_price=max(temp2$price)))
}))
}))
print(result_all)
Thank you.