I had to do some cleaning to reproduce your data but here is the result after
the cleaning portion and I named the variable clean_data
> str(clean_data)
'data.frame': 39 obs. of 8 variables:
$ CUSIP : chr "037833100" "17275R102" "38259P508" "594918104" ...
$ BuyDate : chr "12/1/2015 " "1/28/2016 " "10/29/2015" "3/1/2016 " ...
$ SellDate : chr "3/1/2017 " "2/21/2017" "2/18/2017" "3/2/2017 " ...
$ BuyAmount : num 45 28 39 35 47 46 29 46 26 33 ...
$ SellAmount : num 27 25 36 40 37 40 34 39 36 34 ...
$ Profit : num -18 -3 -3 5 -10 -6 5 -7 10 1 ...
$ DaysHolding: num 456 390 478 366 313 436 409 426 431 383 ...
$ Over365Days: num 1 1 1 1 0 1 1 1 1 1 ...
I then replicated 10x the set to give a slightly larger data size so the output would
make more sense and named it new_data
new_data <- plyr::ldply(1:10, function(i){
clean_data
})
Since Excel formatting is not inline with the way we readily access data after
manipulating and aggregation, I stored the results in a slightly different fashion
than you would in excel(...trust me... i was an analyst for years...it took getting used
to but now I will never go back to pivot tables...)
So the steps are for each individual CUSIP we are going to 'chunk' the data set
into only those records that belong to that identity set. IE for N unique
CUSIP ids, we will create N[m,j] fragmented data subsets by using the split
method. We will also wrap this in a loop so that we only have to apply our
aggregation formulas once but will apply to each subset. And for each dataframe
in the N unique CUSIP ids realm, we will return a list with the
1) Original chunked data
2) The Profit
3) The Total
out_split <- lapply(split(new_data, new_data$CUSIP), function(i){
list(
data_subset = i, # This is the data unique to the CUSIP id
profit_calc = sum(i[['Profit']]), # This is the sum of profits
total = sum(i[['Over365Days']]) # This is the sum of 365 roll
)
})
Now we can call our data by accessing the CUSIP id in the list that was returned
and find whatever we want. For example:
> out_split$`594918106`
$data_subset
CUSIP BuyDate SellDate BuyAmount SellAmount Profit DaysHolding Over365Days
35 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
74 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
113 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
152 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
191 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
230 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
269 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
308 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
347 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
386 594918106 11/21/2015 3/6/2017 44 38 -6 471 1
$profit_calc
[1] -60
$total
[1] 10
And additionally, we can find just the calculated totals from all with:
(Since the totals were stored as the 3rd item in each iteration)
> sapply(out_split, `[[`, 3)
037833100 17275R102 38259P508 38259P509 38259P510 38259P511 594918104 594918105 594918106 594918107 594918108 594918109 594918110 68389X105
50 80 40 10 0 0 40 0 10 0 10 10 10 20
And if we wanted to see the profits:
> sapply(out_split, `[[`, 2)
037833100 17275R102 38259P508 38259P509 38259P510 38259P511 594918104 594918105 594918106 594918107 594918108 594918109 594918110 68389X105
-410 -120 -230 -110 70 120 10 30 -60 -90 -90 -20 -20 -100
EDIT
I realized the example output was on a set that only had one record... so to show another and additionnaly how this can be accessed easily:
> out_split$`037833100`$data_subset$Profit
[1] -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12
[41] 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20 -18 -6 5 -15 12 1 -20
Data set recreation steps for anyone wanting to reproduce:
clean_data <- stringi::stri_split_lines('
CUSIP BuyDate SellDate BuyAmount SellAmount Profit DaysHolding Over365Days
037833100 12/1/2015 3/1/2017 45 27 -18 456 1
17275R102 1/28/2016 2/21/2017 28 25 -3 390 1
38259P508 10/29/2015 2/18/2017 39 36 -3 478 1
594918104 3/1/2016 3/2/2017 35 40 5 366 1
68389X105 4/14/2016 2/21/2017 47 37 -10 313 0
037833100 12/11/2015 2/19/2017 46 40 -6 436 1
17275R102 1/12/2016 2/24/2017 29 34 5 409 1
38259P508 12/22/2015 2/20/2017 46 39 -7 426 1
594918104 12/19/2015 2/22/2017 26 36 10 431 1
68389X105 2/13/2016 3/2/2017 33 34 1 383 1
037833100 12/9/2015 2/18/2017 32 37 5 437 1
17275R102 2/13/2016 2/27/2017 48 25 -23 380 1
38259P508 11/30/2015 2/23/2017 45 34 -11 451 1
594918104 11/14/2015 2/27/2017 47 28 -19 471 1
68389X105 2/10/2016 2/17/2017 39 38 -1 373 1
037833100 4/7/2016 3/5/2017 44 29 -15 332 0
17275R102 3/3/2016 2/19/2017 26 36 10 353 0
037833100 11/25/2015 2/17/2017 28 40 12 450 1
037833100 1/10/2016 3/6/2017 35 36 1 421 1
037833100 3/4/2016 2/22/2017 45 25 -20 355 0
38259P508 2/10/2016 3/7/2017 42 40 -2 391 1
38259P509 12/5/2015 2/25/2017 31 39 8 448 1
38259P510 4/7/2016 2/27/2017 27 34 7 326 0
38259P511 3/26/2016 2/17/2017 27 39 12 328 0
17275R102 2/11/2016 2/27/2017 29 39 10 382 1
17275R102 11/24/2015 2/18/2017 45 35 -10 452 1
38259P509 3/29/2016 3/7/2017 46 27 -19 343 0
38259P509 4/5/2016 2/23/2017 38 38 0 324 0
17275R102 2/13/2016 2/26/2017 35 31 -4 379 1
594918104 3/10/2016 3/4/2017 29 28 -1 359 0
17275R102 10/30/2015 2/23/2017 40 30 -10 482 1
17275R102 12/15/2015 3/2/2017 25 38 13 443 1
594918104 2/2/2016 2/22/2017 26 32 6 386 1
594918105 3/8/2016 2/20/2017 26 29 3 349 0
594918106 11/21/2015 3/6/2017 44 38 -6 471 1
594918107 3/21/2016 2/20/2017 48 39 -9 336 0
594918108 12/21/2015 3/5/2017 37 28 -9 440 1
594918109 1/16/2016 3/5/2017 35 33 -2 414 1
594918110 2/8/2016 3/2/2017 41 39 -2 388 1
', omit_empty = TRUE)[[1]] %>%
stringi::stri_split_regex("\\s+", simplify = TRUE) %>% (function(x){
col_names <- x[1,]
a_data <- data.frame(x[2:nrow(x),], stringsAsFactors = FALSE)
colnames(a_data) <- col_names
as.data.frame(Map(function(i){
.call_col <- sprintf("as.%s",readr::guess_parser(i))
do.call(.call_col, list(i))
}, a_data))
})