This is what my dataframe looks like. The two rightmost columns are my desired columns. I am counting the cumulative number of unique FundTypes as of each row.The 4th columns is the cumulative unique count for all "ActivityType" and the 5th column is the cumulative unique count for only "ActivityType=="Sale".
dt <- read.table(text='
Name ActivityType FundType UniqueFunds(AllTypes) UniqueFunds(SaleOnly)
John Email a 1 0
John Sale a;b 2 2
John Webinar c;d 4 2
John Sale b 4 2
John Webinar e 5 2
John Conference b;d 5 2
John Sale b;e 5 3
Tom Email a 1 0
Tom Sale a;b 2 2
Tom Webinar c;d 4 2
Tom Sale b 4 2
Tom Webinar e 5 2
Tom Conference b;d 5 2
Tom Sale b;e;f 6 4
', header=T, row.names = NULL)
I have tried dt[, UniqueFunds := cumsum(!duplicated(FundType)& !FundType=="") ,by = Name]
but for example it counts a & a;b & c;d as 3 unique values as opposed to the desired 4 unique values as the factors are separated by semicolon.Kindly let me know of a solution.
UPDATE: My real dataset looks more like this:
dt <- read.table(text='
Name ActivityType FundType UniqueFunds(AllTypes) UniqueFunds(SaleOnly)
John Email "" 0 0
John Conference "" 0 0
John Email a 1 0
John Sale a;b 2 2
John Webinar c;d 4 2
John Sale b 4 2
John Webinar e 5 2
John Conference b;d 5 2
John Sale b;e 5 3
John Email "" 5 3
John Webinar "" 5 3
Tom Email a 1 0
Tom Sale a;b 2 2
Tom Webinar c;d 4 2
Tom Sale b 4 2
Tom Webinar e 5 2
Tom Conference b;d 5 2
Tom Sale b;e;f 6 4
', header=T, row.names = NULL)
The unique cumulative vectors need to take into account the missing values.