I wish to get the sum of the occurrences of the fruit produced by each company in the previous five years and to construct a new column with all the sums.
For example: in 2016 company_b produced apples; oranges; pears while in the previous 5 years company_b produced (2011: apples; oranges; bananas) and (2014: oranges; pears). By counting the number of fruit produced in the previous five years which correspond to the focal (2016) year we get 4.
In my search for an answer I have only seen the sum of occurrences for numbers like in this post R: calculate the number of occurrences of a specific event in a specified time future . However, I need to count the appearances of all the words for any given company in the previous five years.
Any help would be much appreciated, also any solutions using dplyr is welcome! :)
df <- data.frame(company=c("company_a","company_b","company_b", "company_a","company_b","company_a"),
fruit=c("peaches, apples; oranges","apples; oranges; bananas","oranges; pears","bananas; apples; oranges; pears","apples; oranges; pears","bananas; apples; oranges; pears; peaches"),
year=c("2010","2011","2014","2014", "2016","2018"))
> df
company fruit year
1 company_a peaches, apples; oranges 2010
2 company_b apples; oranges; bananas 2011
3 company_b oranges; pears 2014
4 company_a bananas; apples; oranges; pears 2014
5 company_b apples; oranges; pears 2016
6 company_a bananas; apples; oranges; pears; peaches 2018
The resulting column should look like this:
df <- cbind(df, c("0","0","1","2","4","4")
company fruit year sum_occurrences
1 company_a peaches, apples; oranges 2010 0
2 company_b apples; oranges; bananas 2011 0
3 company_b oranges; pears 2014 1
4 company_a bananas; apples; oranges; pears 2014 2
5 company_b apples; oranges; pears 2016 4
6 company_a bananas; apples; oranges; pears; peaches 2018 4