Consider the following data.frame:
> head(dtrain)
  content_id item_age   item_ctr likes clicks no_clicks event
1   11201926   461540 0.02787456     1     24       837     0
2   11201926   462497 0.02784223     1     24       838     0
3   11201926   473215 0.02780997     1     24       839     0
4   11201926   532983 0.02777778     1     24       840     0
5   11201926   536696 0.02774566     1     24       841     0
6   11201926   545545 0.02771363     1     24       842     0
I want to split the data by content_id, which only requires the following command:
result <- split(dtrain, f = dtrain$content_id)
However, I only want to keep the rows of dtrain whose content_id appears at least 1000 times in dtrain.
In the end, I want the data split by content_id, where each piece contains at least 1000 rows (because that is the aggregate condition).
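
To make the goal concrete, here is a minimal sketch of the filter-then-split behaviour described above, using base R only. The toy data frame `dtrain_demo` and the threshold variable `min_n` are hypothetical names introduced for illustration; the threshold is lowered to 3 so the example stays small, and it would be 1000 for the real dtrain.

```r
# Toy stand-in for dtrain (hypothetical data, not the real table)
dtrain_demo <- data.frame(
  content_id = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  clicks     = 1:9
)
min_n <- 3  # assumed threshold for the demo; 1000 for the real data

# For every row, compute how many rows share its content_id
n_per_row <- ave(seq_len(nrow(dtrain_demo)), dtrain_demo$content_id, FUN = length)

# Keep only rows whose content_id meets the threshold, then split
filtered <- dtrain_demo[n_per_row >= min_n, ]
result   <- split(filtered, f = filtered$content_id)
```

Filtering before splitting means `split()` only sees the surviving ids, so no empty groups are created for the dropped ones (as long as content_id is not a factor with unused levels).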