I'd like to build a cumulative curve of species over time. I don't mean a species accumulation curve as in vegan, but a curve showing the total number of unique species recorded up to each year. A sample of my data frame looks like this:
Year Phylum SpeciesName
1861 Mollusca Littorina littorea
1862 Cnidaria Gersemia rubiformis
1862 Rhodophyta Ceramium virgatum
1863 Mollusca Littorina littorea
1863 Chlorophyta Ulva clathrata
etc etc etc
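For reproducibility, the five sample rows can be written out as a data frame. This is only a sketch: the object name `data` is taken from the attempt further down, and storing `Year` as a plain number is an assumption about the real data.

```r
# Reproducible copy of the five sample rows shown above.
# Assumption: Year is numeric; the real data set may store it differently.
data <- data.frame(
  Year = c(1861, 1862, 1862, 1863, 1863),
  Phylum = c("Mollusca", "Cnidaria", "Rhodophyta", "Mollusca", "Chlorophyta"),
  SpeciesName = c("Littorina littorea", "Gersemia rubiformis",
                  "Ceramium virgatum", "Littorina littorea", "Ulva clathrata"),
  stringsAsFactors = FALSE
)
```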
I would like to aggregate this into a data frame that looks like this:
Year Cumulative
1861 1
1862 3
1863 4
Littorina littorea was already found in 1861, and therefore its entry in 1863 is not counted in the cumulative number. I can't figure out how to streamline this. Here is what I've tried:
data %>% group_by(Year, Phylum) %>% summarise(Count=n_distinct(SpeciesName)) %>% ungroup() %>% mutate(Cumulative=cumsum(Count))
which would give me:
Year Phylum Count Cumulative
1861 Mollusca 1 1
1862 Cnidaria 1 2
1862 Rhodophyta 1 3
1863 Mollusca 1 4
1863 Chlorophyta 1 5
However, this just counts the unique species per year and phylum and adds them up, without accounting for the fact that a species may already have shown up in earlier years. I just can't figure out how I should aggregate the unique values over time. Thanks!
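To make the target concrete, here is a minimal base R sketch of one direction that might work: keep only the earliest record of each species, count those first records per year, then take the running total. It rebuilds the five sample rows so it runs on its own; the names `first_seen`, `new_per_year`, and `result` are made up for illustration.

```r
# Sample rows from the question (Year assumed numeric).
data <- data.frame(
  Year = c(1861, 1862, 1862, 1863, 1863),
  Phylum = c("Mollusca", "Cnidaria", "Rhodophyta", "Mollusca", "Chlorophyta"),
  SpeciesName = c("Littorina littorea", "Gersemia rubiformis",
                  "Ceramium virgatum", "Littorina littorea", "Ulva clathrata"),
  stringsAsFactors = FALSE
)

# Sort by year so the first occurrence of a species is its earliest record.
data <- data[order(data$Year), ]

# Drop every later record of a species that has already been seen.
first_seen <- data[!duplicated(data$SpeciesName), ]

# Number of newly recorded species per year, then the running total.
new_per_year <- table(first_seen$Year)
result <- data.frame(
  Year = as.integer(names(new_per_year)),
  Cumulative = cumsum(as.vector(new_per_year))
)

print(result)
```

On the sample rows this gives 1, 3, 4 for 1861 to 1863. Note that a year in which no new species appears would be missing from `result` rather than repeating the previous total. In dplyr, the same idea could presumably be written as `arrange(Year)`, then `distinct(SpeciesName, .keep_all = TRUE)`, then `count(Year)`, then `mutate(Cumulative = cumsum(n))`.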