Cumulative count of unique values over time

Question

I have a dataframe mydf like this:

| Country    | Year |
| ---------- | ---- |
| Bahamas    | 1982 |
| Chile      | 1817 |
| Cuba       | 1960 |
| Finland    | 1918 |
| Kazakhstan | 1993 |

etc., with many more rows.

Is there an easy way to plot the cumulative number of unique countries over time? In other words,

x-axis = Year (a timeline), and
y-axis = cumulative number of countries that have already been mentioned

I tried stat_ecdf(), but the y-axis does not show the absolute count of countries:

ggplot(mydf, aes(x = Year)) + stat_ecdf()

Here is an example of mydf:

> dput(mydf)

structure(list(Country = c("Moldova", "Aragon", "Abu Dhabi", 
"Uzbekistan", "Sweden", "Anhalt", "Saudi Arabia", "Montenegro", 
"Central African Republic", "Bulgaria", "Argentina", "Senegal", 
"Sri Lanka", "Cambodia", "Benin", "Colombia", "Algeria", "Iraq", 
"DPRK", "Italy"), Year = c(1992L, 1223L, 1966L, 1993L, 1748L, 
1835L, 1955L, 1841L, 1959L, 1993L, 1806L, 1960L, 1955L, 1995L, 
1892L, 1914L, 1981L, 1958L, 1948L, 1900L)), row.names = c(NA, 
-20L), class = c("data.table", "data.frame"))

Since the countries do not repeat, the cumulative number of countries for each year is simply the row number of that year once the data is sorted by year. — SteveM, Apr 09 '21 at 15:13

score 1 · Accepted Answer · answered Apr 09 '21 at 15:28

Give the countries an ID number based on first appearance, and then the cumulative count is the same as the cumulative max of that ID:

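# sort chronologically so IDs are assigned in order of first appearance
mydf = mydf[order(mydf$Year, mydf$Country), ]
# unique() keeps first-appearance order, so the factor codes run 1, 2, 3, ...
mydf$country_id = as.integer(factor(mydf$Country, levels = unique(mydf$Country)))
# the running max of the ID is the number of distinct countries seen so far
mydf$cum_n_country = cummax(mydf$country_id)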
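
To see why this works when a country appears more than once, here is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not from the question). It also shows a base R alternative, `cumsum(!duplicated(...))`, which I believe gives the same result on sorted data; treat it as an illustration rather than a replacement for the approach above.

```r
# Hypothetical toy data: Chile and Cuba each appear twice
toy <- data.frame(
  Country = c("Chile", "Cuba", "Chile", "Finland", "Cuba"),
  Year    = c(1817L, 1960L, 1970L, 1918L, 1993L)
)

# Sort by year so "first appearance" means "earliest year"
toy <- toy[order(toy$Year, toy$Country), ]

# ID by first appearance, then running max of the ID
toy$country_id    <- as.integer(factor(toy$Country, levels = unique(toy$Country)))
toy$cum_n_country <- cummax(toy$country_id)

# Alternative: count a row only the first time its country is seen
toy$cum_n_alt <- cumsum(!duplicated(toy$Country))

print(toy)
```

A repeated country gets an ID that is no larger than the current maximum, so the running max does not move on those rows.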

If years are repeated, you'll need to aggregate/summarize to the max cum_n_country per year so that each year contributes a single point to the line:

library(dplyr)
library(ggplot2)
mydf %>%
  group_by(Year) %>%
  summarize(cum_n_country = max(cum_n_country)) %>%
  ggplot(aes(x = Year, y = cum_n_country)) + 
  geom_line()
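
If you want to check the per-year aggregation without dplyr, the same step can be sketched in base R with `aggregate()`. The data frame `toy` below is made up for illustration (two countries share 1817, and Chile repeats in 1960), and `per_year` is just a name I picked.

```r
# Hypothetical toy data with repeated years and one repeated country
toy <- data.frame(
  Country = c("Chile", "Peru", "Cuba", "Chile", "Fiji"),
  Year    = c(1817L, 1817L, 1960L, 1960L, 1970L)
)

# Same steps as in the answer: sort, ID by first appearance, running max
toy <- toy[order(toy$Year, toy$Country), ]
toy$country_id    <- as.integer(factor(toy$Country, levels = unique(toy$Country)))
toy$cum_n_country <- cummax(toy$country_id)

# One row per year, keeping the largest cumulative count reached in that year
per_year <- aggregate(cum_n_country ~ Year, data = toy, FUN = max)

print(per_year)
```

`per_year` has one row per year and can be passed to the same `ggplot(aes(x = Year, y = cum_n_country)) + geom_line()` call; `geom_step()` is another option if you prefer the count to stay flat between years.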

Perfect, thank you (also for providing the alternative with repeated years)! — anpami, Apr 10 '21 at 08:37
