0

I have a data frame like this:

ID V1 V2
A  2  June
B  3  May
A  2  January
F  4  December

I want to add V3 that gives me the number of entries by earliest V2 within each ID:

ID V1 V2        V3
A  2  June      January
B  3  May       May
A  2  January   January
F  4  December  December

How do I do that?

questionmark
  • 335
  • 1
  • 13

2 Answers2

1

If you want to get the earliest month V2 for each ID, probably you can group it and then ungroup again (see more comments in the code below):

# load packages
library(tidyverse)
library(lubridate)

# data
data <- read.table(header = TRUE, text = "
    ID V1 V2
    A  2  June
    B  3  May
    A  2  January
    F  4  December
")

# 1. group by ID
# 2. get the earliest month with parsing by 'lubridate' package
# 3. ungroup
# 4. make months to words with 'lubridate' again
data %>%
    group_by(ID) %>%
    mutate(V3 = min(month(parse_date_time(V2, "%m")))) %>%
    ungroup() %>%
    mutate(V3 = month(V3, label = TRUE, abbr = FALSE))
0

Not strictly dplyr but I consider this to be rather easy to read (at least not many nested parens). Also: My minmonth function is handy for reuse some other time and it is easily translated to input in a non-English language:

dat <- read.table(text = "ID V1 V2
                           A  2  June
                           B  3  May
                           A  2  January
                           F  4  December", header = TRUE)

minmonth <- function(m){
  months <- c(January = 1, February = 2, March = 3,  # easily translated to 
             April = 4, May = 5, June = 6, July = 7, # other languages
             August = 8, September = 9, October = 10,
             November = 11, December = 12)
  m <- months[m]                                     # no static typing in R
  smallest <- min(m)
  return(names(months)[smallest])
}

dat$V3 <- ave(dat$V2, dat$ID, FUN = minmonth)
Bernhard
  • 4,272
  • 1
  • 13
  • 23