R/dplyr: Summarize data without grouping it

Question

I have a data frame like this:

ID V1 V2
A  2  June
B  3  May
A  2  January
F  4  December

I want to add V3 that gives me the number of entries by earliest V2 within each ID:

ID V1 V2        V3
A  2  June      January
B  3  May       May
A  2  January   January
F  4  December  December

How do I do that?

@s_t, `V3` and `V2` are not the same in the first row. – 9314197 Jun 09 '20 at 15:28 — 9314197, Jun 09 '20 at 15:28
@questionmark, what does `V1` do here? – 9314197 Jun 09 '20 at 15:28 — 9314197, Jun 09 '20 at 15:28

score 1 · Answer 1 · answered Jun 09 '20 at 15:31

If you want to get the earliest month V2 for each ID, probably you can group it and then ungroup again (see more comments in the code below):

# load packages
library(tidyverse)
library(lubridate)

# data
data <- read.table(header = TRUE, text = "
    ID V1 V2
    A  2  June
    B  3  May
    A  2  January
    F  4  December
")

# 1. group by ID
# 2. get the earliest month with parsing by 'lubridate' package
# 3. ungroup
# 4. make months to words with 'lubridate' again
data %>%
    group_by(ID) %>%
    mutate(V3 = min(month(parse_date_time(V2, "%m")))) %>%
    ungroup() %>%
    mutate(V3 = month(V3, label = TRUE, abbr = FALSE))

Bernhard · Answer 2 · 2020-06-09T16:13:29.053

Not strictly dplyr but I consider this to be rather easy to read (at least not many nested parens). Also: My minmonth function is handy for reuse some other time and it is easily translated to input in a non-English language:

dat <- read.table(text = "ID V1 V2
                           A  2  June
                           B  3  May
                           A  2  January
                           F  4  December", header = TRUE)

minmonth <- function(m){
  months <- c(January = 1, February = 2, March = 3,  # easily translated to 
             April = 4, May = 5, June = 6, July = 7, # other languages
             August = 8, September = 9, October = 10,
             November = 11, December = 12)
  m <- months[m]                                     # no static typing in R
  smallest <- min(m)
  return(names(months)[smallest])
}

dat$V3 <- ave(dat$V2, dat$ID, FUN = minmonth)

R/dplyr: Summarize data without grouping it

2 Answers2