1

I have a df with 2 million (2m) rows, which contains a column of different entries grouped by unique IDs. For e.g

df <- df %>% mutate(A = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6 6, ... , 2m, 2m, 2m, 2m)

I want to create another column, say B, in which I'll assign each x in the number vector c(2008:2021) to each unique group in column A.

I have tried the following:

df <- df %>% mutate(B = c(2008:2021)[match(B, unique(A)]

But this doesn't extend to the entire rows of the df. It only asigns each x in the vector to each group of unique value up to the extent of the vector, which is 2021, and then, stops. Leaving NAs for the remaining rows. However, what I want is for c(2008:2021) to start from the beginning (2008) to assign to other groups of unique values in B when it gets to 2021.

I've also tried using the rep() function, with length.out = nrow(df), but that still didn't work. That is, in the mutate function,

df <- df %>% mutate(B = rep(c(2008:2021)[match(B, unique(A)], length.out = nrow(df) 

This gave unwanted results even after using the each and times arguments.

Please, can anyone assist me with the correct and efficient way to doing this. Many thanks.

Cheers, Davidmac

nightstand
  • 329
  • 2
  • 11
Davidmac
  • 11
  • 1
  • Sorry this is the best results I have so far df <- df %>% mutate(B = c(2008:2021)[match(A, unique(A)]. – Davidmac Mar 30 '23 at 00:26

1 Answers1

0

We could use %% to do this

library(dplyr)
years <- 2008:2021
df %>% 
 mutate(B = years[(as.integer(factor(A, levels = unique(A)))-1) %% 
       length(years) + 1])

data

df <- data.frame(A = rep(paste0("v", 1:85), each = 5))
akrun
  • 874,273
  • 37
  • 540
  • 662