I have a df with 2 million (2m) rows, which contains a column of different entries grouped by unique IDs. For e.g
df <- df %>% mutate(A = c(1,1,1,1,1,1,2,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5,5,6,6,6,6 6, ... , 2m, 2m, 2m, 2m)
I want to create another column, say B, in which I'll assign each x in the number vector c(2008:2021)
to each unique group in column A.
I have tried the following:
df <- df %>% mutate(B = c(2008:2021)[match(B, unique(A)]
But this doesn't extend to the entire rows of the df. It only asigns each x in the vector to each group of unique value up to the extent of the vector, which is 2021, and then, stops. Leaving NAs for the remaining rows. However, what I want is for c(2008:2021)
to start from the beginning (2008) to assign to other groups of unique values in B when it gets to 2021.
I've also tried using the rep()
function, with length.out = nrow(df)
, but that still didn't work. That is, in the mutate
function,
df <- df %>% mutate(B = rep(c(2008:2021)[match(B, unique(A)], length.out = nrow(df)
This gave unwanted results even after using the each and times arguments.
Please, can anyone assist me with the correct and efficient way to doing this. Many thanks.
Cheers, Davidmac