
I have a data frame:

df <- data.frame(structure(list(col1 = c("A", "B", "C", "D", "A"),
                                col2 = c(1, 1, 1, 1, 5),
                                col3 = c(2L, 1L, 1L, 1L, 1L)),
                           .Names = c("col1", "col2", "col3"),
                           row.names = c(NA, -5L), class = "data.frame"))

I want to add a new column, col4, whose values are based on col2: rows that have the same value in col2 should also have the same value in col4.

As a workaround, I produced the result as follows:

x <- df[!duplicated(df$col2),]
x$col4 <- paste("newValue", seq_len(nrow(x)), sep = "_")

df_new <- merge(x, df, by ="col2")

df_new <- df_new[,c("col2","col4", "col1.y", "col3.y")]

This works, but I suspect there is a better way to do it. Thank you!

Chris

2 Answers


You could try dense_rank() from dplyr:

library(dplyr)
df %>% 
    mutate(col4 = dense_rank(col2),
           col4_new = paste0("newValue_", col4))

This gives something very similar to the desired output in your question, but I'm not sure exactly what you're looking for. Note that dense_rank() already gives rows with identical values in col2 the same value in col4; if you also want the rows ordered by col2 (as in your merge result), arrange the df first and then use dense_rank():

df %>% 
    arrange(col2) %>% 
    mutate(col4 = dense_rank(col2),
           col4_new = paste0("newValue_", col4))

This should work for a data.frame of arbitrary size.
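
One caveat worth checking, sketched here under the assumption that numbering by order of first appearance (as the question's workaround does) is what is wanted: dense_rank() numbers the groups by sorted value, so the two schemes can differ when col2 is not already in ascending order. The vector `v` is hypothetical data, and `match(v, sort(unique(v)))` is used as a base R stand-in that should give the same result as dense_rank() for non-missing values.

```r
# Hypothetical col2 values where the first value seen is not the smallest
v <- c(5, 1, 5, 3)

# Numbering by sorted value (what dense_rank(v) is expected to return)
by_value <- match(v, sort(unique(v)))

# Numbering by order of first appearance (what the question's workaround does)
by_appearance <- match(v, unique(v))

print(by_value)       # 3 1 3 2
print(by_appearance)  # 1 2 1 3
```

On the example df in the question the two agree, because the value 1 appears before the value 5.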

Jim Leach

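Maybe this helps (note that the cumsum version relies on equal values of col2 being adjacent, as they are in the example):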

df$col4 <- paste0("newValue_", cumsum(!duplicated(df$col2)))
df$col4
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"

Or use match:

with(df, paste0("newValue_", match(col2, unique(col2))))
#[1] "newValue_1" "newValue_1" "newValue_1" "newValue_1" "newValue_2"

Or it can be done with factor:

with(df, paste0("newValue_", as.integer(factor(col2, levels = unique(col2)))))
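
A quick check of how the three expressions behave when a value of col2 reappears after a different value. This is only a sketch: the vector `v` is hypothetical data, not taken from the question.

```r
# Hypothetical col2 values: the value 1 reappears after a different value
v <- c(1, 5, 1)

via_cumsum <- cumsum(!duplicated(v))
via_match  <- match(v, unique(v))
via_factor <- as.integer(factor(v, levels = unique(v)))

print(via_cumsum)  # 1 2 2  (the third row does not rejoin group 1)
print(via_match)   # 1 2 1
print(via_factor)  # 1 2 1
```

So for data that is not grouped or sorted by col2, the match or factor version is probably the safer choice.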
akrun