
I have a data frame which looks like this:

codes <- c('TFAA1', 'TFAA2', 'TFAA3', 'TFAA4', 'TFAB1', 'TFAB2', 'TFAB3', 'TFAB4')
scores <- c(4,3,2,2,4,5,1,2)
example <- data.frame(codes, scores)

I want to create a new column called code_group whereby everything that starts with TFAA gets called "Group1" and everything that starts with TFAB gets called "Group2".

I have been experimenting with the recode function from the car package and with the grepl function, but without success. Here's my attempt so far:

recode <- (codes, "%in% TFAA='Group1'; %in% TFAB='Group2'")
stixmcvix

4 Answers


With dplyr and stringr you can get it done:

library(dplyr)
library(stringr)
example %>% 
  mutate(code_group = case_when(str_detect(codes, "^TFAA") ~ "Group1",
                              str_detect(codes, "^TFAB") ~ "Group2"))

case_when evaluates several if-then conditions in order and uses the first one that matches. str_detect tests whether a string matches a pattern; here the ^ anchor restricts the match to the start of the string.
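One thing worth knowing about the approach above: rows that match none of the conditions get NA. The sketch below adds a catch-all branch to make that explicit. The extra code "TFAC1", the label "Other", and the names example2 and result are hypothetical and only there for illustration.

```r
library(dplyr)
library(stringr)

# A small data frame with one hypothetical code ("TFAC1")
# that starts with neither TFAA nor TFAB.
example2 <- data.frame(
  codes  = c("TFAA1", "TFAB1", "TFAC1"),
  scores = c(4, 4, 3),
  stringsAsFactors = FALSE
)

result <- example2 %>%
  mutate(code_group = case_when(
    str_detect(codes, "^TFAA") ~ "Group1",  # "^" anchors the match to the start
    str_detect(codes, "^TFAB") ~ "Group2",
    TRUE                       ~ "Other"    # catch-all for everything else
  ))

result
```

Dropping the TRUE branch would leave the "TFAC1" row with NA in code_group, which may be preferable if unmatched codes should stand out.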

RLave
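In base R, nested ifelse calls with startsWith also work:

# as.character() guards against codes being stored as a factor
example$code_group <- ifelse(startsWith(as.character(example$codes), 'TFAA'), 'Group1',
                      ifelse(startsWith(as.character(example$codes), 'TFAB'), 'Group2',
                             NA))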
IceCreamToucan

We could extract the first four characters with substr, convert the result to a factor, and specify the labels we want:

example$code_group <-  with(example,  as.character(factor(substr(codes, 1, 4), 
              levels = c('TFAA', 'TFAB'), labels = c('Group1', 'Group2'))))
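A related variation, not part of the answer above, is to replace the factor with a named character vector used as a lookup table. This is only a sketch: the name group_lookup is hypothetical, and the data frame is rebuilt from the question so the snippet runs on its own.

```r
# Data from the question
codes   <- c("TFAA1", "TFAA2", "TFAA3", "TFAA4", "TFAB1", "TFAB2", "TFAB3", "TFAB4")
scores  <- c(4, 3, 2, 2, 4, 5, 1, 2)
example <- data.frame(codes, scores, stringsAsFactors = FALSE)

# Hypothetical lookup table: four-character prefix -> group label
group_lookup <- c(TFAA = "Group1", TFAB = "Group2")

# Index the lookup table by prefix; prefixes missing from the table yield NA
prefix <- substr(as.character(example$codes), 1, 4)
example$code_group <- unname(group_lookup[prefix])

example
```

Adding another group then only means adding one more entry to the lookup vector.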
akrun

We can use split<- :

example$group <- NA
split(example$group,substr(example$codes,1,4)) <- paste0("Group",1:2)
example
#   codes scores  group
# 1 TFAA1      4 Group1
# 2 TFAA2      3 Group1
# 3 TFAA3      2 Group1
# 4 TFAA4      2 Group1
# 5 TFAB1      4 Group2
# 6 TFAB2      5 Group2
# 7 TFAB3      1 Group2
# 8 TFAB4      2 Group2

Or we can use factors for the same output (3 variants):

example$group <- paste0("Group",factor(substr(example$codes,1,4),,1:2))
example$group <- paste0("Group",as.numeric(factor(substr(example$codes,1,4))))
example$group <- factor(substr(example$codes,1,4),,paste0("Group",1:2))

In the last case you get a factor column; in all other cases you get a character column.
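The difference in column type can be checked with class(). Here is a minimal sketch on a shortened copy of the codes; the names group_chr and group_fct are hypothetical.

```r
codes  <- c("TFAA1", "TFAA2", "TFAB1", "TFAB2")
prefix <- substr(codes, 1, 4)

# Character result: paste0() always returns a character vector
group_chr <- paste0("Group", as.numeric(factor(prefix)))

# Factor result: the levels are relabelled but the vector stays a factor
group_fct <- factor(prefix, labels = paste0("Group", 1:2))

class(group_chr)  # "character"
class(group_fct)  # "factor"

# Both carry the same labels once the factor is converted
identical(as.character(group_fct), group_chr)
```

A factor column can be handy for modelling or ordered plotting, while a character column is usually simpler for further string manipulation.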

moodymudskipper