
I am new to R and am stuck on what seems like an easy task: creating a new column in an R data frame conditional on an existing character column.

As an example, I have a data frame, "class", with one character column ("Names") and one numeric column ("Student_numbers"):

Names <- c("Sarah", "Mary", "Ben", "Will", "Alex") 
Student_numbers <- c(3,5,6,7,7)
class <- data.frame(Names, Student_numbers) 

I would like to add a new character column called "Gender" to the data frame "class". Its values should depend on which of the following character vectors each entry of the "Names" column appears in:

Male <- c("Ben", "Will", "Alex") 
Female <- c("Sarah", "Mary") 

  Names Student_numbers Gender
1 Sarah               3 Female
2  Mary               5 Female
3   Ben               6   Male
4  Will               7   Male
5  Alex               7   Male

Instead of doing this manually, I would like to do it automatically based on the character vectors defined above.

Thank you in advance for your help.

Isobel M

3 Answers


You can use ifelse here:

class$Gender <- ifelse(class$Names %in% Male, 
                       "Male", 
                       ifelse(class$Names %in% Female, "Female", NA))
class
#   Names Student_numbers Gender
# 1 Sarah               3 Female
# 2  Mary               5 Female
# 3   Ben               6   Male
# 4  Will               7   Male
# 5  Alex               7   Male
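Nested ifelse calls get hard to read as the number of categories grows. Here is a minimal alternative sketch in base R. It assumes the Male and Female vectors from the question, and gender_lookup is a hypothetical helper name. The sketch builds a named lookup vector and indexes it by name:

```r
# Data from the question
Names <- c("Sarah", "Mary", "Ben", "Will", "Alex")
Student_numbers <- c(3, 5, 6, 7, 7)
class <- data.frame(Names, Student_numbers)
Male <- c("Ben", "Will", "Alex")
Female <- c("Sarah", "Mary")

# Hypothetical lookup vector: element names are student names,
# element values are the genders to assign
gender_lookup <- c(setNames(rep("Male", length(Male)), Male),
                   setNames(rep("Female", length(Female)), Female))

# Index the lookup by name; as.character() guards against Names being a factor.
# Names found in neither vector give NA. unname() drops the leftover names.
class$Gender <- unname(gender_lookup[as.character(class$Names)])
class
```

The as.character() call matters if Names is a factor. Indexing by a factor uses its integer codes, not its printed labels.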

If you have more cases, you can also use case_when from dplyr:

library(dplyr)
# Conditions are checked in order; the first one that is TRUE decides the value
case_when(class$Student_numbers < 4 ~ "Grp1",
          class$Student_numbers < 6 ~ "Grp2",
          class$Student_numbers < 7 ~ "Grp3",
          TRUE                      ~ "Other")
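Applied to the question itself, case_when could look like the sketch below. It assumes dplyr is installed and uses the Male and Female vectors from the question. The fallback is NA_character_ rather than a plain NA so that every outcome is a character value. Older dplyr versions of case_when required all outcomes to have the same type.

```r
library(dplyr)

# Data from the question
Names <- c("Sarah", "Mary", "Ben", "Will", "Alex")
Student_numbers <- c(3, 5, 6, 7, 7)
class <- data.frame(Names, Student_numbers, stringsAsFactors = FALSE)
Male <- c("Ben", "Will", "Alex")
Female <- c("Sarah", "Mary")

# The first matching condition wins; names in neither vector get NA
class <- class %>%
  mutate(Gender = case_when(
    Names %in% Male   ~ "Male",
    Names %in% Female ~ "Female",
    TRUE              ~ NA_character_
  ))
class
```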
thothal

This solution uses the tidyverse:

library(tidyverse)
Names <- c("Sarah", "Mary", "Ben", "Will", "Alex") 
Student_numbers <- c(3,5,6,7,7)
class <- data.frame(Names, Student_numbers)
class
class <- class %>% mutate(gender = ifelse(Names %in% c("Sarah","Mary"),"Female","Male"))
class

And the result is:

  Names Student_numbers gender
1 Sarah               3 Female
2  Mary               5 Female
3   Ben               6   Male
4  Will               7   Male
5  Alex               7   Male

Hope it helps.

Addition: for the dog example in your comment, you can nest ifelse calls like this:

df <- data.frame(dogs = c("Chucho","Pulgas","Pirata","Carcas","Fido","Bigotes"),
                 number_id = c("10","12","15","16","30","19"),
                 stringsAsFactors = FALSE)

# First classify each dog by type, then derive its size from the type
df <- df %>%
  mutate(dog_type = ifelse(dogs %in% c("Chucho","Pulgas"), "Chihuahua",
                           ifelse(dogs %in% c("Pirata","Carcas"), "Hairless Chimu",
                                  "San Bernardo"))) %>%
  mutate(dog_size = ifelse(dog_type %in% c("Chihuahua","Hairless Chimu"), "Small", "Big"))
df

     dogs number_id       dog_type dog_size
1  Chucho        10      Chihuahua    Small
2  Pulgas        12      Chihuahua    Small
3  Pirata        15 Hairless Chimu    Small
4  Carcas        16 Hairless Chimu    Small
5    Fido        30   San Bernardo      Big
6 Bigotes        19   San Bernardo      Big

Hope I answered your additional question.

Regards,

Alexis

    Thanks Alexis. To extend my question: what if the new character column I want to create has multiple categories? E.g. the existing character column in my data frame holds dog types (over 20 unique values), and I want a new character column that classifies them as big, medium, small or very small. Would I use an "else if" construct? Thank you in advance. – Isobel M Sep 20 '19 at 14:59
  • Hello @Jane Isobel. In your example, you can use a nested ifelse to evaluate each dog type. Ideally, you would keep the dog types and dog sizes as a dictionary in a separate data frame. Then you can match that dictionary against any other data, e.g. a list of names and types. I will add the example to the code above. – Alexis Sep 20 '19 at 15:30
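Below is a minimal base R sketch of the dictionary approach described in the comment above. The data frames dog_df and size_table are hypothetical; they reuse the names, types, and sizes from the dog example. match() finds each dog's type in the dictionary and returns NA for any type the dictionary lacks:

```r
# Hypothetical data: dog names with their types
dog_df <- data.frame(
  dogs     = c("Chucho", "Pulgas", "Pirata", "Carcas", "Fido", "Bigotes"),
  dog_type = c("Chihuahua", "Chihuahua", "Hairless Chimu", "Hairless Chimu",
               "San Bernardo", "San Bernardo"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary: one row per dog type with its size class
size_table <- data.frame(
  dog_type = c("Chihuahua", "Hairless Chimu", "San Bernardo"),
  dog_size = c("Small", "Small", "Big"),
  stringsAsFactors = FALSE
)

# match() gives, for each dog_type, its row in size_table (NA if absent)
dog_df$dog_size <- size_table$dog_size[match(dog_df$dog_type, size_table$dog_type)]
dog_df
```

With this approach, a new type or size class means one more row in size_table instead of another nested ifelse. merge() or dplyr::left_join() would also work. match() keeps the original row order, whereas merge() sorts the result by the key column by default.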

You can also use sapply with the more familiar if/else:

class$gender <- sapply(class$Names, function(x) if(x %in% Male) "Male" else "Female" )

class
  Names Student_numbers gender
1 Sarah               3 Female
2  Mary               5 Female
3   Ben               6   Male
4  Will               7   Male
5  Alex               7   Male

I would also suggest adding stringsAsFactors = FALSE when you create class, to avoid having to deal with factors. Since R 4.0.0, this is already the default.
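A variant of the sapply answer, assuming the same class and Male objects, uses vapply(). vapply() checks that each call returns exactly one character string. USE.NAMES = FALSE stops it from copying the values of Names onto the result as names, which sapply() does by default for a character input.

```r
# Data from the question
Names <- c("Sarah", "Mary", "Ben", "Will", "Alex")
Student_numbers <- c(3, 5, 6, 7, 7)
class <- data.frame(Names, Student_numbers, stringsAsFactors = FALSE)
Male <- c("Ben", "Will", "Alex")

# FUN.VALUE = character(1): each call must return a single string
# USE.NAMES = FALSE: return a plain, unnamed character vector
class$gender <- vapply(class$Names,
                       function(x) if (x %in% Male) "Male" else "Female",
                       FUN.VALUE = character(1),
                       USE.NAMES = FALSE)
class
```

Like the sapply version, this treats anyone not in Male as "Female". It also calls the function once per row, so on large data the vectorized ifelse answers are likely to be faster.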

fra