
I have a dataset of D&D characters that looks something like this:

Race   Class              Level   AC
Human  Fighter | Wizard    10     15
Elf    Wizard              8      10
Human  Rogue               6      12
Dwarf  Barbarian           15     18

I want to split the classes of multiclassing characters, which are indicated by "|". Also, if a character doesn't multiclass, I want to place an "NA" or "None" in that slot:

Race   Primary_Class      Level   AC    Subclass   Multiclass
Human  Fighter             10     15    Wizard         1
Elf    Wizard              8      10    NA             0
Human  Rogue               6      12    NA             0
Dwarf  Barbarian           15     18    NA             0

Is there a clean way to do this?

3 Answers


You can do this with three `ifelse` clauses: `grepl` detects the "|", `gsub` manipulates the match, and the backreferences `\\1` and `\\2` keep the first and the second class respectively:

# first class before the "|", otherwise the whole entry
df1$Primary_Class <- ifelse(grepl("\\|", df1$Class),
                            gsub("([A-Za-z]+)\\s\\|\\s([A-Za-z]+)", "\\1", df1$Class), df1$Class)

# second class after the "|", otherwise a real missing value
df1$Subclass <- ifelse(grepl("\\|", df1$Class),
                       gsub("([A-Za-z]+)\\s\\|\\s([A-Za-z]+)", "\\2", df1$Class), NA)

# 1 if the entry contains a "|", 0 otherwise
df1$Multiclass <- ifelse(grepl("\\|", df1$Class), 1, 0)

df1
   Race            Class Level AC Primary_Class Subclass Multiclass
1 Human Fighter | Wizard    10 15       Fighter   Wizard          1
2   Elf           Wizard     8 10        Wizard     <NA>          0
3 Human            Rogue     6 12         Rogue     <NA>          0
4 Dwarf        Barbarian    15 18     Barbarian     <NA>          0
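To see what the two backreferences capture, here is a minimal sketch on a hypothetical two-element vector `x`. `\\1` refers to the first parenthesised group and `\\2` to the second, and `gsub` returns non-matching strings unchanged, which is why the `grepl` guard is needed for the subclass. The sketch uses `[A-Za-z]` rather than `[A-z]`: as a range, `A-z` can also take in the ASCII punctuation between `Z` and `a` (such as `_` and `^`), depending on the locale.

```r
# Hypothetical input: one multiclass entry and one single-class entry
x <- c("Fighter | Wizard", "Wizard")
pattern <- "([A-Za-z]+)\\s\\|\\s([A-Za-z]+)"

# \\1 keeps the first capture group, \\2 the second;
# "Wizard" does not match, so gsub() returns it unchanged both times
first  <- gsub(pattern, "\\1", x)
second <- gsub(pattern, "\\2", x)

# grepl() flags which entries contain a literal "|"
has_bar <- grepl("\\|", x)

first                        # "Fighter" "Wizard"
second                       # "Wizard"  "Wizard"  (unguarded: wrong for row 2)
ifelse(has_bar, second, NA)  # "Wizard"  NA
```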
Chris Ruehlemann

We can use `sub` to remove everything from the "|" onward, `str_extract` with a lookbehind to extract everything after the "|", and `str_detect` to detect whether a "|" is present.

library(dplyr)
library(stringr)

df1 %>%
  mutate(Primary_Class = trimws(sub('\\|.*', '', Class)),
         Subclass = trimws(str_extract(Class, "(?<=\\|).*")),  # trimws drops the space after "|"
         Multiclass = +(str_detect(Class, "\\|"))) %>%
  select(-Class)

#   Race Level AC Primary_Class Subclass Multiclass
#1 Human    10 15       Fighter   Wizard          1
#2   Elf     8 10        Wizard     <NA>          0
#3 Human     6 12         Rogue     <NA>          0
#4 Dwarf    15 18     Barbarian     <NA>          0
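For comparison, a minimal base-R sketch of the same three steps (no stringr), on a hypothetical vector `x`. Base R needs `perl = TRUE` for the lookbehind, and `regmatches` only returns the matched pieces, so non-matches are filled with `NA` by hand. It also shows that the lookbehind `(?<=\\|).*` keeps the space that follows the "|", which is why trimming the extracted value is useful.

```r
# Hypothetical input mirroring the Class column
x <- c("Fighter | Wizard", "Wizard", "Rogue")

# Same lookbehind as the str_extract() call; -1 marks entries with no match
m <- regexpr("(?<=\\|).*", x, perl = TRUE)

# regmatches() returns only the matched substrings, so start from all-NA
after_bar <- rep(NA_character_, length(x))
after_bar[m != -1] <- regmatches(x, m)

after_bar          # " Wizard" NA NA   (note the leading space)
trimws(after_bar)  # "Wizard"  NA NA

# Everything before the bar, as in the sub() call
trimws(sub("\\|.*", "", x))   # "Fighter" "Wizard" "Rogue"

# 1/0 flag for the presence of a "|"
+grepl("\\|", x)              # 1 0 0
```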
Ronak Shah

We can use `separate` to split 'Class' into two columns ('Primary_Class', 'Subclass'), specifying `sep` as zero or more spaces (`\\s*`), followed by `|`, followed by zero or more spaces (`\\s*`); `extra = 'merge'` keeps any further classes together in 'Subclass'. Then create 'Multiclass' by checking which 'Subclass' elements are not NA.

library(dplyr)
library(tidyr)

# fill = 'right' pads single-class rows with NA without a warning
separate(df1, Class, into = c('Primary_Class', 'Subclass'),
         sep = '\\s*\\|\\s*', extra = 'merge', fill = 'right') %>%
  mutate(Multiclass = +(!is.na(Subclass)))
#   Race Primary_Class Subclass Level AC Multiclass
#1 Human       Fighter   Wizard    10 15          1
#2   Elf        Wizard     <NA>     8 10          0
#3 Human         Rogue     <NA>     6 12          0
#4 Dwarf     Barbarian     <NA>    15 18          0
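The splitting that `separate()` does here can be sketched in base R as well. `split_class()` below is a hypothetical helper, not part of tidyr: it splits at the first delimiter only, which mirrors what `extra = 'merge'` does for a character with three or more classes, and it derives 'Multiclass' from the NAs the same way the answer does.

```r
# Hypothetical helper: split at the FIRST "|" only (like extra = 'merge'),
# tolerating any amount of whitespace around the bar
split_class <- function(x) {
  m <- regexpr("\\s*\\|\\s*", x)
  start <- as.vector(m)                           # delimiter position, -1 if none
  len <- as.vector(attr(m, "match.length"))       # delimiter length
  has_bar <- start != -1

  primary  <- ifelse(has_bar, substr(x, 1, start - 1), x)
  subclass <- ifelse(has_bar, substring(x, start + len), NA_character_)

  data.frame(Primary_Class = primary,
             Subclass = subclass,
             Multiclass = +(!is.na(subclass)),    # TRUE/FALSE -> 1/0
             stringsAsFactors = FALSE)
}

split_class(c("Fighter | Wizard", "Rogue", "Fighter|Wizard | Cleric"))
```

The third input shows the merge behaviour: its 'Subclass' is "Wizard | Cleric" rather than just "Wizard".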

data

df1 <- structure(list(Race = c("Human", "Elf", "Human", "Dwarf"),
                      Class = c("Fighter | Wizard", "Wizard", "Rogue", "Barbarian"),
                      Level = c(10L, 8L, 6L, 15L),
                      AC = c(15L, 10L, 12L, 18L)),
                 class = "data.frame", row.names = c(NA, -4L))
akrun
  • I think this is really interesting. I've never seen this `+` approach before. Could you please explain how it works? – rdornas Mar 16 '20 at 01:48
  • Sure! I already did it. I didn't understand why someone gave you a negative vote and I upvoted your post (just before I made this comment and question). This is probably why you're not seeing a negative number here. – rdornas Mar 16 '20 at 18:29
    @rdornas thank you. It is a hacky way to convert TRUE/FALSE to binary, e.g. `v1 <- c(TRUE, FALSE, FALSE); +(v1)`, or equivalently `as.integer(v1)`. Logical values are stored as 1/0, and the unary `+` coerces them to integer mode. – akrun Mar 16 '20 at 18:31
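Running akrun's `v1` example side by side with `as.integer()` makes the equivalence concrete; the last line (on a hypothetical class vector) shows that the same logical-to-integer coercion is what lets `sum()` count the multiclass rows directly.

```r
v1 <- c(TRUE, FALSE, FALSE)

# Unary plus coerces logical to integer, just like as.integer()
plus_version <- +(v1)
int_version  <- as.integer(v1)

plus_version                           # 1 0 0
typeof(plus_version)                   # "integer"
identical(plus_version, int_version)   # TRUE

# Arithmetic on logicals uses the same coercion
sum(grepl("\\|", c("Fighter | Wizard", "Wizard", "Rogue")))   # 1
```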