4

I'm trying to recode a column to determine the shift of an employee.

The data is messy and the word I am looking for must be extracted from the text. I've been trying various routes with if statements, stringr and dplyr packages, but can't figure out how to get them to work together.

I have this line of code, but str_match doesn't produce a true/false value.

Data$Shift <- if(str_match(Data$Unit, regex(first, ignore_case = TRUE))) {
    print("First Shift")
  } else {
    print("Lame")
  }

recode is working, but I have multiple values I need to recode, and I want to learn whether there is a way to incorporate stringr into the recode function.

Data$Shift1 <- recode(Data$Unit, "1st" = "First Shift")

Currently, a value should be coded as First Shift if its text contains 1st, First, or first. My data looks like the Unit column, and I want to recode it into the Shift column:

Unit                        Shift
Detention, Third Shift      Third Shift
D, 3rd Shift                Third Shift
1st                         First Shift
first shift                 First Shift
First Shift                 First Shift
1st shift                   First Shift
1st Shifft                  First Shift
Kas Elvirov
britt
  • If you are putting a value into Data$shift, you don't want to use `print`. You should also use `ifelse` not `if` here. `stringr` is probably overkill, why not just a few statements like `Data$shift[grepl("3rd", Data$shift)] <- "Third Shift"`? –  Apr 03 '18 at 14:37
  • Thank you! This is exactly what I needed. Just went from 100 unique values to 6. – britt Apr 03 '18 at 15:18
  • Just FYI, `str_match` doesn't return a true/false, but `str_detect` does – camille Apr 03 '18 at 16:18

2 Answers

3

I'd recommend just using grepl with case_when within dplyr.

library(dplyr)

Data %>% 
  mutate(Shift = case_when(grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
                           grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
                           TRUE                                         ~ "Neither"))
  • mutate creates our new column Shift

  • grepl returns a logical vector indicating, for each element, whether it matches the pattern. In this case, the pattern I used was "first|1st". The | character means OR, so as is, that checks for either "first" OR "1st".

  • case_when works like multiple "if" statements, allowing us to keep our logic together (similar to SQL syntax). The final line of case_when is our safety net here: if a value for Unit does not contain any of the first or third shift patterns, it will return "Neither", and so we know to investigate further.
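To see the pieces above working together, here is a minimal sketch that rebuilds the question's Unit column as a small data frame and applies the same case_when logic. The data frame construction is an assumption made for illustration (the real Data may have more columns or a factor column), and the result is assigned back to Data because mutate returns a new data frame rather than modifying the original.

```r
library(dplyr)

# Sample values copied from the Unit column in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# mutate() returns a new data frame, so assign the result back to keep Shift
Data <- Data %>%
  mutate(Shift = case_when(
    grepl("first|1st", Unit, ignore.case = TRUE) ~ "First Shift",
    grepl("third|3rd", Unit, ignore.case = TRUE) ~ "Third Shift",
    TRUE                                         ~ "Neither"
  ))

print(Data)
```

Any row that ends up as "Neither" is one whose text matched none of the patterns and is worth inspecting by hand.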

If you don't have a recent version of dplyr (0.7.3 or later), then case_when might not work for you. If so, we can replace case_when with a chain of nested ifelse calls.

Data %>% 
  mutate(Shift = ifelse(grepl("first|1st", Unit, ignore.case = TRUE),
                        "First Shift",
                        ifelse(grepl("third|3rd", Unit, ignore.case = TRUE),
                               "Third Shift",
                               "Neither")))

Not as pretty, but it should give the same result, since the patterns used in grepl are mutually exclusive.
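If dplyr cannot be updated at all, the nested ifelse idea also works in base R without mutate. The following is only a sketch under that assumption: it rebuilds the question's sample Unit values, and the helper names is_first and is_third are made up for readability.

```r
# Sample values copied from the Unit column in the question
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft"),
  stringsAsFactors = FALSE
)

# One logical vector per pattern (is_first / is_third are illustrative names)
is_first <- grepl("first|1st", Data$Unit, ignore.case = TRUE)
is_third <- grepl("third|3rd", Data$Unit, ignore.case = TRUE)

# Nested ifelse: the first test is checked first, then the second, then the fallback
Data$Shift <- ifelse(is_first, "First Shift",
                     ifelse(is_third, "Third Shift", "Neither"))

print(Data)
```

Because the column is assigned with Data$Shift <- ..., the new column is stored in Data directly and no pipe is needed.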

Dave Gruenewald
  • Thank you! I've tried running it, but for some reason, it's not reading my column where "Unit" is unless Data$Unit is specified. When fixed, it runs without error, but does not produce the Shift column. The package is installed. Is there something I could be missing? – britt Apr 03 '18 at 16:37
  • Oh, this might be an issue with the package version of `dplyr` you are using. `packageVersion("dplyr")` for me is `0.7.4`. I believe `case_when` was made available in version `0.7.3`. Also, is `Data` a dataframe? – Dave Gruenewald Apr 03 '18 at 17:01
  • Okay that's probably it. I have 0.5.0, even after update.packages Data is a data.frame – britt Apr 03 '18 at 17:07
  • I'll add an option for older `dplyr` versions – Dave Gruenewald Apr 03 '18 at 17:08
  • Thank you! For whatever reason, I had to add 'Data1 = ' before the above section to include the new column in a new dataset, but it's working perfect otherwise. – britt Apr 03 '18 at 18:33
0

Keep it simple:

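Data$shift <- as.character(Data$Unit)  # start from a copy of the original text
Data$shift[grepl("3rd", Data$Unit)] <- "Third Shift"  # overwrite rows whose Unit contains "3rd"
Data$shift[grepl("1st", Data$Unit)] <- "First Shift"  # overwrite rows whose Unit contains "1st"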

Etc.
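When there are many labels, the repeated assignments can be driven from a small lookup instead of being written out one by one. This is a sketch, not the answerer's code: the patterns vector is a hypothetical lookup, the "Front Desk" row is an invented value added only to show what an unmatched row looks like, and the patterns are assumed to be the same ones used in the other answer.

```r
# Question's sample values plus one invented unmatched row ("Front Desk")
Data <- data.frame(
  Unit = c("Detention, Third Shift", "D, 3rd Shift", "1st", "first shift",
           "First Shift", "1st shift", "1st Shifft", "Front Desk"),
  stringsAsFactors = FALSE
)

# Hypothetical lookup: names are the output labels, values are the regex patterns
patterns <- c("First Shift" = "first|1st",
              "Third Shift" = "third|3rd")

# Start with NA so unmatched rows are easy to find afterwards
Data$Shift <- NA_character_
for (label in names(patterns)) {
  hit <- grepl(patterns[[label]], Data$Unit, ignore.case = TRUE)
  Data$Shift[hit] <- label  # a later pattern overwrites an earlier one on overlap
}

# Rows that still need a rule
print(Data[is.na(Data$Shift), ])
```

Unlike case_when, where the first matching condition wins, here the last matching pattern wins, so the order of the lookup matters if patterns can overlap.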