
I have this dataset called 'jobdata':

 names <- c("person1", "person2", "person3")
 job1_1_sector <- c("Private", "Public", "Private")
 job2_1_sector <- c(NA, "Public", "Private")
 job2_2_sector <- c("Private", "Public", "Other")
 job3_1_sector <- c("Private", "Private", "Private")
 job3_2_sector <- c("Other", "Public", "Other")
 job3_3_sector <- c("Private", NA, "Private")
 jobs <- cbind(job1_1_sector, job2_1_sector, job2_2_sector, job3_1_sector, 
 job3_2_sector, job3_3_sector )

 jobdata <- data.frame(names, jobs)

I want to create a new binary variable, private, that equals 1 if the word Private appears in any of the relevant variables (that is, job[123]_[123]_sector) and 0 otherwise. Then I want another one for Public and another one for Other. I have figured out how to do this with ifelse and grepl, but my lines of code are really long. Is there an easier way to do this?

The code below gives me the result I want:

 jobdata$private <- ifelse(grepl("Private", jobdata$job1_1_sector) | grepl("Private", jobdata$job2_1_sector) | grepl("Private", jobdata$job2_2_sector) | grepl("Private", jobdata$job3_1_sector) | grepl("Private", jobdata$job3_2_sector) | grepl("Private", jobdata$job3_3_sector), 1, 0)

 jobdata$public <- ifelse(grepl("Public", jobdata$job1_1_sector) | grepl("Public", jobdata$job2_1_sector) | grepl("Public", jobdata$job2_2_sector) | grepl("Public", jobdata$job3_1_sector) | grepl("Public", jobdata$job3_2_sector) | grepl("Public", jobdata$job3_3_sector), 1, 0) 

 jobdata$other <- ifelse(grepl("Other", jobdata$job1_1_sector) | grepl("Other", jobdata$job2_1_sector) | grepl("Other", jobdata$job2_2_sector) | grepl("Other", jobdata$job3_1_sector) | grepl("Other", jobdata$job3_2_sector) | grepl("Other", jobdata$job3_3_sector), 1, 0) 

Thanks!

imprela

3 Answers


For complex operations, it is often useful to first wrap the operation in a function and then apply that function to each case. For instance:

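# For each row of x, return 1 if any cell matches `sector`, else 0.
# apply() coerces the data frame to a character matrix; grepl() already
# returns FALSE for NA cells, so na.rm = TRUE is only a safeguard.
get_sector <- function(x, sector) {
  apply(x, 1, function(y) {
    as.numeric(any(grepl(sector, y), na.rm = TRUE))
  })
}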

jobdata$private <- get_sector(jobdata, "Private")
jobdata$public <- get_sector(jobdata, "Public")
jobdata$other <- get_sector(jobdata, "Other")
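The three calls above differ only in the sector label, so they can also be driven by a loop. A minimal sketch, assuming the sector columns are exactly those whose names end in "_sector" (sector_cols is a name introduced here for illustration). Restricting apply() to those columns also keeps the names column and any previously added flag columns out of the search:

```r
# Example data from the question
jobdata <- data.frame(
  names         = c("person1", "person2", "person3"),
  job1_1_sector = c("Private", "Public", "Private"),
  job2_1_sector = c(NA, "Public", "Private"),
  job2_2_sector = c("Private", "Public", "Other"),
  job3_1_sector = c("Private", "Private", "Private"),
  job3_2_sector = c("Other", "Public", "Other"),
  job3_3_sector = c("Private", NA, "Private"),
  stringsAsFactors = FALSE
)

# Same helper as in the answer: 1 if any cell in the row matches, else 0
get_sector <- function(x, sector) {
  apply(x, 1, function(y) as.numeric(any(grepl(sector, y))))
}

# Assumed naming pattern: only columns ending in "_sector" are searched.
# Computed once, before any flag columns are added.
sector_cols <- grep("_sector$", names(jobdata), value = TRUE)

# One flag column per label, named with the lower-case label
for (s in c("Private", "Public", "Other")) {
  jobdata[[tolower(s)]] <- get_sector(jobdata[sector_cols], s)
}
```

With the question's data this gives private = 1, 1, 1; public = 0, 1, 0; other = 1, 0, 1.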

A tidyverse/dplyr solution would be to first condense the many job columns into a single set of labels and values:

library(tidyverse)

jobdata.long <- jobdata %>% 
  gather(job.number, sector, -names)

     names    job.number  sector
1  person1 job1_1_sector Private
2  person2 job1_1_sector  Public
3  person3 job1_1_sector Private
4  person1 job2_1_sector    <NA>
5  person2 job2_1_sector  Public
6  person3 job2_1_sector Private
7  person1 job2_2_sector Private
8  person2 job2_2_sector  Public
9  person3 job2_2_sector   Other
...
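gather() is superseded in tidyr 1.0.0 and later. A sketch of the same reshaping with pivot_longer(), assuming tidyr >= 1.0.0; selecting the columns with starts_with("job") is a choice made here rather than part of the answer:

```r
library(tidyr)

# Example data from the question
jobdata <- data.frame(
  names         = c("person1", "person2", "person3"),
  job1_1_sector = c("Private", "Public", "Private"),
  job2_1_sector = c(NA, "Public", "Private"),
  job2_2_sector = c("Private", "Public", "Other"),
  job3_1_sector = c("Private", "Private", "Private"),
  job3_2_sector = c("Other", "Public", "Other"),
  job3_3_sector = c("Private", NA, "Private"),
  stringsAsFactors = FALSE
)

# names_to / values_to play the roles of gather()'s key / value columns
jobdata.long <- pivot_longer(
  jobdata,
  cols      = starts_with("job"),
  names_to  = "job.number",
  values_to = "sector"
)
```

pivot_longer() orders rows person by person (all of person1's jobs first), whereas gather() stacks column by column, so the rows come out in a different order than the printout above. NA sectors are kept unless values_drop_na = TRUE is passed.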

Then apply your regular expressions to the new "sector" column, together with summarize, to get a single TRUE/FALSE flag for each person and category:

job.types <- jobdata.long %>% 
  group_by(names) %>% 
  summarize(
    private = any(grepl('Private', sector)),
    public = any(grepl('Public', sector)),
    other = any(grepl('Other', sector))
  )

    names private public other
   <fctr>   <lgl>  <lgl> <lgl>
1 person1    TRUE  FALSE  TRUE
2 person2    TRUE   TRUE FALSE
3 person3    TRUE  FALSE  TRUE
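The question asks for 1/0 rather than TRUE/FALSE. A minimal sketch of the same pipeline with one extra step that converts the flags to integers, assuming dplyr >= 1.0.0 (needed for across()):

```r
library(dplyr)
library(tidyr)

# Example data from the question
jobdata <- data.frame(
  names         = c("person1", "person2", "person3"),
  job1_1_sector = c("Private", "Public", "Private"),
  job2_1_sector = c(NA, "Public", "Private"),
  job2_2_sector = c("Private", "Public", "Other"),
  job3_1_sector = c("Private", "Private", "Private"),
  job3_2_sector = c("Other", "Public", "Other"),
  job3_3_sector = c("Private", NA, "Private"),
  stringsAsFactors = FALSE
)

job.types <- jobdata %>%
  gather(job.number, sector, -names) %>%
  group_by(names) %>%
  summarize(
    private = any(grepl("Private", sector)),
    public  = any(grepl("Public", sector)),
    other   = any(grepl("Other", sector))
  ) %>%
  # Turn the TRUE/FALSE flags into 1/0 integers
  mutate(across(c(private, public, other), as.integer))
```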
jdobres
  • `outer(lift(paste)(jobdata), c("Private","Public","Other"), str_detect)`. If you are using the `tidyverse`, you can also use the `invoke` and `cross` functions. – Onyambu May 14 '18 at 22:31

You can use the apply family (apply and sapply) like so:

# define the types
type <- c("Private", "Public", "Other")

# columns in question
mask <- grepl("^job\\d+_\\d+_sector", colnames(jobdata))

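# apply(..., 1, ...) works row-wise; for each row, sapply() checks every
# type for an exact match (%in%), giving a types-by-rows matrix, so t()
# turns it back into one row per person
jobdata[type] <- t(apply(jobdata[mask], 1, function(x) {
  sapply(type, function(y) as.numeric(y %in% x))
}))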

This yields:

    names job1_1_sector job2_1_sector job2_2_sector job3_1_sector job3_2_sector job3_3_sector Private Public Other
1 person1       Private          <NA>       Private       Private         Other       Private       1      0     1
2 person2        Public        Public        Public       Private        Public          <NA>       1      1     0
3 person3       Private       Private         Other       Private         Other       Private       1      0     1
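The same flags can also be computed without apply(), by comparing all sector columns with each label at once. A sketch, assuming exact matches (as with %in% above) are what you want; the name sectors is introduced here for illustration:

```r
# Example data from the question
jobdata <- data.frame(
  names         = c("person1", "person2", "person3"),
  job1_1_sector = c("Private", "Public", "Private"),
  job2_1_sector = c(NA, "Public", "Private"),
  job2_2_sector = c("Private", "Public", "Other"),
  job3_1_sector = c("Private", "Private", "Private"),
  job3_2_sector = c("Other", "Public", "Other"),
  job3_3_sector = c("Private", NA, "Private"),
  stringsAsFactors = FALSE
)

type <- c("Private", "Public", "Other")
mask <- grepl("^job\\d+_\\d+_sector", colnames(jobdata))

# Take the subset once: mask only fits the original column count
sectors <- jobdata[mask]

for (y in type) {
  # sectors == y is a logical matrix (NA where the cell is NA);
  # a row gets 1 if it has at least one TRUE
  jobdata[[y]] <- as.numeric(rowSums(sectors == y, na.rm = TRUE) > 0)
}
```

Taking the subset before the loop matters: once flag columns are added, a logical index shorter than the widened data frame would be recycled and could pick up the new columns.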
Jan