Sorry if it's a dumb question, but I'm not sure what keywords to use to find the answer so nothing I get is quite what I'm looking for.

I have a column: df$infecting_agent. The entries there are things like "staphylococcus" "bacteria" "virus" "bacterial", etc.

I want two new columns: df$bacteria and df$virus

I want every observation to have a "1" for bacteria if its infecting_agent entry contains "bact", "cocc", or "staph", where anything is allowed before or after what's in quotes. I'll do something similar for the virus column; many observations will have a 1 in both columns.

Can someone tell me what package to use or at least what the "lingo" is I should be using to search for my problem? I tried variations of "replace string with 0 or 1 in R" but I don't think I'm getting anything relevant.

Thank you all!

smci
CineyApp
  • It would help to see some example data and an example of required output. It's not clear how there can be 1 in both columns; I don't see how an agent can be both bacterium and virus. – neilfws Apr 11 '17 at 00:34
  • @neilfws: the string could be "Either bacterium or virus". – smci Apr 11 '17 at 00:47
  • `df$bacteria <- grepl("bact", df$infecting_agent)`? Add zero if you want an integer rather than a logical – Richard Telford Apr 11 '17 at 08:12
  • @neilfws The column "infecting agent" often has entries such as "x species of bacteria + y species virus", or the same with a , or / instead of the + (I should have mentioned this in the original post). – CineyApp Apr 11 '17 at 17:47

1 Answer

You can do that with dplyr and stringr:

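library(dplyr)
library(stringr)

# Example data
df1 <- data.frame(infecting_agent = c('staphylococcus', 'bacteria', 'virus', 'bacterial'))

# str_detect() is TRUE when the regex matches anywhere in the string;
# ifelse() turns that into 1/0. "|" means "or" inside the pattern.
df1 %>%
  mutate(bacteria = ifelse(str_detect(infecting_agent, 'bact|cocc|staph'), 1, 0),
         virus    = ifelse(str_detect(infecting_agent, 'vir|cocc'), 1, 0))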

  infecting_agent bacteria virus
1  staphylococcus        1     1
2        bacteria        1     0
3           virus        0     1
4       bacterial        1     0
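
If you'd rather avoid extra packages, the `grepl` approach suggested in the comments does the same thing in base R. This is only a sketch: the example rows (including the mixed "bacteria + virus" entry) are made up from the descriptions in the comments, and the `"vir"` pattern for the virus column is my assumption, so adjust both patterns to your data. The term to search for is "regular expression" (regex) matching.

```r
# Hypothetical example data, including a mixed entry like the ones
# described in the comments
df1 <- data.frame(
  infecting_agent = c("staphylococcus", "Bacteria", "virus",
                      "x species of bacteria + y species virus"),
  stringsAsFactors = FALSE
)

# grepl() returns TRUE/FALSE for a regex match anywhere in the string;
# as.integer() converts that to 1/0.
# ignore.case = TRUE lets "bact" also match "Bacteria".
df1$bacteria <- as.integer(grepl("bact|cocc|staph", df1$infecting_agent,
                                 ignore.case = TRUE))
df1$virus    <- as.integer(grepl("vir", df1$infecting_agent,
                                 ignore.case = TRUE))

df1
```

Because each column is computed independently, a row that mentions both kinds of agent gets a 1 in both columns.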
Pierre Lapointe
  • I think the question is ambiguous. Perhaps the `virus` column should contain 1 only if `infecting_agent` = virus. – neilfws Apr 11 '17 at 00:38